Learn the Basics || Quickstart || Tensors || Datasets & DataLoaders || Transforms || Build Model || Autograd || Optimization || Save & Load Model

Transforms

Created On: Feb 09, 2021 | Last Updated: May 07, 2026 | Last Verified: Not Verified

Data does not always come in the final processed form required for training machine learning algorithms. We use transforms to manipulate the data and make it suitable for training.

All TorchVision datasets have two parameters that accept callables containing the transformation logic: transform to modify the features and target_transform to modify the labels. The torchvision.transforms module offers several commonly used transforms out of the box.

The FashionMNIST features are in PIL Image format, and the labels are integers. For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors. To make these transformations, we use the torchvision.transforms.v2 API along with torch.nn.functional.one_hot.

import torch
import torch.nn.functional as F
from torchvision import datasets
from torchvision.transforms import v2

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    # Convert each PIL image to a float32 tensor with values scaled to [0., 1.]
    transform=v2.Compose([v2.ToImage(), v2.ToDtype(torch.float32, scale=True)]),
    # Turn each integer label into a one-hot encoded float tensor of size 10
    target_transform=v2.Lambda(
        lambda y: F.one_hot(torch.tensor(y), num_classes=10).float()
    ),
)
  (download progress bars omitted: four FashionMNIST archives of 26.4M, 29.5k, 4.42M, and 5.15k are downloaded and extracted)
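
With the dataset constructed, a quick sanity check confirms what the two transforms produce for a single sample (the variable names here are ours, not part of the tutorial):

img, label = ds[0]               # indexing triggers both transforms
print(img.dtype, img.shape)      # torch.float32 torch.Size([1, 28, 28])
print(label.dtype, label.shape)  # torch.float32 torch.Size([10])
print(label)                     # one-hot vector with a single 1. at the class index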

ToImage() and ToDtype()

The torchvision.transforms.v2 API replaces the legacy ToTensor transform with a two-step pipeline. v2.ToImage converts a PIL image or NumPy ndarray into a torchvision.tv_tensors.Image tensor, and v2.ToDtype with scale=True casts it to float32 and scales the pixel intensity values to the range [0., 1.].
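
To see this concretely, here is a minimal sketch that runs the two-step pipeline on a dummy grayscale image and inspects the result (the dummy image and variable names are our own illustration; torch and v2 are imported as above):

from PIL import Image

pil_img = Image.new("L", (28, 28), color=128)  # dummy 28x28 grayscale PIL image
to_tensor = v2.Compose([v2.ToImage(), v2.ToDtype(torch.float32, scale=True)])
img = to_tensor(pil_img)
print(type(img).__name__)    # Image (a torchvision.tv_tensors.Image, a Tensor subclass)
print(img.dtype, img.shape)  # torch.float32 torch.Size([1, 28, 28])
print(img.min().item(), img.max().item())  # pixel values scaled into [0., 1.]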

Lambda Transforms

Lambda transforms apply any user-defined lambda function. Here, we use torch.nn.functional.one_hot to turn the integer label into a one-hot encoded tensor of size 10 (the number of classes in our dataset), then cast it to float to match the expected dtype.

target_transform = v2.Lambda(
    lambda y: F.one_hot(torch.tensor(y), num_classes=10).float()
)
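
As a quick check (the label value 3 is an arbitrary example of ours), applying the transform to an integer label yields the expected one-hot vector:

label = target_transform(3)
print(label)
# tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])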

Further Reading

- torchvision.transforms v2 API: https://pytorch.org/vision/stable/transforms.html

Total running time of the script: (0 minutes 4.264 seconds)