Transforms
Created On: Feb 09, 2021 | Last Updated: May 07, 2026 | Last Verified: Not Verified
Data does not always arrive in the final processed form that training machine learning algorithms requires. We use transforms to manipulate the data and make it suitable for training.
All TorchVision datasets have two parameters, transform to modify the features and
target_transform to modify the labels, that accept callables containing the transformation logic.
The torchvision.transforms module offers
several commonly-used transforms out of the box.
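To illustrate how a dataset applies these two callables, here is a minimal sketch. ToyDataset, its constructor arguments, and the sample values are hypothetical and only mimic the transform/target_transform convention; this is not how TorchVision implements its datasets.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset that mimics the transform/target_transform convention."""

    def __init__(self, features, labels, transform=None, target_transform=None):
        self.features = features
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x, y = self.features[idx], self.labels[idx]
        # Each callable is applied lazily, one sample at a time.
        if self.transform is not None:
            x = self.transform(x)
        if self.target_transform is not None:
            y = self.target_transform(y)
        return x, y


toy = ToyDataset(
    features=[torch.tensor([0.0, 128.0, 255.0])],
    labels=[2],
    transform=lambda x: x / 255.0,     # scale the features into [0, 1]
    target_transform=lambda y: y + 1,  # arbitrary label remapping, for illustration only
)
x, y = toy[0]
```

Because the callables run inside __getitem__, the stored data stays untouched and every access returns a freshly transformed sample.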
The FashionMNIST features are in PIL Image format, and the labels are integers.
For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors.
To make these transformations, we use the torchvision.transforms.v2 API along with torch.nn.functional.one_hot.
import torch
import torch.nn.functional as F
from torchvision import datasets
from torchvision.transforms import v2

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=v2.Compose([v2.ToImage(), v2.ToDtype(torch.float32, scale=True)]),
    target_transform=v2.Lambda(
        lambda y: F.one_hot(torch.tensor(y), num_classes=10).float()
    ),
)
ToImage() and ToDtype()
The torchvision.transforms.v2 API replaces the legacy ToTensor transform with a two-step pipeline.
v2.ToImage
converts a PIL image or NumPy ndarray into a torchvision.tv_tensors.Image tensor, and
v2.ToDtype
with scale=True casts it to float32 and scales the pixel intensity values to the range [0., 1.].
Lambda Transforms
Lambda transforms apply any user-defined lambda function. Here, we use
torch.nn.functional.one_hot
to turn the integer label into a one-hot encoded tensor of size 10 (the number of classes in our dataset),
then cast it to float to match the expected dtype.
target_transform = v2.Lambda(
    lambda y: F.one_hot(torch.tensor(y), num_classes=10).float()
)
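To see what the label transformation produces, the same logic can be run on a single label. The helper name to_one_hot and the label value 3 are chosen only for illustration.

```python
import torch
import torch.nn.functional as F


def to_one_hot(y, num_classes=10):
    # F.one_hot expects an integer (int64) tensor and returns an int64 tensor,
    # so the result is cast to float afterwards.
    return F.one_hot(torch.tensor(y), num_classes=num_classes).float()


encoded = to_one_hot(3)
print(encoded)
# → tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
```

The position of the single 1 recovers the original class index, so encoded.argmax() returns 3 here.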
Further Reading