torchvision.transforms

Transforms are common image transforms. They can be chained together using Compose.
class torchvision.transforms.Compose(transforms)

Composes several transforms together.

Parameters: transforms (list of Transform objects) – list of transforms to compose.

Example:
>>> transforms.Compose([
>>>     transforms.CenterCrop(10),
>>>     transforms.ToTensor(),
>>> ])
Transforms on PIL Image
class torchvision.transforms.Resize(size, interpolation=2)

Resize the input PIL Image to the given size.

Parameters:
- size (sequence or int) – Desired output size. If size is a sequence like (h, w), the output size will be matched to it. If size is an int, the smaller edge of the image will be matched to this number, keeping the aspect ratio; i.e. if height > width, the image will be rescaled to (size * height / width, size).
- interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR.
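The int-size rule above is worth pinning down. The helper below is not part of torchvision; it is a minimal sketch of the output-size arithmetic, assuming integer division is close enough for illustration:

```python
def resize_output_size(h, w, size):
    """Output (h, w) when Resize is given a single int:
    the smaller edge is matched to `size`, aspect ratio kept."""
    if isinstance(size, (tuple, list)):
        return tuple(size)  # a sequence is used as-is
    if w <= h:
        return (size * h // w, size)  # width is the smaller edge
    return (size, size * w // h)     # height is the smaller edge

# A 400x600 (h x w) image resized with size=200: the smaller
# edge (400) maps to 200, so the output is (200, 300).
print(resize_output_size(400, 600, 200))  # (200, 300)
```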
class torchvision.transforms.Scale(*args, **kwargs)

Note: This transform is deprecated in favor of Resize.
class torchvision.transforms.CenterCrop(size)

Crops the given PIL Image at the center.

Parameters: size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
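A center crop only has to compute its top-left corner. The helper below is an illustrative sketch (not torchvision code), assuming the corner is taken with integer division:

```python
def center_crop_box(h, w, th, tw):
    """Top-left corner (i, j) of a centered (th, tw) crop
    inside an (h, w) image."""
    i = (h - th) // 2
    j = (w - tw) // 2
    return i, j

# Cropping a 10x10 image to 4x4 starts at row 3, column 3.
print(center_crop_box(10, 10, 4, 4))  # (3, 3)
```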
class torchvision.transforms.RandomCrop(size, padding=0)

Crop the given PIL Image at a random location.

Parameters:
- size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop (size, size) is made.
- padding (int or sequence, optional) – Optional padding on each border of the image. Default is 0, i.e. no padding. If a sequence of length 4 is provided, it is used to pad the left, top, right and bottom borders respectively.
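The "random location" is just a uniformly sampled top-left corner. A minimal sketch of that sampling (an illustrative helper, not part of torchvision):

```python
import random

def random_crop_box(h, w, th, tw, rng=random):
    """Sample the top-left corner (i, j) of a random (th, tw)
    crop inside an (h, w) image."""
    if h == th and w == tw:
        return 0, 0
    i = rng.randint(0, h - th)  # inclusive on both ends
    j = rng.randint(0, w - tw)
    return i, j

rng = random.Random(0)
i, j = random_crop_box(8, 8, 4, 4, rng)
# The corner always leaves room for the full crop:
assert 0 <= i <= 4 and 0 <= j <= 4
```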
class torchvision.transforms.RandomHorizontalFlip

Horizontally flip the given PIL Image randomly with a probability of 0.5.
class torchvision.transforms.RandomVerticalFlip

Vertically flip the given PIL Image randomly with a probability of 0.5.
class torchvision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)

Crop the given PIL Image to a random size and aspect ratio.

A crop of random size (default: 0.08 to 1.0 of the original area) and a random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size. This is popularly used to train the Inception networks.

Parameters:
- size – expected output size of each edge
- scale – range of the fraction of the original area to crop
- ratio – range of aspect ratios of the crop
- interpolation – Default: PIL.Image.BILINEAR
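The crop geometry follows from the two sampled quantities: a target area (a fraction of the original area drawn from scale) and an aspect ratio drawn from ratio. The sketch below is illustrative only; the real implementation retries when a sample does not fit inside the image and falls back to a center crop, while this version simply clamps:

```python
import math
import random

def sample_crop(h, w, scale=(0.08, 1.0), ratio=(3/4, 4/3), rng=random):
    """Sample a (crop_h, crop_w) pair: area is a uniform fraction
    of the original area, aspect ratio is uniform in `ratio`."""
    area = h * w * rng.uniform(*scale)
    aspect = rng.uniform(*ratio)
    crop_w = int(round(math.sqrt(area * aspect)))
    crop_h = int(round(math.sqrt(area / aspect)))
    return min(crop_h, h), min(crop_w, w)  # clamp instead of retrying

rng = random.Random(0)
ch, cw = sample_crop(224, 224, rng=rng)
assert 0 < ch <= 224 and 0 < cw <= 224
```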
class torchvision.transforms.RandomSizedCrop(*args, **kwargs)

Note: This transform is deprecated in favor of RandomResizedCrop.
class torchvision.transforms.Grayscale(num_output_channels=1)

Convert an image to grayscale.

Parameters: num_output_channels (int) – number of channels desired for the output image (1 or 3).

Returns: Grayscale version of the input.
- If num_output_channels == 1: the returned image is single channel.
- If num_output_channels == 3: the returned image is 3-channel with r == g == b.

Return type: PIL Image
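Under the hood the grayscale value comes from a weighted sum of the RGB channels; PIL's "L" mode documents the ITU-R 601-2 luma transform L = R * 299/1000 + G * 587/1000 + B * 114/1000. A small sketch of that formula (illustrative, using integer arithmetic):

```python
def rgb_to_gray(r, g, b):
    """ITU-R 601-2 luma transform (the weighting PIL documents
    for "L" mode), with integer arithmetic."""
    return (r * 299 + g * 587 + b * 114) // 1000

# Pure red, green and blue map to distinct gray levels; green
# contributes the most, blue the least.
print(rgb_to_gray(255, 0, 0))  # 76
print(rgb_to_gray(0, 255, 0))  # 149
print(rgb_to_gray(0, 0, 255))  # 29
```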
class torchvision.transforms.RandomGrayscale(p=0.1)

Randomly convert an image to grayscale with a probability of p (default 0.1).

Parameters: p (float) – probability that the image should be converted to grayscale.

Returns: Grayscale version of the input image with probability p, unchanged with probability (1 - p).
- If the input image is 1 channel: the grayscale version is 1 channel.
- If the input image is 3 channel: the grayscale version is 3 channel with r == g == b.

Return type: PIL Image
class torchvision.transforms.FiveCrop(size)

Crop the given PIL Image into four corners and the central crop.

Note: This transform returns a tuple of images, and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters: size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop of size (size, size) is made.

Example:
>>> transform = Compose([
>>>     FiveCrop(size),  # this is a tuple of PIL Images
>>>     Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops]))  # returns a 4D tensor
>>> ])
>>> # In your test loop you can do the following:
>>> input, target = batch  # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1)  # avg over crops
class torchvision.transforms.TenCrop(size, vertical_flip=False)

Crop the given PIL Image into four corners and the central crop, plus the flipped version of each (horizontal flipping is used by default).

Note: This transform returns a tuple of images, and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters:
- size (sequence or int) – Desired output size of the crop. If size is an int instead of a sequence like (h, w), a square crop of size (size, size) is made.
- vertical_flip (bool) – Use vertical flipping instead of horizontal.

Example:
>>> transform = Compose([
>>>     TenCrop(size),  # this is a tuple of PIL Images
>>>     Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops]))  # returns a 4D tensor
>>> ])
>>> # In your test loop you can do the following:
>>> input, target = batch  # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1)  # avg over crops
class torchvision.transforms.Pad(padding, fill=0)

Pad the given PIL Image on all sides with the given "pad" value.

Parameters:
- padding (int or tuple) – Padding on each border. If a single int is provided, it is used to pad all borders. If a tuple of length 2 is provided, it is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided, it is the padding for the left, top, right and bottom borders respectively.
- fill – Pixel fill value. Default is 0. If a tuple of length 3, it is used to fill the R, G, B channels respectively.
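The three accepted padding shapes all normalize to one (left, top, right, bottom) tuple. A minimal sketch of that normalization (illustrative helper, not part of torchvision):

```python
def expand_padding(padding):
    """Normalize an int / 2-tuple / 4-tuple padding spec to
    (left, top, right, bottom)."""
    if isinstance(padding, int):
        return (padding,) * 4          # same pad on every border
    if len(padding) == 2:
        lr, tb = padding               # left/right, top/bottom
        return (lr, tb, lr, tb)
    return tuple(padding)              # already (l, t, r, b)

print(expand_padding(2))       # (2, 2, 2, 2)
print(expand_padding((1, 3)))  # (1, 3, 1, 3)
```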
class torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)

Randomly change the brightness, contrast, saturation and hue of an image.

Parameters:
- brightness (float) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
- contrast (float) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
- saturation (float) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
- hue (float) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue]. hue should be >= 0 and <= 0.5.
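The factor ranges above can be sketched directly: brightness, contrast and saturation are multiplicative factors drawn around 1, while hue is an additive shift around 0. An illustrative sampler (not torchvision code):

```python
import random

def jitter_factors(brightness=0, contrast=0, saturation=0, hue=0, rng=random):
    """Sample the four adjustment factors as the parameter
    docs describe them."""
    return {
        "brightness": rng.uniform(max(0, 1 - brightness), 1 + brightness),
        "contrast": rng.uniform(max(0, 1 - contrast), 1 + contrast),
        "saturation": rng.uniform(max(0, 1 - saturation), 1 + saturation),
        "hue": rng.uniform(-hue, hue),  # additive shift, not a factor
    }

rng = random.Random(0)
f = jitter_factors(brightness=0.4, hue=0.1, rng=rng)
assert 0.6 <= f["brightness"] <= 1.4 and -0.1 <= f["hue"] <= 0.1
```

Note the max(0, ...) lower bound: a brightness of 2 gives factors in [0, 3], never negative.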
Transforms on torch.*Tensor
class torchvision.transforms.Normalize(mean, std)

Normalize a tensor image with mean and standard deviation. Given mean: (M1, ..., Mn) and std: (S1, ..., Sn) for n channels, this transform will normalize each channel of the input torch.*Tensor, i.e. input[channel] = (input[channel] - mean[channel]) / std[channel].

Parameters:
- mean (sequence) – Sequence of means for each channel.
- std (sequence) – Sequence of standard deviations for each channel.
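The per-channel formula can be shown without torch. A minimal pure-Python sketch, where each channel is a flat list of pixel values:

```python
def normalize(channels, mean, std):
    """Apply input[c] = (input[c] - mean[c]) / std[c] per channel."""
    return [[(x - m) / s for x in ch]
            for ch, m, s in zip(channels, mean, std)]

# Two channels, each normalized with its own mean/std; a channel in
# [0, 1] with mean 0.5 and std 0.5 ends up in [-1, 1]:
out = normalize([[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]],
                mean=[0.5, 0.4], std=[0.5, 0.2])
print(out[0])  # [-1.0, 0.0, 1.0]
```

This is why the common mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5] recipe maps ToTensor's [0, 1] output into [-1, 1].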
Conversion Transforms
class torchvision.transforms.ToTensor

Convert a PIL Image or numpy.ndarray to tensor.

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].

__call__(pic)

Parameters: pic (PIL Image or numpy.ndarray) – Image to be converted to tensor.
Returns: Converted image.
Return type: Tensor
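The conversion does two things: rescale [0, 255] to [0.0, 1.0] and move the channel axis first. A pure-Python sketch on nested lists (illustrative only, not how torchvision implements it):

```python
def to_tensor_like(img):
    """Convert an H x W x C nested list in [0, 255] to a
    C x H x W nested list in [0.0, 1.0], mimicking ToTensor."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[i][j][k] / 255 for j in range(w)]
             for i in range(h)]
            for k in range(c)]

# A 1x2 "image" of RGB pixels: (0, 128, 255) and (255, 0, 0).
chw = to_tensor_like([[[0, 128, 255], [255, 0, 0]]])
print(chw[2][0][0])  # 1.0 (blue channel of the first pixel)
```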
class torchvision.transforms.ToPILImage(mode=None)

Convert a tensor or an ndarray to a PIL Image.

Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while preserving the value range.

Parameters: mode (PIL.Image mode) – color space and pixel depth of the input data (optional). If mode is None (default), some assumptions are made about the input data:
1. If the input has 3 channels, the mode is assumed to be RGB.
2. If the input has 4 channels, the mode is assumed to be RGBA.
3. If the input has 1 channel, the mode is determined by the data type (i.e. int, float, short).

__call__(pic)

Parameters: pic (Tensor or numpy.ndarray) – Image to be converted to PIL Image.
Returns: Image converted to PIL Image.
Return type: PIL Image
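The mode-inference rules above can be sketched as a small lookup. This is an illustrative helper, and the single-channel dtype-to-mode mapping shown ("L", "I", "F", "I;16") is an assumption about typical PIL modes, not a verbatim copy of torchvision's table:

```python
def infer_mode(channels, dtype="uint8"):
    """Guess a PIL mode the way the docstring describes when mode=None."""
    if channels == 3:
        return "RGB"
    if channels == 4:
        return "RGBA"
    if channels == 1:
        # single channel: the mode follows the data type
        # (assumed mapping for illustration)
        return {"uint8": "L", "int32": "I",
                "float32": "F", "int16": "I;16"}[dtype]
    raise ValueError("unsupported number of channels")

print(infer_mode(3))             # RGB
print(infer_mode(1, "float32"))  # F
```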