ElasticTransform

class torchvision.transforms.v2.ElasticTransform(alpha: Union[float, Sequence[float]] = 50.0, sigma: Union[float, Sequence[float]] = 5.0, interpolation: Union[InterpolationMode, int] = InterpolationMode.BILINEAR, fill: Union[int, float, Sequence[int], Sequence[float], None, dict[Union[type, str], Union[int, float, collections.abc.Sequence[int], collections.abc.Sequence[float], NoneType]]] = 0)

Transform the input with elastic transformations.

If the input is a torch.Tensor or a TVTensor (e.g. Image, Video, BoundingBoxes, etc.), it can have an arbitrary number of leading batch dimensions. For example, an image can have shape [..., C, H, W] and a bounding box can have shape [..., 4].

Given alpha and sigma, it will generate displacement vectors for all pixels based on random offsets. Alpha controls the strength and sigma controls the smoothness of the displacements. The displacements are added to an identity grid and the resulting grid is used to transform the input.

Note

The transformation of bounding boxes is approximate, not exact. We construct an approximation of the inverse grid as inverse_grid = identity - displacement. This is not an exact inverse of the grid used to transform images, i.e. grid = identity + displacement. Our assumption is that displacement * displacement is small and can be ignored. Large displacements lead to large errors in the approximation.
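To see why the approximation degrades with large displacements, here is a one-dimensional toy check in plain Python. It is not torchvision code: the sinusoidal displacement and the function name roundtrip_error are assumptions made for illustration. A point is mapped forward with x + d(x), mapped back with y - d(y), and the distance from the starting point is measured.

```python
import math


def roundtrip_error(amplitude, samples=200):
    """Worst-case error of the approximate inverse for d(x) = amplitude * sin(x)."""

    def d(x):
        return amplitude * math.sin(x)

    worst = 0.0
    for i in range(samples):
        x = 2 * math.pi * i / samples
        y = x + d(x)       # forward grid: identity + displacement
        x_back = y - d(y)  # approximate inverse: identity - displacement
        worst = max(worst, abs(x_back - x))
    return worst


small = roundtrip_error(0.01)  # small displacement
large = roundtrip_error(0.5)   # large displacement
print(f"small displacement error: {small:.2e}")
print(f"large displacement error: {large:.2e}")
```

In this toy setting the error grows roughly with the square of the displacement amplitude, which matches the assumption that displacement * displacement can be ignored only when it is small.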

Applications: Randomly transforms the morphology of objects in images and produces a see-through-water-like effect.

Parameters:

alpha (float or sequence of floats, optional) – Magnitude of displacements. Default is 50.0.
sigma (float or sequence of floats, optional) – Smoothness of displacements. Default is 5.0.
interpolation (InterpolationMode, optional) – Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Default is InterpolationMode.BILINEAR. If the input is a Tensor, only InterpolationMode.NEAREST and InterpolationMode.BILINEAR are supported. The corresponding Pillow integer constants, e.g. PIL.Image.BILINEAR, are accepted as well.
fill (number or tuple or dict, optional) – Pixel fill value used when the padding_mode is constant. Default is 0. If a tuple of length 3, it is used to fill the R, G, B channels respectively. The fill value can also be a dictionary mapping data type to fill value, e.g. fill={tv_tensors.Image: 127, tv_tensors.Mask: 0}, where Image will be filled with 127 and Mask will be filled with 0.

Examples using ElasticTransform:

Illustration of transforms

make_params(flat_inputs: list[Any]) → dict[str, Any]

Method to override for custom transforms.

See How to write your own v2 transforms

transform(inpt: Any, params: dict[str, Any]) → Any

Method to override for custom transforms.

See How to write your own v2 transforms
