OpenVLAImagePreprocessor
- class torchrl.data.vla.OpenVLAImagePreprocessor(*, size: int = 224, jpeg_quality: int = 95, center_crop: bool = False, backend: Literal['torchvision', 'pil'] = 'torchvision', mean: Tensor | list[float] | tuple[float, ...] | None = None, std: Tensor | list[float] | tuple[float, ...] | None = None)
OpenVLA-style image resize, JPEG round-trip and optional center crop.
The operation order mirrors the OpenVLA-OFT evaluation path: resize to a square image, JPEG encode/decode at the requested quality, optionally apply a 0.9-area center crop, and resize the result back to the target size. The default "torchvision" backend keeps data as tensors and uses torchvision.io JPEG codecs; "pil" is the reference/debugging backend.
- Parameters:
size (int) – Square output size. Defaults to 224.
jpeg_quality (int) – JPEG quality. Defaults to 95.
center_crop (bool) – Whether to apply the OpenVLA 0.9-area center crop. Defaults to False.
backend (str) – "torchvision" or "pil". Defaults to "torchvision".
mean (torch.Tensor | sequence, optional) – Per-channel normalization mean. Defaults to None.
std (torch.Tensor | sequence, optional) – Per-channel normalization std. Defaults to None.
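A "0.9-area" center crop keeps 90% of the image area, so each side shrinks by a factor of sqrt(0.9) ≈ 0.949 rather than 0.9. The following is a minimal sketch of that geometry only. The helper name center_crop_box and its rounding are assumptions made for illustration, not part of torchrl, and the class may round or place the crop window differently.

```python
import math


def center_crop_box(height, width, area_fraction=0.9):
    """Return (top, left, crop_h, crop_w) for a centered crop.

    Hypothetical helper: keeps roughly `area_fraction` of the image
    area while preserving the aspect ratio.
    """
    # Area scales with the square of the side length, so each side
    # is multiplied by sqrt(area_fraction).
    scale = math.sqrt(area_fraction)
    crop_h = round(height * scale)
    crop_w = round(width * scale)
    # Center the crop window inside the image (integer offsets).
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, left, crop_h, crop_w


box = center_crop_box(224, 224)
```

For the default size of 224, this sketch yields a 213 x 213 window, which would then be resized back to 224 x 224.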
Note
Floating-point inputs are ambiguous: this helper treats float images whose maximum value is at most 1 as normalized [0, 1] data and rescales them to uint8; other float images are interpreted as [0, 255] data.
Examples
>>> import torch
>>> from torchrl.data.vla import OpenVLAImagePreprocessor
>>> proc = OpenVLAImagePreprocessor(backend="pil")
>>> out = proc(torch.zeros(2, 3, 32, 32, dtype=torch.uint8))
>>> out.shape
torch.Size([2, 3, 224, 224])
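The float-input rule described in the note can be illustrated with a short NumPy sketch. This is not the torchrl implementation: the function name float_image_to_uint8 is hypothetical, and the rounding and clipping steps are assumptions made so the example is complete.

```python
import numpy as np


def float_image_to_uint8(image):
    """Convert a float image to uint8 using the max-value heuristic."""
    image = np.asarray(image, dtype=np.float32)
    # Maximum <= 1: assume normalized [0, 1] data and rescale to [0, 255].
    # Otherwise: assume the values are already on the [0, 255] scale.
    scale = 255.0 if image.max() <= 1.0 else 1.0
    # Rounding to nearest and clipping are assumptions of this sketch.
    return np.clip(np.rint(image * scale), 0, 255).astype(np.uint8)


normalized = float_image_to_uint8(np.array([0.0, 1.0]))
already_scaled = float_image_to_uint8(np.array([0.0, 128.0, 255.0]))
```

The first call shows where the ambiguity bites: a very dark image stored as floats on the [0, 255] scale with maximum value 1.0 is indistinguishable from normalized data, so it would be stretched to the full range. Converting to uint8 before calling the preprocessor avoids relying on the heuristic.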