ActionScaling
- class torchrl.envs.transforms.ActionScaling(in_keys_inv: Sequence[NestedKey] | None = None, out_keys_inv: Sequence[NestedKey] | None = None, in_keys: Sequence[NestedKey] | None = None, out_keys: Sequence[NestedKey] | None = None, *, loc: Tensor | float | None = None, scale: Tensor | float | None = None, standard_normal: bool = True)
Affine-scale a continuous action using the bounds of the action spec.
Given a bounded action spec with bounds [low, high], this transform exposes a normalized action space to the policy and rescales actions back to the original env range before they are passed to the environment. The loc and scale are derived from the spec:
\[loc = \frac{high + low}{2}, \quad scale = \frac{high - low}{2}.\]
When standard_normal=True (default), the normalized action space is [-1, 1] and the inverse mapping (policy action -> env action) is
\[a_{env} = a_{norm} \cdot scale + loc.\]
The forward mapping (env action -> normalized action, used by replay-buffer transforms) is its inverse:
\[a_{norm} = (a_{env} - loc) / scale.\]
When standard_normal=False, the normalized space is [0, 1] and the mapping is rescaled accordingly so that 0 maps to low and 1 maps to high:
\[a_{env} = low + a_{norm} \cdot (high - low).\]
- Parameters:
in_keys_inv (sequence of NestedKey, optional) – keys read during the inv direction (policy -> env). Defaults to ["action"]. A single key per ActionScaling instance is supported; compose several instances to scale several actions. Pass an empty list for a forward-only transform (normalize raw dataset actions on the replay-buffer sample path while leaving extend and the env-side action interface untouched); this requires explicit loc and scale.
out_keys_inv (sequence of NestedKey, optional) – keys written during the inv direction. Defaults to in_keys_inv.
in_keys (sequence of NestedKey, optional) – keys read during the forward direction (env action -> normalized action, used by replay buffers and inside Module chains). Defaults to in_keys_inv, or ["action"] when in_keys_inv=[] (forward-only mode).
out_keys (sequence of NestedKey, optional) – keys written during the forward direction. Defaults to in_keys.
- Keyword Arguments:
loc (torch.Tensor or float, optional) – explicit location of the affine transform. If both loc and scale are provided, the values are used as-is and no derivation from the spec is performed (useful when no parent environment is available, e.g. inside a replay buffer). Defaults to None.
scale (torch.Tensor or float, optional) – explicit scale of the affine transform. Must be provided together with loc. Defaults to None.
standard_normal (bool, optional) – if True (default), the normalized action space is [-1, 1]. If False, the normalized action space is [0, 1].
- Raises:
RuntimeError – if loc and scale are derived from the spec (no explicit values passed) and the action spec is unbounded or partially unbounded (any bound is non-finite). With explicit loc/scale, a bounded spec is mapped through the affine transform and an unbounded (or partially unbounded) spec is advertised as Unbounded instead of raising.
With explicit loc and scale, the transform is fully spec-independent. This is the standard workflow when training on dataset action statistics, e.g. for VLA policies. Use from_stats() (mean/std or low/high) or from_metadata() to build such an instance from dataset statistics.
Attached to an environment, it denormalizes the policy's actions on the inverse path: a bounded action spec is mapped through the affine transform (and an unbounded action spec stays unbounded), so the advertised normalized space reflects the actual statistics rather than being assumed to be [-1, 1].
Appended to a replay buffer, it normalizes actions on the sample path. Beware that ReplayBuffer.extend applies the inverse transform, so when raw (env-scale) data is written through extend, use a forward-only instance (in_keys_inv=[]) to leave the stored data untouched. The default bidirectional keys suit the env side and pre-populated dataset storages.
Examples
>>> import torch
>>> from torchrl.data.tensor_specs import Bounded
>>> from torchrl.envs.transforms import ActionScaling, TransformedEnv
>>> from torchrl.testing.mocking_classes import ContinuousActionVecMockEnv
>>> base_env = ContinuousActionVecMockEnv(
...     action_spec=Bounded(low=-2.0, high=4.0, shape=(7,))
... )
>>> env = TransformedEnv(base_env, ActionScaling())
>>> env.action_spec.space.low
tensor([-1., -1., -1., -1., -1., -1., -1.])
>>> env.action_spec.space.high
tensor([1., 1., 1., 1., 1., 1., 1.])
>>> # dataset-statistics-driven normalization (no env required): the
>>> # forward pass maps raw actions to the normalized space
>>> from tensordict import TensorDict
>>> t = ActionScaling.from_stats(
...     mean=torch.tensor([1.0, 2.0]), std=torch.tensor([2.0, 4.0])
... )
>>> td = TensorDict({"action": torch.tensor([[3.0, 6.0]])}, batch_size=[1])
>>> t(td)["action"]
tensor([[1., 1.]])
>>> # on a replay buffer, a forward-only instance (in_keys_inv=[])
>>> # normalizes on sample and leaves data written through extend
>>> # untouched (extend applies the inverse pass)
>>> from torchrl.data import LazyTensorStorage, TensorDictReplayBuffer
>>> t = ActionScaling.from_stats(
...     mean=torch.tensor([1.0, 2.0]),
...     std=torch.tensor([2.0, 4.0]),
...     in_keys_inv=[],
... )
>>> rb = TensorDictReplayBuffer(
...     storage=LazyTensorStorage(10), transform=t, batch_size=2
... )
>>> raw = TensorDict(
...     {"action": torch.tensor([[3.0, 6.0]]).expand(10, 2)}, batch_size=[10]
... )
>>> indices = rb.extend(raw)  # stored as-is
>>> rb.sample()["action"]  # normalized with the dataset statistics
tensor([[1., 1.],
        [1., 1.]])
>>> # the same affine map is exposed on raw tensors for execution-time
>>> # use, e.g. mapping a policy's normalized prediction to the robot
>>> t.denormalize(torch.tensor([[1.0, 1.0]]))
tensor([[3., 6.]])
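As a rough illustration of the spec-derived mapping described at the top of this page, the sketch below recomputes loc and scale from the bounds with plain PyTorch and applies both directions, using the same [-2, 4] bounds as the example above. It is not torchrl's implementation: the helper names spec_affine, to_env and to_norm are hypothetical, and expressing the [0, 1] case as loc = low, scale = high - low is just one way to write the map that sends 0 to low and 1 to high.

```python
import torch


def spec_affine(low: torch.Tensor, high: torch.Tensor, standard_normal: bool = True):
    """Hypothetical helper: return (loc, scale) with a_env = a_norm * scale + loc."""
    # Deriving loc/scale from the spec only makes sense for finite bounds,
    # mirroring the documented RuntimeError for (partially) unbounded specs.
    if not (torch.isfinite(low).all() and torch.isfinite(high).all()):
        raise RuntimeError("cannot derive loc/scale from an unbounded action spec")
    if standard_normal:
        # normalized space [-1, 1]: -1 -> low, 1 -> high
        return (high + low) / 2, (high - low) / 2
    # normalized space [0, 1]: 0 -> low, 1 -> high
    return low, high - low


def to_env(a_norm: torch.Tensor, loc, scale) -> torch.Tensor:
    # inverse direction (policy action -> env action)
    return a_norm * scale + loc


def to_norm(a_env: torch.Tensor, loc, scale) -> torch.Tensor:
    # forward direction (env action -> normalized action)
    return (a_env - loc) / scale


low, high = torch.full((7,), -2.0), torch.full((7,), 4.0)
loc, scale = spec_affine(low, high)  # loc = 1.0, scale = 3.0 in every dimension
a_norm = torch.linspace(-1.0, 1.0, 7)
a_env = to_env(a_norm, loc, scale)  # spans [-2, 4]
assert torch.allclose(to_norm(a_env, loc, scale), a_norm)  # round trip
```

Writing the [0, 1] case with its own loc/scale keeps a single affine formula for both modes; it is equivalent to keeping the [-1, 1] loc and scale and using a_env = (2 a_norm - 1) * scale + loc.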
- denormalize(action: Tensor) → Tensor
Map a normalized action back to the env scale (the inverse map).
- classmethod from_metadata(metadata: RobotDatasetMetadata, **kwargs) → ActionScaling
Build from the action statistics of a RobotDatasetMetadata.
Uses action_mean/action_std when available, falling back to action_low/action_high. The action key defaults to the metadata's action_key.
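The fallback order can be pictured as a small selection step. The sketch below is hypothetical throughout: MetadataStub only mimics the five field names mentioned here (the real RobotDatasetMetadata is not reproduced), and pick_action_stats raising ValueError when neither pair is complete is this sketch's choice, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class MetadataStub:
    # Hypothetical stand-in with the fields named in the from_metadata docs.
    action_key: str = "action"
    action_mean: Optional[torch.Tensor] = None
    action_std: Optional[torch.Tensor] = None
    action_low: Optional[torch.Tensor] = None
    action_high: Optional[torch.Tensor] = None


def pick_action_stats(meta: MetadataStub):
    """Hypothetical helper: prefer mean/std, fall back to low/high."""
    if meta.action_mean is not None and meta.action_std is not None:
        stats = {"mean": meta.action_mean, "std": meta.action_std}
    elif meta.action_low is not None and meta.action_high is not None:
        stats = {"low": meta.action_low, "high": meta.action_high}
    else:
        raise ValueError("metadata carries no complete action statistics pair")
    return stats, meta.action_key


meta = MetadataStub(action_low=-torch.ones(3), action_high=torch.ones(3))
stats, key = pick_action_stats(meta)  # no mean/std, so low/high is used
```

The selected pair lines up with the keyword arguments of from_stats(); the key could then be forwarded through **kwargs, e.g. as in_keys_inv=[key].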
- classmethod from_stats(*, mean: Tensor | None = None, std: Tensor | None = None, low: Tensor | None = None, high: Tensor | None = None, eps: float = 1e-06, **kwargs) → ActionScaling
Build an ActionScaling from dataset action statistics.
Provide exactly one complete pair: mean and std (zero-mean, unit-std normalized space) or low and high (maps the range to [-1, 1]).
- Keyword Arguments:
mean (torch.Tensor, optional) – per-dimension action mean.
std (torch.Tensor, optional) – per-dimension action std.
low (torch.Tensor, optional) – per-dimension action minimum.
high (torch.Tensor, optional) – per-dimension action maximum.
eps (float, optional) – floor applied to the scale to avoid division by zero on constant action dimensions. Defaults to 1e-6.
**kwargs – forwarded to the constructor (e.g. in_keys_inv, standard_normal).
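To make the link between the statistics and the affine map concrete, here is a plain-PyTorch sketch of the loc/scale each pair implies under the default standard_normal=True: mean/std give loc = mean and scale = std, while low/high give the same loc/scale as the spec-derived case. It assumes eps acts as a lower clamp on the scale (the docs call it a floor); the helper name affine_from_stats and its ValueError on an incomplete or mixed set of statistics are hypothetical.

```python
import torch


def affine_from_stats(*, mean=None, std=None, low=None, high=None, eps=1e-6):
    """Hypothetical helper: return (loc, scale) with a_norm = (a_env - loc) / scale."""
    given = {
        name
        for name, value in {"mean": mean, "std": std, "low": low, "high": high}.items()
        if value is not None
    }
    if given == {"mean", "std"}:
        # zero-mean, unit-std normalized space
        loc, scale = mean, std
    elif given == {"low", "high"}:
        # [low, high] -> [-1, 1], as in the spec-derived case
        loc, scale = (high + low) / 2, (high - low) / 2
    else:
        raise ValueError("provide exactly one complete pair: (mean, std) or (low, high)")
    # floor the scale so a constant action dimension never divides by zero
    return loc, scale.clamp(min=eps)


mean = torch.tensor([1.0, 2.0, 0.5])
std = torch.tensor([2.0, 4.0, 0.0])  # the last dimension is constant in the dataset
loc, scale = affine_from_stats(mean=mean, std=std)
a_norm = (torch.tensor([3.0, 6.0, 0.5]) - loc) / scale  # finite in every dimension
```

The first two dimensions reproduce the class example, where [3, 6] normalizes to [1, 1]; without the floor, the constant third dimension would produce NaN (0 / 0).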
- normalize(action: Tensor) → Tensor
Map an env-scale action to the normalized space (the forward map).
- transform_action_spec(action_spec: TensorSpec) → TensorSpec
Transforms the action spec so that the resulting spec matches the transform mapping.
- Parameters:
action_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform
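For the explicit loc/scale case, the class description says a bounded action spec is mapped through the affine transform while an unbounded one stays unbounded. The arithmetic on the bounds can be sketched on plain tensors as below. This sketch does not construct TensorSpec objects (the real transform advertises a non-finite spec as Unbounded), assumes scale > 0, and normalized_bounds is a hypothetical helper.

```python
import torch


def normalized_bounds(low, high, loc, scale):
    """Hypothetical helper: push env-space bounds through the forward map."""
    # With scale > 0 the map is increasing, so low stays below high.
    # Infinite bounds stay infinite: (inf - loc) / scale == inf.
    return (low - loc) / scale, (high - loc) / scale


low, high = torch.tensor([-2.0, -2.0]), torch.tensor([4.0, 4.0])

# spec-derived statistics (loc=1, scale=3): exactly [-1, 1]
unit_low, unit_high = normalized_bounds(low, high, loc=1.0, scale=3.0)

# dataset statistics (mean=1, std=2): the advertised space becomes [-1.5, 1.5]
stat_low, stat_high = normalized_bounds(low, high, loc=1.0, scale=2.0)

# an unbounded side remains unbounded
inf_low, inf_high = normalized_bounds(
    torch.tensor([-float("inf")]), torch.tensor([4.0]), loc=1.0, scale=2.0
)
```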