RNDTransform#
- class torchrl.envs.transforms.RNDTransform(target_network: Module, predictor_network: Module, in_keys: list[NestedKey] | None = None, out_keys: list[NestedKey] | None = None, normalize_obs: bool = True, normalize_reward: bool = True, obs_clip: float = 5.0, reward_clip: float = 5.0)[source]#
Random Network Distillation transform that computes an intrinsic reward.
Implements the exploration bonus from:
Burda et al., “Exploration by Random Network Distillation” (2018). https://arxiv.org/abs/1810.12894
At every environment step the transform:
Optionally normalizes the next observation with online running statistics and clips the result to
[-obs_clip, obs_clip]sigma.Passes the (normalized) observation through both the frozen target and the trainable predictor networks.
Writes the MSE prediction error as an intrinsic reward under
out_keys[0].Optionally normalizes that reward by its running standard deviation.
The predictor is only given gradient updates through
RNDLossduring training. The transform itself always runs undertorch.no_grad().Running normalization statistics are lazily initialized on the first step so that the feature dimensionality does not need to be specified up-front. Pass
normalize_obs=Falseto skip observation normalization (useful when the observation is already normalized by another transform).- Parameters:
target_network (torch.nn.Module) – frozen random network providing fixed embeddings. Its parameters are frozen on construction.
predictor_network (torch.nn.Module) – trainable network that learns to predict target embeddings.
in_keys (list of NestedKey, optional) – tensordict keys to read observations from. Defaults to
["observation"].out_keys (list of NestedKey, optional) – tensordict keys to write the intrinsic reward to. Defaults to
["intrinsic_reward"].normalize_obs (bool, optional) – normalize observations with running mean/std before passing to the networks. Default:
True.normalize_reward (bool, optional) – divide intrinsic reward by its running standard deviation. Default:
True.obs_clip (float, optional) – clip normalized observations to
[-obs_clip, obs_clip]. Default:5.0.reward_clip (float, optional) – clip normalized intrinsic reward to
[-reward_clip, reward_clip]. Default:5.0.
Examples
>>> import torch.nn as nn >>> from torchrl.envs import GymEnv, TransformedEnv >>> from torchrl.envs.transforms import RNDTransform >>> target = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 64)) >>> predictor = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 64)) >>> env = TransformedEnv(GymEnv("CartPole-v1"), RNDTransform(target, predictor)) >>> td = env.rollout(3) >>> td["next", "intrinsic_reward"].shape torch.Size([3, 1])
- property obs_rms: RunningMeanStd | None#
Running obs statistics, or
Nonebefore the first step.
- property reward_rms: RunningMeanStd | None#
Running intrinsic-reward statistics, or
Nonebefore the first step.
- transform_reward_spec(reward_spec)[source]#
Transforms the reward spec such that the resulting spec matches transform mapping.
- Parameters:
reward_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform