RewardSum
- class torchrl.envs.transforms.RewardSum(in_keys: collections.abc.Sequence[tensordict._nestedkey.NestedKey] | None = None, out_keys: collections.abc.Sequence[tensordict._nestedkey.NestedKey] | None = None, reset_keys: collections.abc.Sequence[tensordict._nestedkey.NestedKey] | None = None, *, reward_spec: bool = False)[source]
Tracks episode cumulative rewards.
This transform accepts a list of tensordict reward keys (i.e. ‘in_keys’) and tracks their cumulative value along the time dimension for each episode.
When called, the transform writes, for each in_key, a new tensordict entry named episode_{in_key} that holds the cumulative value.
- Parameters:
in_keys (list of NestedKeys, optional) – Input reward keys. All in_keys should be part of the environment reward_spec. If no in_keys are specified, this transform assumes "reward" to be the input key. However, multiple rewards (e.g. "reward1" and "reward2") can also be specified.
out_keys (list of NestedKeys, optional) – The output sum keys, one per input key.
reset_keys (list of NestedKeys, optional) – The list of reset_keys to be used if the parent environment cannot be found. If provided, this value will prevail over the environment reset_keys.
- Keyword Arguments:
reward_spec (bool, optional) – if True, the new reward entry will be registered in the reward specs. Defaults to False (registered in observation_specs).
Examples
>>> import torch
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum())
>>> env.set_seed(0)
>>> torch.manual_seed(0)
>>> td = env.reset()
>>> print(td["episode_reward"])
tensor([0.])
>>> td = env.rollout(3)
>>> print(td["next", "episode_reward"])
tensor([[1.],
        [2.],
        [3.]])
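The in_keys / out_keys arguments described above can also be used to track the cumulative value under a custom output name (or to track several reward entries at once). A minimal sketch, assuming the default single "reward" entry and an arbitrary output key name chosen for illustration:
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> # track the default "reward" entry under a custom output key
>>> env = TransformedEnv(
...     GymEnv("CartPole-v1"),
...     RewardSum(in_keys=["reward"], out_keys=["cumulative_reward"]),
... )
>>> td = env.rollout(3)
>>> print(td["next", "cumulative_reward"])  # running sum along the rollout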
- forward(tensordict: TensorDictBase) → TensorDictBase[source]
Reads the input tensordict, and for the selected keys, applies the transform.
By default, this method:
- calls _apply_transform() directly,
- does not call _step() or _call().
This method is not called within env.step at any point. However, it is called within sample().
Note
forward also works with regular keyword arguments, using dispatch to cast the argument names to the keys.
Examples
>>> class TransformThatMeasuresBytes(Transform):
...     '''Measures the number of bytes in the tensordict, and writes it under `"bytes"`.'''
...     def __init__(self):
...         super().__init__(in_keys=[], out_keys=["bytes"])
...
...     def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
...         bytes_in_td = tensordict.bytes()
...         tensordict["bytes"] = bytes_in_td
...         return tensordict
>>> t = TransformThatMeasuresBytes()
>>> env = env.append_transform(t)  # works within envs
>>> t(TensorDict(a=0))  # works offline too
- transform_input_spec(input_spec: TensorSpec) → TensorSpec[source]
Transforms the input spec such that the resulting spec matches the transform mapping.
- Parameters:
input_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform
- transform_observation_spec(observation_spec: TensorSpec) → TensorSpec[source]
Transforms the observation spec, adding the new keys generated by RewardSum.
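For instance, with the default reward_spec=False the cumulative entry is expected to appear in the observation spec of a wrapped environment; a minimal sketch:
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum())
>>> # the "episode_reward" entry is added next to the regular observations
>>> print(env.observation_spec["episode_reward"])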
- transform_reward_spec(reward_spec: TensorSpec) → TensorSpec[source]
Transforms the reward spec such that the resulting spec matches the transform mapping.
- Parameters:
reward_spec (TensorSpec) – spec before the transform
- Returns:
expected spec after the transform
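If reward_spec=True is passed to the constructor, the new entry should instead be registered alongside the reward. A minimal sketch, assuming a Gym backend is available:
>>> from torchrl.envs.transforms import RewardSum, TransformedEnv
>>> from torchrl.envs.libs.gym import GymEnv
>>> env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum(reward_spec=True))
>>> # the cumulative entry is expected under the full reward spec
>>> print(env.output_spec["full_reward_spec"])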