LineariseRewards

class torchrl.envs.transforms.LineariseRewards(in_keys: Sequence[NestedKey], out_keys: Sequence[NestedKey] | None = None, *, weights: Sequence[float] | Tensor | None = None)

Transforms a multi-objective reward signal to a single-objective one via a weighted sum.

Parameters:

in_keys (Sequence[NestedKey]) – The keys under which the multi-objective rewards are found.
out_keys (Sequence[NestedKey], optional) – The keys under which the single-objective rewards should be written. Defaults to in_keys.
weights (Sequence[float] | Tensor, optional) – The weight applied to each objective's reward in the sum. Defaults to [1.0, 1.0, …], i.e. an unweighted sum.
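
As a rough illustration of the weighted sum described above, the following plain-Python sketch collapses one reward vector into a scalar. It is not the library's implementation; the helper name linearise and the sample reward values are hypothetical and chosen only for this example.

```python
from typing import Optional, Sequence


def linearise(rewards: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Collapse one multi-objective reward vector into a scalar via a weighted sum."""
    if weights is None:
        # Mirrors the documented default of [1.0, 1.0, ...]: a plain sum.
        weights = [1.0] * len(rewards)
    if len(weights) != len(rewards):
        raise ValueError("expected one weight per objective")
    return sum(w * r for w, r in zip(weights, rewards))


# Hypothetical two-objective reward.
reward = [10.0, -3.0]
default_scalar = linearise(reward)               # 10.0 + (-3.0)
weighted_scalar = linearise(reward, [0.5, 2.0])  # 0.5 * 10.0 + 2.0 * (-3.0)
```

With the default weights the objectives are simply added; passing explicit weights trades one objective off against the other.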

Warning

If in_keys contains more than one key (e.g. one reward group per agent in a multi-agent setup), the same weights are applied to every entry. To aggregate rewards differently for each group, chain several LineariseRewards transforms.
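
The difference between one transform over several keys and several chained transforms can be sketched in plain Python. This is only an assumption-laden illustration of the behaviour described in the warning: the helper linearise_keys, the flat string keys, and the reward values are all hypothetical, and a real set-up would use nested keys and TorchRL transforms instead.

```python
from typing import Dict, Sequence


def linearise_keys(data: Dict[str, object], in_keys: Sequence[str], weights: Sequence[float]) -> Dict[str, object]:
    """Apply the SAME weights to the reward vector stored under every key in in_keys."""
    out = dict(data)  # shallow copy so the input is left untouched
    for key in in_keys:
        out[key] = sum(w * r for w, r in zip(weights, data[key]))
    return out


# Hypothetical per-agent reward groups, two objectives each.
step = {"agent_a_reward": [1.0, 2.0], "agent_b_reward": [3.0, 4.0]}

# One pass over both keys: both groups share the weights [1.0, 0.5].
shared = linearise_keys(step, ["agent_a_reward", "agent_b_reward"], [1.0, 0.5])

# Two passes in a row: each group gets its own weights.
per_group = linearise_keys(step, ["agent_a_reward"], [1.0, 0.5])
per_group = linearise_keys(per_group, ["agent_b_reward"], [0.0, 1.0])
```

In the chained version each pass touches only its own key, which is what makes group-specific weights possible.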

Example

>>> import mo_gymnasium as mo_gym
>>> from torchrl.envs import MOGymWrapper, TransformedEnv
>>> from torchrl.envs.transforms import LineariseRewards
>>> mo_env = MOGymWrapper(mo_gym.make("deep-sea-treasure-v0"))
>>> mo_env.reward_spec
BoundedContinuous(
    shape=torch.Size([2]),
    space=ContinuousBox(
        low=Tensor(shape=torch.Size([2]), device=cpu, dtype=torch.float32, contiguous=True),
        high=Tensor(shape=torch.Size([2]), device=cpu, dtype=torch.float32, contiguous=True)),
    ...)
>>> so_env = TransformedEnv(mo_env, LineariseRewards(in_keys=("reward",)))
>>> so_env.reward_spec
BoundedContinuous(
    shape=torch.Size([1]),
    space=ContinuousBox(
        low=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True),
        high=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.float32, contiguous=True)),
    ...)
>>> td = so_env.rollout(5)
>>> td["next", "reward"].shape
torch.Size([5, 1])

transform_reward_spec(reward_spec: TensorSpec) → TensorSpec

Transforms the reward spec so that the resulting spec matches the transform mapping.

Parameters: reward_spec (TensorSpec) – The spec before the transform.
Returns: The expected spec after the transform.
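
How the bounds of the transformed spec are derived is not stated here. One natural way to bound a weighted sum over a box, shown below as a plain-Python sketch, is to pick the lower or upper bound of each objective depending on the sign of its weight. The helper name weighted_sum_bounds and the numeric bounds are hypothetical, and the library may compute its spec differently.

```python
from typing import Sequence, Tuple


def weighted_sum_bounds(
    low: Sequence[float], high: Sequence[float], weights: Sequence[float]
) -> Tuple[float, float]:
    """Return (min, max) of sum(w_i * r_i) when each r_i lies in [low_i, high_i]."""
    new_low = 0.0
    new_high = 0.0
    for lo, hi, w in zip(low, high, weights):
        if w >= 0:
            # A non-negative weight keeps the ordering of the bounds.
            new_low += w * lo
            new_high += w * hi
        else:
            # A negative weight flips which bound yields the minimum.
            new_low += w * hi
            new_high += w * lo
    return new_low, new_high


# Hypothetical per-objective bounds for a two-objective reward.
bounds_default = weighted_sum_bounds([0.0, -1.0], [10.0, 0.0], [1.0, 1.0])
bounds_negative = weighted_sum_bounds([0.0, -1.0], [10.0, 0.0], [1.0, -2.0])
```

The result is a single (low, high) pair, consistent with the shape change from two objectives to one shown in the example above.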
