RandomPolicy

class torchrl.modules.tensordict_module.RandomPolicy(action_spec: TensorSpec | None = None, action_key: NestedKey = 'action')

A random policy for data collectors.

This is a wrapper around the action_spec.rand method.

Parameters:
  • action_spec – TensorSpec object describing the action spec. If None, the spec is initialized lazily (e.g. by a collector from env.full_action_spec). Calling a RandomPolicy before its spec is set raises an error.

  • action_key – key at which the action is written. Defaults to "action".

Examples

>>> import torch
>>> from tensordict import TensorDict
>>> from torchrl.data.tensor_specs import Bounded
>>> from torchrl.modules.tensordict_module import RandomPolicy
>>> action_spec = Bounded(-torch.ones(3), torch.ones(3))
>>> actor = RandomPolicy(action_spec=action_spec)
>>> td = actor(TensorDict())  # samples a random action in the cube [-1, 1]^3

Lazy initialization, where the collector fills in the spec from the env:

>>> from torchrl.collectors import Collector
>>> collector = Collector(env, RandomPolicy(), ...)

set_action_spec_from_env(env: EnvBase) → None

Initialize action_spec from env.full_action_spec.

No-op if the spec is already set. Intended for lazy initialization by data collectors.
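
The lazy-initialization behaviour described above can be illustrated with a minimal, library-free sketch. It is not TorchRL's implementation: ToyBoundedSpec, ToyEnv and LazyRandomPolicy are hypothetical stand-ins, plain dicts replace TensorDict, and the use of RuntimeError for the unset-spec case is an assumption (the documentation only states that the call raises).

```python
import random


class ToyBoundedSpec:
    """Hypothetical stand-in for a bounded action spec with a rand() method."""

    def __init__(self, low, high, size):
        self.low, self.high, self.size = low, high, size

    def rand(self):
        # Draw `size` independent uniform samples in [low, high].
        return [random.uniform(self.low, self.high) for _ in range(self.size)]


class ToyEnv:
    """Hypothetical environment exposing a full_action_spec attribute."""

    def __init__(self, spec):
        self.full_action_spec = spec


class LazyRandomPolicy:
    """Toy policy mirroring the documented behaviour, not the real class."""

    def __init__(self, action_spec=None, action_key="action"):
        self.action_spec = action_spec
        self.action_key = action_key

    def set_action_spec_from_env(self, env):
        # No-op if a spec is already set; otherwise take it from the env.
        if self.action_spec is None:
            self.action_spec = env.full_action_spec

    def __call__(self, data):
        if self.action_spec is None:
            # Assumed error type; the docs only say the call raises.
            raise RuntimeError("action_spec must be set before calling the policy")
        # Write a random action under the configured key.
        data[self.action_key] = self.action_spec.rand()
        return data


env = ToyEnv(ToyBoundedSpec(-1.0, 1.0, 3))
policy = LazyRandomPolicy()           # no spec yet
policy.set_action_spec_from_env(env)  # spec filled in from the env
out = policy({})                      # dict used in place of a TensorDict
```

In this sketch, a second call to set_action_spec_from_env with a different environment leaves the policy's spec unchanged, which matches the documented no-op behaviour.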