Glossary
TorchRL borrows much of its vocabulary from tensordict and the broader
RL literature, but a handful of terms appear in error messages and source code
without a dedicated definition in the API reference. This page lists those
terms with the minimum context needed to find the relevant code.
- _AcceptedKeys
A dataclass nested inside most LossModule subclasses that declares the tensordict keys the loss expects to read or write. Each field is a NestedKey with a default value. Override the defaults via set_keys() rather than mutating the dataclass directly; set_keys also propagates the change to the underlying value estimator.
- BatchedEnv
A TorchRL environment that owns more than one environment instance under a single EnvBase interface. The common implementations are SerialEnv and ParallelEnv, both subclasses of the internal BatchedEnvBase. Their batch_size is the leading shape of reset, step, and collector outputs.
- carrier
The TensorDictBase stored as self._carrier inside rollout(). It persists across collector batches and holds the post-reset environment output that the next policy call consumes. See Collector Internals for the full lifecycle.
- Collector
The single-process data collector, exposed as Collector. It alternates policy calls and environment steps to produce rollout tensordicts.
- compact_obs
A collector setting that drops observation and state keys from the ("next", ...) sub-tensordict of every persisted step. Within a contiguous same-trajectory sample, those values can be reconstructed from the root keys of the following step. At trajectory boundaries or in non-contiguous random samples, reconstruction must use the configured fill value; see NextStateReconstructor and the compact_obs argument on Collector.
- Composite
- CompositeSpec
A nested spec container, currently named Composite, that maps tensordict keys to leaf TensorSpec objects. Environment specs such as observation_spec, action_spec, and reward_spec are usually composites. CompositeSpec is an older name that may still appear in discussions and issue reports.
- Env
Short for environment: an object implementing the EnvBase API, including reset, step, specs, device handling, and a tensordict-based input/output contract. TorchRL env wrappers usually subclass Transform or compose a TransformedEnv rather than following the Gym wrapper API directly.
- env batch size
The leading batch shape of an environment, exposed as batch_size. A single unbatched env has an empty batch size; a ParallelEnv with N workers usually has batch size [N]. Collectors append a time dimension to this shape when they stack rollout steps.
- env_device
The collector device slot used for environment reset and step operations. When it differs from policy_device or from the storage layout, the collector inserts the casts and sync points described in Collector Internals.
- EnvCreator
A small callable wrapper, EnvCreator, used to build environments lazily or in worker processes. It is useful when constructors need to be serialized for MultiSyncCollector, MultiAsyncCollector, or distributed collectors.
- functional (loss)
A LossModule is functional when it stores its actor / critic parameters as a stateless tensordict and invokes the networks with to_module() at call time. This is what makes soft / target updates, separate_losses=True, and per-parameter optimiser groups possible without deep-copying the underlying nn.Module. Check loss.functional to see which mode a given loss is in.
- in_keys
- out_keys
The lists of tensordict keys a module reads from (in_keys) and writes to (out_keys). TensorDictModule and most TorchRL loss / value-estimator components expose these as constructor arguments. Modifying them lets you wire a module into a tensordict layout that differs from the defaults; see data_layout for naming conventions.
- is_init
A boolean key (default name: "is_init") written by InitTracker immediately after every env reset. Recurrent modules and advantage estimators read this key to know where trajectories begin so they can zero out stale hidden state or reset the bootstrap target.
- no_cuda_sync
A collector flag that suppresses the explicit CUDA, MPS, or NPU synchronizations inserted after cross-device transfers. Enabling it is safe only when transfers are already correctly ordered or when running purely on CPU. Defaults to False.
- policy_device
The collector device slot where the policy network runs. When it differs from env_device, the collector casts the carrier before policy and env calls.
- recurrent mode
The flag controlling whether an RNN-bearing module (LSTMModule, GRUModule) processes a single timestep per call (sequential) or a full (B, T, ...) sequence in one call (recurrent). Toggled via the set_recurrent_mode context manager. Collectors run in sequential mode; losses run in recurrent mode so the module can split and pad on trajectory boundaries inside a replayed batch.
- set_keys
The public method on LossModule and value estimators used to override the default tensordict keys a loss expects. Example: loss.set_keys(value=("agents", "state_value"), action=("agents", "action")). Prefer this over reaching into loss.tensor_keys directly because it also wires changes into the loss’s value estimator if one exists.
- Specs
Tensor constraints that describe valid values, shapes, dtypes, and devices. TorchRL uses TensorSpec leaves, such as Bounded and Unbounded, and Composite containers to validate and generate env inputs and outputs.
- storing_device
The collector device slot where a rollout batch is materialised before it is yielded or extended into a replay buffer. Direct replay_buffer.add writes bypass this materialisation path.
- TED
TorchRL Episode Data: the standard offline dataset layout described in Datasets. It stores a transition with root keys for the current step and a ("next", ...) sub-tensordict for next-step values. Conversion helpers such as TED2Flat and Flat2TED serialize and restore this layout.
- tensor_keys
The instance attribute on every LossModule holding the current values of the keys declared in _AcceptedKeys. Read-only by convention; use set_keys() to modify them.
- TensorDictPrimer
A Transform that injects keys into the environment’s reset / step output that the policy needs but the env does not natively produce, most commonly RNN hidden states. Without a primer, the first call to a recurrent policy after reset would have no hidden state to read. See TensorDictPrimer and torchrl.modules.LSTMModule.make_tensordict_primer().
- trajectory ID
An integer that identifies the trajectory each frame belongs to. Written by Collector as ("collector", "traj_ids") when track_traj_ids=True. Used by SliceSampler to draw whole trajectories from a buffer and by split_trajectories() to slice a flat batch into per-trajectory chunks.
- Transform
TorchRL’s tensordict-native environment transform abstraction, Transform. A transform can modify input specs, output specs, reset data, step data, or inverse action data, and is usually installed through TransformedEnv. This is distinct from a Gym wrapper, which operates on non-tensordict values.
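The TED and compact_obs entries describe a layout in which next-step observations can be dropped at storage time and rebuilt later. The sketch below illustrates that idea with plain Python dicts standing in for tensordicts. The helpers compact and reconstruct_next and the fill argument are hypothetical names chosen for illustration; this is not the code behind NextStateReconstructor, and it assumes a contiguous sample.

```python
# Plain-dict stand-in for a TED-style sample: root keys hold the current step,
# the "next" sub-dict holds next-step values. This is not TorchRL code.
steps = [
    {"observation": 0.0, "action": 1,
     "next": {"observation": 0.5, "reward": 1.0, "done": False}},
    {"observation": 0.5, "action": 0,
     "next": {"observation": 0.9, "reward": 0.0, "done": False}},
    {"observation": 0.9, "action": 1,
     "next": {"observation": 1.4, "reward": 2.0, "done": True}},
]


def compact(step, obs_keys=("observation",)):
    """Return a copy of `step` without the observation keys under "next"."""
    kept = {k: v for k, v in step["next"].items() if k not in obs_keys}
    return {**step, "next": kept}


def reconstruct_next(compacted, fill=None, obs_keys=("observation",)):
    """Rebuild "next" observations from the root keys of the following step.

    A step that ends its trajectory, or that has no successor in the sample,
    receives `fill` instead (hypothetical stand-in for the configured fill value).
    """
    rebuilt = []
    for i, step in enumerate(compacted):
        nxt = dict(step["next"])
        has_successor = i + 1 < len(compacted) and not nxt["done"]
        for key in obs_keys:
            nxt[key] = compacted[i + 1][key] if has_successor else fill
        rebuilt.append({**step, "next": nxt})
    return rebuilt


compacted = [compact(s) for s in steps]
restored = reconstruct_next(compacted, fill=float("nan"))
```

In this sketch the first two steps recover their next observation exactly, while the terminal step receives the fill value, which is the boundary case the compact_obs entry warns about.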
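The trajectory ID entry mentions slicing a flat batch into per-trajectory chunks. As a rough illustration of that grouping step only, the following sketch uses plain dicts and the standard library. The function split_by_traj_id is a hypothetical helper, not TorchRL's split_trajectories(), and it ignores padding and masking.

```python
from itertools import groupby

# Flat batch of frames, each tagged with the trajectory it belongs to.
# Plain dicts stand in for tensordict entries. This is not TorchRL code.
frames = [
    {"traj_id": 0, "reward": 1.0},
    {"traj_id": 0, "reward": 0.5},
    {"traj_id": 1, "reward": 2.0},
    {"traj_id": 1, "reward": 0.0},
    {"traj_id": 1, "reward": 1.0},
    {"traj_id": 2, "reward": 3.0},
]


def split_by_traj_id(flat_frames):
    """Group consecutive frames that share a trajectory ID into chunks."""
    return [
        list(chunk)
        for _, chunk in groupby(flat_frames, key=lambda frame: frame["traj_id"])
    ]


chunks = split_by_traj_id(frames)
lengths = [len(chunk) for chunk in chunks]
```

Because groupby only merges consecutive items, this sketch assumes frames from the same trajectory are stored contiguously.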
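The is_init entry says recurrent modules use the flag to discard stale hidden state at trajectory starts. A minimal sketch of that behaviour, using a toy cumulative-sum cell rather than a real LSTMModule or GRUModule, might look like the following; run_recurrent is a hypothetical function written only for this illustration.

```python
def run_recurrent(inputs, is_init, initial_hidden=0.0):
    """Toy recurrent cell: hidden = hidden + input, reset where is_init is True."""
    hidden = initial_hidden
    outputs = []
    for x, first in zip(inputs, is_init):
        if first:
            # A new trajectory starts here: discard the stale hidden state.
            hidden = initial_hidden
        hidden = hidden + x
        outputs.append(hidden)
    return outputs


# Two trajectories stored back to back; is_init marks the first frame of each.
inputs = [1.0, 2.0, 3.0, 10.0, 20.0]
is_init = [True, False, False, True, False]
outputs = run_recurrent(inputs, is_init)
```

Without the reset at the fourth frame, the second trajectory would start from the accumulated state of the first one.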
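Several entries (in_keys, out_keys, set_keys, tensor_keys) rely on nested keys such as ("agents", "state_value"). The sketch below shows, with nested dicts in place of a tensordict, how a tuple key addresses a nested entry. The helper get_nested and the batch layout are assumptions made for illustration and do not reproduce tensordict's own lookup code.

```python
def get_nested(data, key):
    """Read `key` from nested dicts; a key is a string or a tuple of strings."""
    parts = (key,) if isinstance(key, str) else tuple(key)
    for part in parts:
        data = data[part]
    return data


# Hypothetical multi-agent layout, mirroring the keys in the set_keys example.
batch = {
    "agents": {"state_value": [0.1, 0.2], "action": [1, 0]},
    "done": False,
}

# Mapping from the names a loss would use to where the data actually lives.
keys = {
    "value": ("agents", "state_value"),
    "action": ("agents", "action"),
    "done": "done",
}
resolved = {name: get_nested(batch, key) for name, key in keys.items()}
```

The mapping plays the role that tensor_keys plays on a loss: the component refers to a fixed name, and the configured key decides where in the layout that name is read.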
See also
ref_data_layout — naming conventions for keys in collected batches
Collector Internals — where carrier / sync / device flags appear in the rollout loop
Knowledge Base — longer-form debugging notes