.. currentmodule:: torchrl.trainers.algorithms.configs
TorchRL Configuration System
============================
TorchRL provides a powerful configuration system built on top of `Hydra <https://hydra.cc/>`_ that enables you to easily configure
and run reinforcement learning experiments. This system uses structured, dataclass-based configurations that can be composed, overridden, and extended.
The advantages of using a configuration system are:
- Quick and easy to get started: provide your task and let the system handle the rest
- Discover the available options and their default values in one go: ``python sota-implementations/ppo_trainer/train.py --help`` lists them all
- Easy to override and extend: you can override any option in the configuration file, and you can also extend it with your own custom configurations
- Easy to share and reproduce: others can reproduce your results by running the same command with your configuration file
- Easy to version control: configuration files are small text files that are simple to track
Quick Start with a Simple Example
----------------------------------
Let's start with a simple example that creates a Gym environment. Here's a minimal configuration file:
.. code-block:: yaml

    # config.yaml
    defaults:
      - env@training_env: gym

    training_env:
      env_name: CartPole-v1
This configuration has two main parts:
**1. The** ``defaults`` **section**
The ``defaults`` section tells Hydra which configuration groups to include. In this case:
- ``env@training_env: gym`` means "use the 'gym' configuration from the 'env' group for the 'training_env' target"
This is equivalent to including a predefined configuration for Gym environments, which sets up the proper target class and default parameters.
**2. The configuration override**
The ``training_env`` section allows you to override or specify parameters for the selected configuration:
- ``env_name: CartPole-v1`` sets the specific environment name
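
Once Hydra has composed the configuration, each node carries a ``_target_`` and can be turned into an actual object with ``hydra.utils.instantiate``. Below is a minimal sketch of how the configuration above could be composed and instantiated programmatically; it assumes ``config.yaml`` lives in a ``conf/`` directory next to the script:

.. code-block:: python

    from hydra import compose, initialize
    from hydra.utils import instantiate

    # Compose the YAML above (assumed to live at conf/config.yaml) and build
    # the environment described by the ``training_env`` node.
    with initialize(config_path="conf", version_base=None):
        cfg = compose(config_name="config")
        env = instantiate(cfg.training_env)  # a Gym CartPole-v1 environment
        print(env)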
Configuration Categories and Groups
-----------------------------------
TorchRL organizes configurations into several categories using the ``@`` syntax for targeted configuration:
- ``env@``: Environment configurations (Gym, DMControl, Brax, etc.) as well as batched environments
- ``transform@``: Transform configurations (observation/reward processing)
- ``model@``: Model configurations (policy and value networks)
- ``network@``: Neural network configurations (MLP, ConvNet)
- ``collector@``: Data collection configurations
- ``replay_buffer@``: Replay buffer configurations
- ``storage@``: Storage backend configurations
- ``sampler@``: Sampling strategy configurations
- ``writer@``: Writer strategy configurations
- ``trainer@``: Training loop configurations
- ``optimizer@``: Optimizer configurations
- ``loss@``: Loss function configurations
- ``logger@``: Logging configurations
The ``@`` syntax allows you to assign configurations to specific locations in your config structure.
More Complex Example: Parallel Environment with Transforms
-----------------------------------------------------------
Here's a more complex example that creates a parallel environment with multiple transforms applied to each worker:
.. code-block:: yaml

    defaults:
      - env@training_env: batched_env
      - env@training_env.create_env_fn: transformed_env
      - env@training_env.create_env_fn.base_env: gym
      - transform@training_env.create_env_fn.transform: compose
      - transform@transform0: noop_reset
      - transform@transform1: step_counter

    # Transform configurations
    transform0:
      noops: 30
      random: true

    transform1:
      max_steps: 200
      step_count_key: "step_count"

    # Environment configuration
    training_env:
      num_workers: 4
      create_env_fn:
        base_env:
          env_name: Pendulum-v1
        transform:
          transforms:
            - ${transform0}
            - ${transform1}
        _partial_: true
**What this configuration creates:**
This configuration builds a **parallel environment with 4 workers**, where each worker runs a **Pendulum-v1 environment with two transforms applied**:
1. **Parallel Environment Structure**:
- ``batched_env`` creates a parallel environment that runs multiple environment instances
- ``num_workers: 4`` means 4 parallel environment processes
2. **Individual Environment Construction** (repeated for each of the 4 workers):
- **Base Environment**: ``gym`` with ``env_name: Pendulum-v1`` creates a Pendulum environment
- **Transform Layer 1**: ``noop_reset`` performs a random number of no-op actions (up to 30) at episode start
- **Transform Layer 2**: ``step_counter`` limits episodes to 200 steps and tracks step count
- **Transform Composition**: ``compose`` combines both transforms into a single transformation
3. **Final Result**: 4 parallel Pendulum environments, each with:
- Random no-op resets (0-30 actions at start)
- Maximum episode length of 200 steps
- Step counting functionality
**Key Configuration Concepts:**
1. **Nested targeting**: ``env@training_env.create_env_fn.base_env: gym`` places a gym config deep inside the structure
2. **Function factories**: ``_partial_: true`` tells Hydra to build a partial (factory) instead of instantiating the object immediately, so it can be called once per worker
3. **Transform composition**: Multiple transforms are combined and applied to each environment instance
4. **Variable interpolation**: ``${transform0}`` and ``${transform1}`` reference the separately defined transform configurations
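
For intuition, the configuration above is roughly equivalent to building the same environment eagerly in Python. The sketch below uses the TorchRL classes that the ``batched_env``, ``gym``, ``noop_reset``, ``step_counter``, and ``compose`` configs typically point to; treat it as an illustration rather than the exact objects the trainer creates:

.. code-block:: python

    from torchrl.envs import (
        Compose,
        GymEnv,
        NoopResetEnv,
        ParallelEnv,
        StepCounter,
        TransformedEnv,
    )

    def make_env():
        # One worker: Pendulum-v1 wrapped in the two transforms from the config.
        base = GymEnv("Pendulum-v1")
        transforms = Compose(
            NoopResetEnv(noops=30, random=True),
            StepCounter(max_steps=200, step_count_key="step_count"),
        )
        return TransformedEnv(base, transforms)

    # num_workers: 4 -> four workers, each calling the factory above.
    env = ParallelEnv(4, make_env)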
Getting Available Options
--------------------------
To explore all available configurations and their parameters, you can use the ``--help`` flag with any TorchRL script:

.. code-block:: bash

    python sota-implementations/ppo_trainer/train.py --help
This shows all configuration groups and their options, making it easy to discover what's available.
Complete Training Example
--------------------------
Here's a complete configuration for PPO training:
.. code-block:: yaml

    defaults:
      - env@training_env: batched_env
      - env@training_env.create_env_fn: gym
      - model@models.policy_model: tanh_normal
      - model@models.value_model: value
      - network@networks.policy_network: mlp
      - network@networks.value_network: mlp
      - collector: sync
      - replay_buffer: base
      - storage: tensor
      - sampler: without_replacement
      - writer: round_robin
      - trainer: ppo
      - optimizer: adam
      - loss: ppo
      - logger: wandb

    # Network configurations
    networks:
      policy_network:
        out_features: 2
        in_features: 4
        num_cells: [128, 128]

      value_network:
        out_features: 1
        in_features: 4
        num_cells: [128, 128]

    # Model configurations
    models:
      policy_model:
        network: ${networks.policy_network}
        in_keys: ["observation"]
        out_keys: ["action"]

      value_model:
        network: ${networks.value_network}
        in_keys: ["observation"]
        out_keys: ["state_value"]

    # Environment
    training_env:
      num_workers: 2
      create_env_fn:
        env_name: CartPole-v1
        _partial_: true

    # Training components
    trainer:
      collector: ${collector}
      optimizer: ${optimizer}
      loss_module: ${loss}
      logger: ${logger}
      total_frames: 100000

    collector:
      create_env_fn: ${training_env}
      policy: ${models.policy_model}
      frames_per_batch: 1024

    optimizer:
      lr: 0.001

    loss:
      actor_network: ${models.policy_model}
      critic_network: ${models.value_model}

    logger:
      exp_name: my_experiment
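
A training script consuming such a configuration is typically just a thin Hydra entry point. The following is a schematic sketch (not the actual ``sota-implementations/ppo_trainer/train.py``), assuming the ``trainer`` node resolves to a TorchRL trainer that exposes a ``train()`` method:

.. code-block:: python

    import hydra
    from hydra.utils import instantiate

    @hydra.main(config_path=".", config_name="config", version_base=None)
    def main(cfg):
        # Hydra has already composed and resolved the YAML; each node is
        # built from its _target_ when instantiated.
        trainer = instantiate(cfg.trainer)
        trainer.train()

    if __name__ == "__main__":
        main()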
Running Experiments
--------------------
Basic Usage
~~~~~~~~~~~
.. code-block:: bash

    # Use default configuration
    python sota-implementations/ppo_trainer/train.py

    # Override specific parameters
    python sota-implementations/ppo_trainer/train.py optimizer.lr=0.0001

    # Change environment
    python sota-implementations/ppo_trainer/train.py training_env.create_env_fn.env_name=Pendulum-v1

    # Use different collector
    python sota-implementations/ppo_trainer/train.py collector=async
Hyperparameter Sweeps
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash

    # Sweep over learning rates
    python sota-implementations/ppo_trainer/train.py --multirun optimizer.lr=0.0001,0.001,0.01

    # Multiple parameter sweep
    python sota-implementations/ppo_trainer/train.py --multirun \
        optimizer.lr=0.0001,0.001 \
        training_env.num_workers=2,4,8
Custom Configuration Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash

    # Use custom config file
    python sota-implementations/ppo_trainer/train.py --config-name my_custom_config
Configuration Store Implementation Details
------------------------------------------
Under the hood, TorchRL uses Hydra's ConfigStore to register all configuration classes. This provides type safety, validation, and IDE support. The registration happens automatically when you import the configs module:
.. code-block:: python

    from hydra.core.config_store import ConfigStore
    from torchrl.trainers.algorithms.configs import *

    cs = ConfigStore.instance()

    # Environments
    cs.store(group="env", name="gym", node=GymEnvConfig)
    cs.store(group="env", name="batched_env", node=BatchedEnvConfig)

    # Models
    cs.store(group="model", name="tanh_normal", node=TanhNormalModelConfig)

    # ... and many more
Available Configuration Classes
-------------------------------
Base Classes
~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.common
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
ConfigBase
Environment Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.envs
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
EnvConfig
BatchedEnvConfig
TransformedEnvConfig
Environment Library Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.envs_libs
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
EnvLibsConfig
GymEnvConfig
DMControlEnvConfig
BraxEnvConfig
HabitatEnvConfig
IsaacGymEnvConfig
JumanjiEnvConfig
MeltingpotEnvConfig
MOGymEnvConfig
MultiThreadedEnvConfig
OpenMLEnvConfig
OpenSpielEnvConfig
PettingZooEnvConfig
RoboHiveEnvConfig
SMACv2EnvConfig
UnityMLAgentsEnvConfig
VmasEnvConfig
Model and Network Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.modules
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
ModelConfig
NetworkConfig
MLPConfig
ConvNetConfig
TensorDictModuleConfig
TanhNormalModelConfig
ValueModelConfig
Transform Configurations
~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.transforms
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
TransformConfig
ComposeConfig
NoopResetEnvConfig
StepCounterConfig
DoubleToFloatConfig
ToTensorImageConfig
ClipTransformConfig
ResizeConfig
CenterCropConfig
CropConfig
FlattenObservationConfig
GrayScaleConfig
ObservationNormConfig
CatFramesConfig
RewardClippingConfig
RewardScalingConfig
BinarizeRewardConfig
TargetReturnConfig
VecNormConfig
FrameSkipTransformConfig
DeviceCastTransformConfig
DTypeCastTransformConfig
UnsqueezeTransformConfig
SqueezeTransformConfig
PermuteTransformConfig
CatTensorsConfig
StackConfig
DiscreteActionProjectionConfig
TensorDictPrimerConfig
PinMemoryTransformConfig
RewardSumConfig
ExcludeTransformConfig
SelectTransformConfig
TimeMaxPoolConfig
RandomCropTensorDictConfig
InitTrackerConfig
RenameTransformConfig
Reward2GoTransformConfig
ActionMaskConfig
VecGymEnvTransformConfig
BurnInTransformConfig
SignTransformConfig
RemoveEmptySpecsConfig
BatchSizeTransformConfig
AutoResetTransformConfig
ActionDiscretizerConfig
TrajCounterConfig
LineariseRewardsConfig
ConditionalSkipConfig
MultiActionConfig
TimerConfig
ConditionalPolicySwitchConfig
FiniteTensorDictCheckConfig
UnaryTransformConfig
HashConfig
TokenizerConfig
EndOfLifeTransformConfig
MultiStepTransformConfig
KLRewardTransformConfig
R3MTransformConfig
VC1TransformConfig
VIPTransformConfig
VIPRewardTransformConfig
VecNormV2Config
Data Collection Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.collectors
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
DataCollectorConfig
SyncDataCollectorConfig
AsyncDataCollectorConfig
MultiSyncDataCollectorConfig
MultiaSyncDataCollectorConfig
Replay Buffer and Storage Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.data
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
ReplayBufferConfig
TensorDictReplayBufferConfig
RandomSamplerConfig
SamplerWithoutReplacementConfig
PrioritizedSamplerConfig
SliceSamplerConfig
SliceSamplerWithoutReplacementConfig
ListStorageConfig
TensorStorageConfig
LazyTensorStorageConfig
LazyMemmapStorageConfig
LazyStackStorageConfig
StorageEnsembleConfig
RoundRobinWriterConfig
StorageEnsembleWriterConfig
Training and Optimization Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.trainers
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
TrainerConfig
PPOTrainerConfig
.. currentmodule:: torchrl.trainers.algorithms.configs.objectives
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
LossConfig
PPOLossConfig
.. currentmodule:: torchrl.trainers.algorithms.configs.utils
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
AdamConfig
AdamWConfig
AdamaxConfig
AdadeltaConfig
AdagradConfig
ASGDConfig
LBFGSConfig
LionConfig
NAdamConfig
RAdamConfig
RMSpropConfig
RpropConfig
SGDConfig
SparseAdamConfig
Logging Configurations
~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: torchrl.trainers.algorithms.configs.logging
.. autosummary::
:toctree: generated/
:template: rl_template_class.rst
LoggerConfig
WandbLoggerConfig
TensorboardLoggerConfig
CSVLoggerConfig
Creating Custom Configurations
------------------------------
You can create custom configuration classes by inheriting from the appropriate base classes:
.. code-block:: python

    from dataclasses import dataclass

    from torchrl.trainers.algorithms.configs.envs_libs import EnvLibsConfig


    @dataclass
    class MyCustomEnvConfig(EnvLibsConfig):
        _target_: str = "my_module.MyCustomEnv"
        env_name: str = "MyEnv-v1"
        custom_param: float = 1.0

        def __post_init__(self):
            super().__post_init__()


    # Register with ConfigStore
    from hydra.core.config_store import ConfigStore

    cs = ConfigStore.instance()
    cs.store(group="env", name="my_custom", node=MyCustomEnvConfig)
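
After registration, it is worth sanity-checking that the config instantiates before wiring it into a larger experiment. A small example sketch (it assumes ``my_module.MyCustomEnv`` is importable):

.. code-block:: python

    from hydra.utils import instantiate

    # Build the environment directly from the dataclass instance; instantiate()
    # resolves the _target_ string to my_module.MyCustomEnv and calls it with
    # env_name and custom_param as keyword arguments.
    cfg = MyCustomEnvConfig(env_name="MyEnv-v1", custom_param=2.0)
    env = instantiate(cfg)

The custom config can then be selected in a ``defaults`` list like any built-in option, e.g. ``env@training_env: my_custom``.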
Best Practices
--------------
1. **Start Simple**: Begin with basic configurations and gradually add complexity
2. **Use Defaults**: Leverage the ``defaults`` section to compose configurations
3. **Override Sparingly**: Only override what you need to change
4. **Validate Configurations**: Test that your configurations instantiate correctly
5. **Version Control**: Keep your configuration files under version control
6. **Use Variable Interpolation**: Use ``${variable}`` syntax to avoid duplication
Future Extensions
-----------------
As TorchRL adds more algorithms beyond PPO (such as SAC, TD3, DQN), the configuration system will expand with:
- New trainer configurations (e.g., ``SACTrainerConfig``, ``TD3TrainerConfig``)
- Algorithm-specific loss configurations
- Specialized collector configurations for different algorithms
- Additional environment and model configurations
The modular design ensures easy integration while maintaining backward compatibility.