PPOTrainer
class torchrl.trainers.algorithms.PPOTrainer(*args, **kwargs)
PPO (Proximal Policy Optimization) trainer implementation.
Warning
This is an experimental/prototype feature. The API may change in future versions. Please report any issues or share feedback to help improve this implementation.
This trainer implements the PPO algorithm for training reinforcement learning agents. It extends the base Trainer class with PPO-specific functionality, including policy optimization, value function learning, and entropy regularization.
PPO typically uses multiple epochs of optimization on the same batch of data. This trainer defaults to 4 epochs, which is a common choice for PPO implementations.
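To make the multi-epoch pattern concrete, below is a minimal, self-contained sketch in plain PyTorch. It is illustrative only, not the trainer's internal implementation: the toy tensors stand in for data a collector would provide, and only the PPO clipped surrogate term is shown.

import torch

# Minimal sketch of PPO's multi-epoch update on a fixed batch (toy data).
torch.manual_seed(0)
obs_dim, n_actions, batch_size = 4, 2, 64
num_epochs = 4  # the trainer's default number of optimization epochs

policy = torch.nn.Linear(obs_dim, n_actions)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Pretend this batch was collected with the current policy.
obs = torch.randn(batch_size, obs_dim)
actions = torch.randint(n_actions, (batch_size, 1))
advantages = torch.randn(batch_size)
with torch.no_grad():
    old_log_probs = policy(obs).log_softmax(-1).gather(1, actions).squeeze(-1)

clip_eps = 0.2
for _ in range(num_epochs):  # the same batch is reused for every epoch
    log_probs = policy(obs).log_softmax(-1).gather(1, actions).squeeze(-1)
    ratio = (log_probs - old_log_probs).exp()
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps)
    # Clipped surrogate objective keeps the new policy close to the data-collecting one.
    loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()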
The trainer includes comprehensive logging capabilities for monitoring training progress:
- Training rewards (mean, std, max, total)
- Action statistics (norms)
- Episode completion rates
- Observation statistics (optional)
Logging can be configured via constructor parameters to enable/disable specific metrics.
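As an illustration of this pattern, the snippet below toggles a few metrics at construction time. The keyword names used here (log_rewards, log_actions, log_observations) are hypothetical placeholders, not confirmed parameters; consult the constructor signature for the actual names.

# Hypothetical keyword names for the logging switches -- check the real signature.
trainer = PPOTrainer(
    ...,                     # environment, collector, loss, and optimizer setup
    log_rewards=True,        # mean/std/max/total training rewards
    log_actions=True,        # action norm statistics
    log_observations=False,  # disable observation statistics
)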
Examples
>>> # Basic usage with manual configuration
>>> from torchrl.trainers.algorithms.ppo import PPOTrainer
>>> from torchrl.trainers.algorithms.configs import PPOTrainerConfig
>>> from hydra.utils import instantiate
>>> config = PPOTrainerConfig(...)  # Configure with required parameters
>>> trainer = instantiate(config)
>>> trainer.train()
Note
This trainer requires a configurable environment setup. See the configs module for configuration options.