.. currentmodule:: torchrl

.. _ref_isaaclab:

IsaacLab Integration
====================

This guide covers how to use TorchRL components with `IsaacLab `_
(NVIDIA's GPU-accelerated robotics simulation platform). For general IsaacLab
installation and cluster setup (not specific to TorchRL), see the
`knowledge_base/ISAACLAB.md `_ file.

IsaacLabWrapper
---------------

Use :class:`~torchrl.envs.libs.isaac_lab.IsaacLabWrapper` to wrap a gymnasium
IsaacLab environment into a TorchRL-compatible :class:`~torchrl.envs.EnvBase`:

.. code-block:: python

    import gymnasium as gym
    from torchrl.envs.libs.isaac_lab import IsaacLabWrapper

    # Assumes the Isaac Sim app is already launched and the IsaacLab tasks are
    # imported, so "Isaac-Ant-v0" is registered. `env_cfg` is the task config.
    env = gym.make("Isaac-Ant-v0", cfg=env_cfg)
    env = IsaacLabWrapper(env)  # TorchRL EnvBase, on cuda:0 by default

Key defaults:

- ``device=cuda:0``
- ``allow_done_after_reset=True`` (IsaacLab can report done immediately after reset)
- ``convert_actions_to_numpy=False`` (actions stay as tensors)

.. note::

    IsaacLab modifies the ``terminated`` and ``truncated`` tensors in-place.
    ``IsaacLabWrapper`` clones these tensors to prevent data corruption.

.. note::

    Batched specs: IsaacLab env specs include the batch dimension (e.g., shape
    ``(4096, obs_dim)``). Use the ``*_spec_unbatched`` properties when you need
    per-env shapes.

.. note::

    Reward shape: IsaacLab rewards have shape ``(num_envs,)``. The wrapper
    unsqueezes them to ``(num_envs, 1)`` for TorchRL compatibility.

Collector
---------

Because IsaacLab environments are **pre-vectorized** (a single ``gym.make``
call creates thousands of parallel environments on the GPU, typically 4096),
use a single :class:`~torchrl.collectors.Collector`. There is no need for
``ParallelEnv`` or ``MultiCollector``:

.. code-block:: python

    from torchrl.collectors import Collector

    # `env` is the IsaacLabWrapper instance; `policy` is your TorchRL actor.
    collector = Collector(
        create_env_fn=env,
        policy=policy,
        frames_per_batch=40960,  # 10 env steps * 4096 envs
        storing_device="cpu",
        no_cuda_sync=True,  # IMPORTANT for CUDA envs
    )

- ``no_cuda_sync=True``: avoids unnecessary CUDA synchronisation that can cause
  hangs with GPU-native environments.
- ``storing_device="cpu"``: moves collected data to CPU for the replay buffer.

2-GPU Async Pipeline
~~~~~~~~~~~~~~~~~~~~

For maximum throughput, use two GPUs with a background collection thread:

- **GPU 0** (``sim_device``): IsaacLab simulation and collection policy inference
- **GPU 1** (``train_device``): model training (world model, actor, value gradients)

.. code-block:: python

    import copy
    import threading
    from tensordict import TensorDict
    from torchrl.collectors import Collector

    # policy, env, replay_buffer, train, sim_device, train_device,
    # total_steps and sync_every come from your own setup.
    # Deep copy the policy to sim_device for collection
    collector_policy = copy.deepcopy(policy).to(sim_device)
    collector = Collector(
        create_env_fn=env,  # IsaacLab env living on sim_device
        policy=collector_policy,
        frames_per_batch=40960,
        storing_device="cpu",
        no_cuda_sync=True,
    )

    # Background thread for continuous collection
    def collect_loop(collector, replay_buffer, stop_event):
        for data in collector:
            replay_buffer.extend(data)
            if stop_event.is_set():
                break

    stop_event = threading.Event()
    thread = threading.Thread(
        target=collect_loop, args=(collector, replay_buffer, stop_event), daemon=True
    )
    thread.start()

    # Main thread: train on train_device (the buffer must already hold enough frames)
    for optim_step in range(total_steps):
        batch = replay_buffer.sample().to(train_device)
        train(batch)  # all on train_device (cuda:1)
        # Periodic weight sync: training policy -> collector policy
        if optim_step % sync_every == 0:
            weights = TensorDict.from_module(policy)
            collector.update_policy_weights_(weights)

    stop_event.set()
    thread.join()
    collector.shutdown()

Key points:

- The heavy CUDA work on both sides (simulation/collection and training)
  releases the GIL, so the two threads genuinely overlap.
- You must pass ``TensorDict.from_module(policy)`` to
  ``update_policy_weights_()``, not the module itself.
- Set ``CUDA_VISIBLE_DEVICES=0,1`` to expose two GPUs (IsaacLab defaults to
  GPU 0 only).
- The pipeline falls back gracefully to a single GPU when only one is available.

RayCollector (alternative)
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you need distributed collection across multiple GPUs or nodes, use
:class:`~torchrl.collectors.distributed.RayCollector`:

.. code-block:: python

    from torchrl.collectors.distributed import RayCollector

    # make_env: a callable returning a wrapped IsaacLab env; each remote
    # collector calls it to build its own environment.
    collector = RayCollector(
        [make_env] * num_collectors,
        policy,
        frames_per_batch=8192,
        collector_kwargs={
            "trust_policy": True,
            "no_cuda_sync": True,
        },
    )

Replay Buffer
-------------

The :class:`~torchrl.data.SliceSampler` needs enough sequential data before it
can sample.
With ``batch_length=50``, every trajectory needs at least 50 time steps before
sampling can start, so::

    init_random_frames >= batch_length * num_envs = 50 * 4096 = 204,800

For GPU-resident replay buffers, use :class:`~torchrl.data.LazyTensorStorage`
with the target CUDA device. This avoids the CPU→GPU transfer at sample time,
but moves it to extend time when the collector stores data on CPU.

TorchRL-Specific Gotchas
------------------------

1. ``no_cuda_sync=True``: Always set this for collectors with CUDA
   environments. Without it, collection can hang with no obvious cause.

2. **Installing torchrl in the Isaac container**: Install with pip's
   ``--no-build-isolation --no-deps`` flags to avoid conflicts with Isaac's
   pre-installed torch/numpy.

3. ``TensorDictPrimer`` and ``expand_specs``: When adding primers (e.g.,
   ``state``, ``belief``) to a pre-vectorized env, you MUST pass
   ``expand_specs=True`` to :class:`~torchrl.envs.TensorDictPrimer`.
   Otherwise the primer shapes ``()`` conflict with the env's ``batch_size``
   ``(4096,)``.

4. **Model-based env spec double-batching**: Calling
   ``model_based_env.set_specs_from_env(test_env)`` with a pre-vectorized
   ``test_env`` copies the specs with the batch dims baked in. The model-based
   env then double-batches actions during sampling (e.g.,
   ``(4096, 4096, 8)`` instead of ``(4096, 8)``). **Fix**: unbatch the
   model-based env's specs after copying:

   .. code-block:: python

       model_based_env.set_specs_from_env(test_env)
       if test_env.batch_size:
           # Index away the leading batch dims so the specs describe one env
           idx = (0,) * len(test_env.batch_size)
           # Assign via __dict__ to bypass the spec property setters
           model_based_env.__dict__["_output_spec"] = (
               model_based_env.__dict__["_output_spec"][idx]
           )
           model_based_env.__dict__["_input_spec"] = (
               model_based_env.__dict__["_input_spec"][idx]
           )
           # Clear cached spec-derived keys so they are rebuilt
           model_based_env.empty_cache()

5. ``torch.compile`` **with TensorDict**: Compiling full loss modules crashes
   because dynamo traces through TensorDict internals. **Fix**: compile the
   individual MLP sub-modules (encoder, decoder, reward_model, value_model)
   with ``torch._dynamo.config.suppress_errors = True``. Do NOT compile the
   RSSM (sequential, shared with the collector) or the loss modules (heavy
   TensorDict use).

6. ``SliceSampler`` with ``strict_length=False``: The sampler may return fewer
   elements than ``batch_size``, which makes ``reshape(-1, batch_length)``
   fail. **Fix**: truncate the sample to a multiple of ``batch_length``:

   .. code-block:: python

       sample = replay_buffer.sample()
       numel = sample.numel()
       # Largest prefix whose length is a multiple of batch_length
       usable = (numel // batch_length) * batch_length
       if usable < numel:
           sample = sample[:usable]
       sample = sample.reshape(-1, batch_length)  # (num_rows, batch_length)

   This only fixes the shape: if the sampler returned slices shorter than
   ``batch_length``, some rows may mix steps from different slices.

7. ``frames_per_batch`` vs ``batch_length``: Each collection adds
   ``frames_per_batch / num_envs`` time steps per env, while the
   ``SliceSampler`` needs contiguous sequences of at least ``batch_length``
   steps within a single trajectory. Ensure either
   ``frames_per_batch >= batch_length * num_envs`` for the initial collection
   or ``init_random_frames >= batch_length * num_envs``.

8. ``TD_GET_DEFAULTS_TO_NONE``: Set this environment variable to ``1`` (e.g.,
   ``export TD_GET_DEFAULTS_TO_NONE=1``) when running inside the Isaac
   container to ensure correct TensorDict default behavior.
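
The sizing rules from the Replay Buffer section and gotcha 7 come down to a few
lines of arithmetic. The following is a minimal sketch built on hypothetical
helpers (``steps_per_env``, ``min_init_random_frames`` and
``batches_before_sampling`` are illustrative names, not TorchRL APIs). It
assumes ``frames_per_batch`` is a multiple of ``num_envs`` and that no env
resets inside the window, so the results are lower bounds: early episode ends
split trajectories and delay the first full slice.

.. code-block:: python

    import math

    # Hypothetical helpers for illustration; not part of TorchRL.


    def steps_per_env(frames_per_batch: int, num_envs: int) -> int:
        """Time steps each parallel env contributes per collector batch."""
        if frames_per_batch % num_envs:
            # Assumption of this sketch: a batch splits evenly across envs.
            raise ValueError("frames_per_batch should be a multiple of num_envs")
        return frames_per_batch // num_envs


    def min_init_random_frames(batch_length: int, num_envs: int) -> int:
        """Frames needed so every env has at least batch_length steps stored."""
        return batch_length * num_envs


    def batches_before_sampling(
        frames_per_batch: int, num_envs: int, batch_length: int
    ) -> int:
        """Collector batches needed before every env holds one full slice."""
        per_env = steps_per_env(frames_per_batch, num_envs)
        return math.ceil(batch_length / per_env)


    if __name__ == "__main__":
        num_envs, batch_length, frames_per_batch = 4096, 50, 40960
        print(steps_per_env(frames_per_batch, num_envs))
        print(min_init_random_frames(batch_length, num_envs))
        print(batches_before_sampling(frames_per_batch, num_envs, batch_length))

With the values used in this guide (4096 envs, ``frames_per_batch=40960``,
``batch_length=50``), each batch adds 10 steps per env, so five batches
(204,800 frames) are needed before the first slice can be sampled. A result of
1 from ``batches_before_sampling`` corresponds to the
``frames_per_batch >= batch_length * num_envs`` condition.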