.. currentmodule:: torchrl

IsaacLab Integration
====================

.. _ref_isaaclab:

This guide covers how to use TorchRL components with
`IsaacLab <https://isaac-sim.github.io/IsaacLab/v2.3.0/>`_
(NVIDIA's GPU-accelerated robotics simulation platform).

For general IsaacLab installation and cluster setup (not specific to TorchRL), see the
`knowledge_base/ISAACLAB.md <https://github.com/pytorch/rl/blob/main/knowledge_base/ISAACLAB.md>`_ file.

IsaacLabWrapper
---------------

Use :class:`~torchrl.envs.libs.isaac_lab.IsaacLabWrapper` to wrap a gymnasium
IsaacLab environment into a TorchRL-compatible :class:`~torchrl.envs.EnvBase`:

.. code-block:: python

    import gymnasium as gym
    from torchrl.envs.libs.isaac_lab import IsaacLabWrapper

    env = gym.make("Isaac-Ant-v0", cfg=env_cfg)
    env = IsaacLabWrapper(env)

Key defaults:

- ``device=cuda:0``
- ``allow_done_after_reset=True``  (IsaacLab can report done immediately after reset)
- ``convert_actions_to_numpy=False``  (actions stay as tensors)

.. note::

    IsaacLab modifies ``terminated`` and ``truncated`` tensors in-place.
    ``IsaacLabWrapper`` clones these tensors to prevent data corruption.

.. note::

    Batched specs: IsaacLab env specs include the batch dimension (e.g., shape
    ``(4096, obs_dim)``).  Use ``*_spec_unbatched`` properties when you need
    per-env shapes.

.. note::

    Reward shape: IsaacLab rewards are ``(num_envs,)``.  The wrapper
    unsqueezes to ``(num_envs, 1)`` for TorchRL compatibility.

Collector
---------

Because IsaacLab environments are **pre-vectorized** (a single ``gym.make``
creates ~4096 parallel environments on the GPU), use a single
:class:`~torchrl.collectors.Collector` — there is no need for
``ParallelEnv`` or ``MultiCollector``:

.. code-block:: python

    from torchrl.collectors import Collector

    collector = Collector(
        create_env_fn=env,
        policy=policy,
        frames_per_batch=40960,   # 10 env steps * 4096 envs
        storing_device="cpu",
        no_cuda_sync=True,        # IMPORTANT for CUDA envs
    )

- ``no_cuda_sync=True``: avoids unnecessary CUDA synchronisation that can
  cause hangs with GPU-native environments.
- ``storing_device="cpu"``: moves collected data to CPU for the replay buffer.

2-GPU Async Pipeline
~~~~~~~~~~~~~~~~~~~~

For maximum throughput, use two GPUs with a background collection thread:

- **GPU 0 (``sim_device``)**: IsaacLab simulation + collection policy
  inference
- **GPU 1 (``train_device``)**: Model training (world model, actor, value
  gradients)

.. code-block:: python

    import copy, threading
    from tensordict import TensorDict

    # Deep copy policy to sim_device for collection
    collector_policy = copy.deepcopy(policy).to(sim_device)

    # Background thread for continuous collection
    def collect_loop(collector, replay_buffer, stop_event):
        for data in collector:
            replay_buffer.extend(data)
            if stop_event.is_set():
                break

    # Main thread: train on train_device
    for optim_step in range(total_steps):
        batch = replay_buffer.sample()
        train(batch)  # all on cuda:1
        # Periodic weight sync: training policy -> collector policy
        if optim_step % sync_every == 0:
            weights = TensorDict.from_module(policy)
            collector.update_policy_weights_(weights)

Key points:

- Both CUDA operations release the GIL, so they truly overlap.
- Must pass ``TensorDict.from_module(policy)`` to
  ``update_policy_weights_()``, not the module itself.
- Set ``CUDA_VISIBLE_DEVICES=0,1`` to expose 2 GPUs (IsaacLab defaults to
  only GPU 0).
- Falls back gracefully to single-GPU if only 1 GPU is available.

RayCollector (alternative)
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you need distributed collection across multiple GPUs/nodes, use
:class:`~torchrl.collectors.distributed.RayCollector`:

.. code-block:: python

    from torchrl.collectors.distributed import RayCollector

    collector = RayCollector(
        [make_env] * num_collectors,
        policy,
        frames_per_batch=8192,
        collector_kwargs={
            "trust_policy": True,
            "no_cuda_sync": True,
        },
    )

Replay Buffer
-------------

The :class:`~torchrl.data.SliceSampler` needs enough sequential data.  With
``batch_length=50``, you need at least 50 time steps per trajectory before
sampling::

    init_random_frames >= batch_length * num_envs
                        = 50 * 4096
                        = 204,800

For GPU-resident replay buffers, use
:class:`~torchrl.data.LazyTensorStorage` with the target CUDA device.
This avoids CPU→GPU transfer at sample time (but adds it at extend time).

TorchRL-Specific Gotchas
------------------------

1. **``no_cuda_sync=True``**: Always set this for collectors with CUDA
   environments.  Without it, you get mysterious hangs.

2. **Installing torchrl in Isaac container**: Use
   ``--no-build-isolation --no-deps`` to avoid conflicts with Isaac's
   pre-installed torch/numpy.

3. **``TensorDictPrimer`` ``expand_specs``**: When adding primers (e.g.,
   ``state``, ``belief``) to a pre-vectorized env, you MUST pass
   ``expand_specs=True`` to :class:`~torchrl.envs.TensorDictPrimer`.
   Otherwise the primer shapes ``()`` conflict with the env's ``batch_size``
   ``(4096,)``.

4. **Model-based env spec double-batching**:
   ``model_based_env.set_specs_from_env(batched_env)`` copies specs with batch
   dims baked in.  The model-based env then double-batches actions during
   sampling (e.g., ``(4096, 4096, 8)`` instead of ``(4096, 8)``).

   **Fix**: unbatch the model-based env's specs after copying:

   .. code-block:: python

       model_based_env.set_specs_from_env(test_env)
       if test_env.batch_size:
           idx = (0,) * len(test_env.batch_size)
           model_based_env.__dict__["_output_spec"] = (
               model_based_env.__dict__["_output_spec"][idx]
           )
           model_based_env.__dict__["_input_spec"] = (
               model_based_env.__dict__["_input_spec"][idx]
           )
           model_based_env.empty_cache()

5. **``torch.compile`` with TensorDict**: Compiling full loss modules crashes
   because dynamo traces through TensorDict internals.  **Fix**: compile
   individual MLP sub-modules (encoder, decoder, reward_model, value_model)
   with ``torch._dynamo.config.suppress_errors = True``.  Do NOT compile RSSM
   (sequential, shared with collector) or loss modules (heavy TensorDict use).

6. **``SliceSampler`` with ``strict_length=False``**: The sampler may return
   fewer elements than ``batch_size``.  This causes
   ``reshape(-1, batch_length)`` to fail.

   **Fix**: truncate the sample:

   .. code-block:: python

       sample = replay_buffer.sample()
       numel = sample.numel()
       usable = (numel // batch_length) * batch_length
       if usable < numel:
           sample = sample[:usable]
       sample = sample.reshape(-1, batch_length)

7. **``frames_per_batch`` vs ``batch_length``**: Each collection adds
   ``frames_per_batch / num_envs`` time steps per env.  The
   ``SliceSampler`` needs contiguous sequences of at least ``batch_length``
   steps within a single trajectory.  Ensure
   ``frames_per_batch >= batch_length * num_envs`` for the initial collection,
   or that ``init_random_frames >= batch_length * num_envs``.

8. **``TD_GET_DEFAULTS_TO_NONE``**: Set this environment variable to ``1``
   when running inside the Isaac container to ensure correct TensorDict
   default behavior.