IsaacLab Integration¶
This guide covers how to use TorchRL components with IsaacLab (NVIDIA’s GPU-accelerated robotics simulation platform).
For general IsaacLab installation and cluster setup (not specific to TorchRL), see the knowledge_base/ISAACLAB.md file.
IsaacLabWrapper¶
Use IsaacLabWrapper to wrap a gymnasium-registered
IsaacLab environment into a TorchRL-compatible EnvBase:
import gymnasium as gym
from torchrl.envs.libs.isaac_lab import IsaacLabWrapper

# env_cfg is the IsaacLab task configuration object for the chosen task
env = gym.make("Isaac-Ant-v0", cfg=env_cfg)
env = IsaacLabWrapper(env)
Key defaults:
- ``device="cuda:0"``
- ``allow_done_after_reset=True`` (IsaacLab can report done immediately after reset)
- ``convert_actions_to_numpy=False`` (actions stay as tensors)
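These defaults can be overridden at construction time; for example, a minimal sketch placing the wrapper on a different GPU (assuming a second device is available):

env = IsaacLabWrapper(env, device="cuda:1")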
Note
IsaacLab modifies terminated and truncated tensors in-place.
IsaacLabWrapper clones these tensors to prevent data corruption.
Note
Batched specs: IsaacLab env specs include the batch dimension (e.g., shape
(4096, obs_dim)). Use *_spec_unbatched properties when you need
per-env shapes.
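For example (shapes are illustrative, assuming 4096 environments and an 8-dimensional action space):

env.action_spec.shape            # batched: torch.Size([4096, 8])
env.action_spec_unbatched.shape  # per-env: torch.Size([8])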
Note
Reward shape: IsaacLab rewards are (num_envs,). The wrapper
unsqueezes to (num_envs, 1) for TorchRL compatibility.
Collector¶
Because IsaacLab environments are pre-vectorized (a single gym.make
creates ~4096 parallel environments on the GPU), use a single
Collector — there is no need for
ParallelEnv or MultiCollector:
from torchrl.collectors import Collector

collector = Collector(
    create_env_fn=env,
    policy=policy,
    frames_per_batch=40960,  # 10 env steps * 4096 envs
    storing_device="cpu",
    no_cuda_sync=True,  # IMPORTANT for CUDA envs
)
- ``no_cuda_sync=True``: avoids unnecessary CUDA synchronization that can cause hangs with GPU-native environments.
- ``storing_device="cpu"``: moves collected data to CPU for the replay buffer.
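Iterating the collector then yields batched TensorDicts ready for the buffer; a minimal sketch (the batch layout in the comment assumes 4096 envs and 10 steps per batch):

for data in collector:
    # data is a TensorDict with batch_size (4096, 10):
    # (num_envs, frames_per_batch // num_envs)
    replay_buffer.extend(data)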
2-GPU Async Pipeline¶
For maximum throughput, use two GPUs with a background collection thread:
GPU 0 (``sim_device``): IsaacLab simulation + collection policy inference
GPU 1 (``train_device``): Model training (world model, actor, value gradients)
import copy, threading
import torch
from tensordict import TensorDict

sim_device = torch.device("cuda:0")    # IsaacLab simulation + collection
train_device = torch.device("cuda:1")  # model training

# Deep copy policy to sim_device for collection
collector_policy = copy.deepcopy(policy).to(sim_device)
# (the collector is assumed to have been built with collector_policy)

# Background thread for continuous collection
def collect_loop(collector, replay_buffer, stop_event):
    for data in collector:
        replay_buffer.extend(data)
        if stop_event.is_set():
            break

stop_event = threading.Event()
collect_thread = threading.Thread(
    target=collect_loop, args=(collector, replay_buffer, stop_event)
)
collect_thread.start()

# Main thread: train on train_device
for optim_step in range(total_steps):
    batch = replay_buffer.sample()
    train(batch)  # all on cuda:1
    # Periodic weight sync: training policy -> collector policy
    if optim_step % sync_every == 0:
        weights = TensorDict.from_module(policy)
        collector.update_policy_weights_(weights)

# Shut down the collection thread
stop_event.set()
collect_thread.join()
Key points:
- CUDA operations release the GIL, so collection and training truly overlap.
- Pass ``TensorDict.from_module(policy)`` to ``update_policy_weights_()``, not the module itself.
- Set ``CUDA_VISIBLE_DEVICES=0,1`` to expose both GPUs (IsaacLab defaults to GPU 0 only).
- The pipeline falls back gracefully to a single GPU if only one is available.
RayCollector (alternative)¶
If you need distributed collection across multiple GPUs/nodes, use
RayCollector:
from torchrl.collectors.distributed import RayCollector

collector = RayCollector(
    [make_env] * num_collectors,
    policy,
    frames_per_batch=8192,
    collector_kwargs={
        "trust_policy": True,
        "no_cuda_sync": True,
    },
)
Replay Buffer¶
The SliceSampler needs enough sequential data. With
batch_length=50, you need at least 50 time steps per trajectory before
sampling:
init_random_frames >= batch_length * num_envs
                    = 50 * 4096
                    = 204,800
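In practice, this constraint is enforced through the collector's ``init_random_frames`` argument; a sketch with the numbers above:

num_envs = 4096
batch_length = 50

collector = Collector(
    create_env_fn=env,
    policy=policy,
    frames_per_batch=10 * num_envs,              # 40,960 frames per batch
    init_random_frames=batch_length * num_envs,  # 204,800 random frames
    no_cuda_sync=True,
)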
For GPU-resident replay buffers, use
LazyTensorStorage with the target CUDA device.
This avoids CPU→GPU transfer at sample time (but adds it at extend time).
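A minimal sketch of such a buffer (capacity and batch size are illustrative; ``slice_len`` matches the ``batch_length=50`` above):

import torch
from torchrl.data import LazyTensorStorage, ReplayBuffer, SliceSampler

rb = ReplayBuffer(
    storage=LazyTensorStorage(1_000_000, device=torch.device("cuda:1")),
    sampler=SliceSampler(slice_len=50, strict_length=False),
    batch_size=2000,  # must be a multiple of slice_len
)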
TorchRL-Specific Gotchas¶
``no_cuda_sync=True``: Always set this for collectors with CUDA environments. Without it, you get mysterious hangs.
Installing torchrl in the Isaac container: Use ``--no-build-isolation --no-deps`` to avoid conflicts with Isaac's pre-installed torch/numpy.

``TensorDictPrimer`` ``expand_specs``: When adding primers (e.g., ``state``, ``belief``) to a pre-vectorized env, you MUST pass ``expand_specs=True`` to ``TensorDictPrimer``. Otherwise the primer shapes ``()`` conflict with the env's ``batch_size`` of ``(4096,)``, as sketched below.
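For instance, a hedged sketch of adding a ``state`` primer to the wrapped env (``state_dim`` and the ``Unbounded`` spec are assumptions; the essential part is ``expand_specs=True``):

from torchrl.data import Unbounded
from torchrl.envs import TensorDictPrimer, TransformedEnv

# expand_specs=True expands the per-env (state_dim,) spec to the env's
# batch_size, i.e. (4096, state_dim)
primer = TensorDictPrimer(
    state=Unbounded(shape=(state_dim,)),
    expand_specs=True,
)
env = TransformedEnv(env, primer)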
Model-based env spec double-batching: ``model_based_env.set_specs_from_env(batched_env)`` copies specs with the batch dims baked in. The model-based env then double-batches actions during sampling (e.g., ``(4096, 4096, 8)`` instead of ``(4096, 8)``). Fix: unbatch the model-based env's specs after copying:

model_based_env.set_specs_from_env(test_env)
if test_env.batch_size:
    idx = (0,) * len(test_env.batch_size)
    model_based_env.__dict__["_output_spec"] = (
        model_based_env.__dict__["_output_spec"][idx]
    )
    model_based_env.__dict__["_input_spec"] = (
        model_based_env.__dict__["_input_spec"][idx]
    )
    model_based_env.empty_cache()
``torch.compile`` with TensorDict: Compiling full loss modules crashes because dynamo traces through TensorDict internals. Fix: compile the individual MLP sub-modules (encoder, decoder, reward_model, value_model) with ``torch._dynamo.config.suppress_errors = True``. Do NOT compile the RSSM (sequential, shared with the collector) or the loss modules (heavy TensorDict use).
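A sketch of this selective compilation (the world-model attribute names are assumptions):

import torch

# Fall back to eager where dynamo cannot trace
torch._dynamo.config.suppress_errors = True

# Compile only the plain MLP sub-modules; keep RSSM and loss modules eager
world_model.encoder = torch.compile(world_model.encoder)
world_model.decoder = torch.compile(world_model.decoder)
world_model.reward_model = torch.compile(world_model.reward_model)
value_model = torch.compile(value_model)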
``SliceSampler`` with ``strict_length=False``: The sampler may return fewer elements than ``batch_size``, which causes ``reshape(-1, batch_length)`` to fail. Fix: truncate the sample:

sample = replay_buffer.sample()
numel = sample.numel()
usable = (numel // batch_length) * batch_length
if usable < numel:
    sample = sample[:usable]
sample = sample.reshape(-1, batch_length)
``frames_per_batch`` vs ``batch_length``: Each collection adds ``frames_per_batch / num_envs`` time steps per env. The ``SliceSampler`` needs contiguous sequences of at least ``batch_length`` steps within a single trajectory. Ensure ``frames_per_batch >= batch_length * num_envs`` for the initial collection, or that ``init_random_frames >= batch_length * num_envs``.

``TD_GET_DEFAULTS_TO_NONE``: Set this environment variable to ``1`` when running inside the Isaac container to ensure correct TensorDict default behavior.