IsaacLab Integration¶
This guide covers how to use TorchRL components with IsaacLab (NVIDIA’s GPU-accelerated robotics simulation platform).
For general IsaacLab installation and cluster setup (not specific to TorchRL), see the knowledge_base/ISAACLAB.md file.
IsaacLabWrapper¶
Use IsaacLabWrapper to wrap a gymnasium-registered
IsaacLab environment into a TorchRL-compatible EnvBase:
import gymnasium as gym
from torchrl.envs.libs.isaac_lab import IsaacLabWrapper

# env_cfg is the IsaacLab task configuration object for the chosen task
env = gym.make("Isaac-Ant-v0", cfg=env_cfg)
env = IsaacLabWrapper(env)
Key defaults:
- ``device="cuda:0"``
- ``allow_done_after_reset=True`` (IsaacLab can report done immediately after reset)
- ``convert_actions_to_numpy=False`` (actions stay as tensors)
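These defaults can be overridden at construction time; for example, a minimal sketch placing the wrapper on a different GPU (assuming a second device is available):

env = IsaacLabWrapper(env, device="cuda:1")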
Note
IsaacLab modifies terminated and truncated tensors in-place.
IsaacLabWrapper clones these tensors to prevent data corruption.
Note
Batched specs: IsaacLab env specs include the batch dimension (e.g., shape
(4096, obs_dim)). Use *_spec_unbatched properties when you need
per-env shapes.
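For example (shapes are illustrative, assuming 4096 environments and an 8-dimensional action space):

env.action_spec.shape            # batched: torch.Size([4096, 8])
env.action_spec_unbatched.shape  # per-env: torch.Size([8])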
Note
Reward shape: IsaacLab rewards are (num_envs,). The wrapper
unsqueezes to (num_envs, 1) for TorchRL compatibility.
Collector¶
Because IsaacLab environments are pre-vectorized (a single gym.make
creates ~4096 parallel environments on the GPU), use a single
Collector — there is no need for
ParallelEnv or MultiCollector:
from torchrl.collectors import Collector

collector = Collector(
    create_env_fn=env,
    policy=policy,
    frames_per_batch=40960,  # 10 env steps * 4096 envs
    storing_device="cpu",
    no_cuda_sync=True,  # IMPORTANT for CUDA envs
)
- ``no_cuda_sync=True``: avoids unnecessary CUDA synchronization that can cause hangs with GPU-native environments.
- ``storing_device="cpu"``: moves collected data to CPU for the replay buffer.
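Iterating the collector then yields batched TensorDicts ready for the buffer; a minimal sketch (the batch layout in the comment assumes 4096 envs and 10 steps per batch):

for data in collector:
    # data is a TensorDict with batch_size (4096, 10):
    # (num_envs, frames_per_batch // num_envs)
    replay_buffer.extend(data)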
2-GPU Async Pipeline¶
For maximum throughput, use two GPUs with a background collection thread:
GPU 0 (``sim_device``): IsaacLab simulation + collection policy inference
GPU 1 (``train_device``): Model training (world model, actor, value gradients)
import copy, threading
import torch
from tensordict import TensorDict

sim_device = torch.device("cuda:0")    # IsaacLab simulation + collection
train_device = torch.device("cuda:1")  # model training

# Deep copy policy to sim_device for collection
collector_policy = copy.deepcopy(policy).to(sim_device)
# (the collector is assumed to have been built with collector_policy)

# Background thread for continuous collection
def collect_loop(collector, replay_buffer, stop_event):
    for data in collector:
        replay_buffer.extend(data)
        if stop_event.is_set():
            break

stop_event = threading.Event()
collect_thread = threading.Thread(
    target=collect_loop, args=(collector, replay_buffer, stop_event)
)
collect_thread.start()

# Main thread: train on train_device
for optim_step in range(total_steps):
    batch = replay_buffer.sample()
    train(batch)  # all on cuda:1
    # Periodic weight sync: training policy -> collector policy
    if optim_step % sync_every == 0:
        weights = TensorDict.from_module(policy)
        collector.update_policy_weights_(weights)

# Shut down the collection thread
stop_event.set()
collect_thread.join()
Key points:
- CUDA operations release the GIL, so collection and training truly overlap.
- Pass ``TensorDict.from_module(policy)`` to ``update_policy_weights_()``, not the module itself.
- Set ``CUDA_VISIBLE_DEVICES=0,1`` to expose both GPUs (IsaacLab defaults to GPU 0 only).
- The pipeline falls back gracefully to a single GPU if only one is available.
RayCollector (alternative)¶
If you need distributed collection across multiple GPUs/nodes, use
RayCollector:
from torchrl.collectors.distributed import RayCollector

collector = RayCollector(
    [make_env] * num_collectors,
    policy,
    frames_per_batch=8192,
    collector_kwargs={
        "trust_policy": True,
        "no_cuda_sync": True,
    },
)
Replay Buffer¶
The SliceSampler needs enough sequential data. With
batch_length=50, you need at least 50 time steps per trajectory before
sampling:
init_random_frames >= batch_length * num_envs
                    = 50 * 4096
                    = 204,800
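In practice, this constraint is enforced through the collector's ``init_random_frames`` argument; a sketch with the numbers above:

num_envs = 4096
batch_length = 50

collector = Collector(
    create_env_fn=env,
    policy=policy,
    frames_per_batch=10 * num_envs,              # 40,960 frames per batch
    init_random_frames=batch_length * num_envs,  # 204,800 random frames
    no_cuda_sync=True,
)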
For GPU-resident replay buffers, use
LazyTensorStorage with the target CUDA device.
This avoids CPU→GPU transfer at sample time (but adds it at extend time).
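A minimal sketch of such a buffer (capacity and batch size are illustrative; ``slice_len`` matches the ``batch_length=50`` above):

import torch
from torchrl.data import LazyTensorStorage, ReplayBuffer, SliceSampler

rb = ReplayBuffer(
    storage=LazyTensorStorage(1_000_000, device=torch.device("cuda:1")),
    sampler=SliceSampler(slice_len=50, strict_length=False),
    batch_size=2000,  # must be a multiple of slice_len
)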
TorchRL-Specific Gotchas¶
``no_cuda_sync=True``: Always set this for collectors with CUDA environments. Without it, you get mysterious hangs.
Installing torchrl in the Isaac container: Use ``--no-build-isolation --no-deps`` to avoid conflicts with Isaac's pre-installed torch/numpy.

``TensorDictPrimer`` ``expand_specs``: When adding primers (e.g., ``state``, ``belief``) to a pre-vectorized env, you MUST pass ``expand_specs=True`` to ``TensorDictPrimer``. Otherwise the primer shapes ``()`` conflict with the env's ``batch_size`` of ``(4096,)``, as sketched below.
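For instance, a hedged sketch of adding a ``state`` primer to the wrapped env (``state_dim`` and the ``Unbounded`` spec are assumptions; the essential part is ``expand_specs=True``):

from torchrl.data import Unbounded
from torchrl.envs import TensorDictPrimer, TransformedEnv

# expand_specs=True expands the per-env (state_dim,) spec to the env's
# batch_size, i.e. (4096, state_dim)
primer = TensorDictPrimer(
    state=Unbounded(shape=(state_dim,)),
    expand_specs=True,
)
env = TransformedEnv(env, primer)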
Model-based env spec double-batching: ``model_based_env.set_specs_from_env(batched_env)`` copies specs with the batch dims baked in. The model-based env then double-batches actions during sampling (e.g., ``(4096, 4096, 8)`` instead of ``(4096, 8)``). Fix: unbatch the model-based env's specs after copying:

model_based_env.set_specs_from_env(test_env)
if test_env.batch_size:
    idx = (0,) * len(test_env.batch_size)
    model_based_env.__dict__["_output_spec"] = (
        model_based_env.__dict__["_output_spec"][idx]
    )
    model_based_env.__dict__["_input_spec"] = (
        model_based_env.__dict__["_input_spec"][idx]
    )
    model_based_env.empty_cache()
``torch.compile`` with TensorDict: Compiling full loss modules crashes because dynamo traces through TensorDict internals. Fix: compile the individual MLP sub-modules (encoder, decoder, reward_model, value_model) with ``torch._dynamo.config.suppress_errors = True``. Do NOT compile the RSSM (sequential, shared with the collector) or the loss modules (heavy TensorDict use).
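A sketch of this selective compilation (the world-model attribute names are assumptions):

import torch

# Fall back to eager where dynamo cannot trace
torch._dynamo.config.suppress_errors = True

# Compile only the plain MLP sub-modules; keep RSSM and loss modules eager
world_model.encoder = torch.compile(world_model.encoder)
world_model.decoder = torch.compile(world_model.decoder)
world_model.reward_model = torch.compile(world_model.reward_model)
value_model = torch.compile(value_model)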
``SliceSampler`` with ``strict_length=False``: The sampler may return fewer elements than ``batch_size``, which causes ``reshape(-1, batch_length)`` to fail. Fix: truncate the sample:

sample = replay_buffer.sample()
numel = sample.numel()
usable = (numel // batch_length) * batch_length
if usable < numel:
    sample = sample[:usable]
sample = sample.reshape(-1, batch_length)
``frames_per_batch`` vs ``batch_length``: Each collection adds ``frames_per_batch / num_envs`` time steps per env. The ``SliceSampler`` needs contiguous sequences of at least ``batch_length`` steps within a single trajectory. Ensure ``frames_per_batch >= batch_length * num_envs`` for the initial collection, or that ``init_random_frames >= batch_length * num_envs``.

``TD_GET_DEFAULTS_TO_NONE``: Set this environment variable to ``1`` when running inside the Isaac container to ensure correct TensorDict default behavior.