IsaacLab Integration
This guide covers how to use TorchRL components with IsaacLab (NVIDIA’s GPU-accelerated robotics simulation platform).
For general IsaacLab installation and cluster setup (not specific to TorchRL), see the knowledge_base/ISAACLAB.md file.
IsaacLabWrapper
Use IsaacLabWrapper to wrap a gymnasium
IsaacLab environment into a TorchRL-compatible EnvBase:
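import gymnasium as gym
from torchrl.envs.libs.isaac_lab import IsaacLabWrapper
# Assumes the Isaac Sim app is already running (AppLauncher) and env_cfg is an
# IsaacLab task config such as AntEnvCfg() (see the camera example below).
env = gym.make("Isaac-Ant-v0", cfg=env_cfg)
env = IsaacLabWrapper(env)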
Key defaults:
- device=cuda:0
- allow_done_after_reset=True (IsaacLab can report done immediately after reset)
- convert_actions_to_numpy=False (actions stay as tensors)
Note
IsaacLab modifies the terminated and truncated tensors in place on every step.
IsaacLabWrapper clones these tensors so that values already returned to TorchRL are not overwritten by later steps.
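The aliasing problem that the wrapper guards against can be reproduced with plain PyTorch. This is a minimal sketch. `fake_step` and `_terminated` are hypothetical stand-ins for a simulator that reuses one preallocated buffer. They are not IsaacLab code.

```python
import torch

# Hypothetical simulator state: one preallocated buffer reused on every step,
# mimicking how IsaacLab updates `terminated` in place.
_terminated = torch.zeros(4, dtype=torch.bool)


def fake_step(done_mask: torch.Tensor) -> torch.Tensor:
    """Overwrite the shared buffer in place and return it (no copy)."""
    _terminated.copy_(done_mask)
    return _terminated


# Storing the returned tensor directly keeps a reference to the shared buffer.
aliased = [fake_step(torch.tensor([True, False, False, False]))]
aliased.append(fake_step(torch.tensor([False, False, False, False])))

# Cloning on every step (what IsaacLabWrapper does) snapshots the values.
cloned = [fake_step(torch.tensor([True, False, False, False])).clone()]
cloned.append(fake_step(torch.tensor([False, False, False, False])).clone())

print(aliased[0].tolist())  # [False, False, False, False]: overwritten later
print(cloned[0].tolist())   # [True, False, False, False]: original flags kept
```

Without the clone, every stored step points at the same storage. A rollout would then silently report the last step's done flags for all earlier steps.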
Note
Batched specs: IsaacLab env specs include the batch dimension (e.g., shape
(4096, obs_dim)). Use *_spec_unbatched properties when you need
per-env shapes.
Note
Reward shape: IsaacLab rewards are (num_envs,). The wrapper
unsqueezes to (num_envs, 1) for TorchRL compatibility.
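The reward reshaping and the batched versus per-env distinction can be illustrated with plain tensors. This sketch only mirrors the shapes involved. It does not call the wrapper or the `*_spec_unbatched` properties themselves.

```python
import torch

num_envs = 4096
reward = torch.randn(num_envs)      # IsaacLab layout: (num_envs,)
reward_td = reward.unsqueeze(-1)    # TorchRL layout: (num_envs, 1)

# Dropping the leading batch dimensions recovers the per-env shape, which is
# analogous to what the *_spec_unbatched properties expose for batched specs.
batch_size = torch.Size([num_envs])
per_env_shape = reward_td.shape[len(batch_size):]
print(per_env_shape)  # torch.Size([1])
```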
Headless camera rendering
For headless RGB capture, prefer IsaacLab tiled cameras over viewport
rendering. Add a tiled camera to the IsaacLab config before instantiating the
environment, launch IsaacLab with cameras enabled, then ask
IsaacLabWrapper to read the camera sensor into the TensorDict:
import argparse
import gymnasium as gym
import torch
from isaaclab.app import AppLauncher
parser = argparse.ArgumentParser()
AppLauncher.add_app_launcher_args(parser)
args, _ = parser.parse_known_args([
    "--headless",
    "--enable_cameras",
    "--rendering_mode",
    "performance",
    "--device",
    "cuda:0",
])
# Launch Isaac Sim before importing task configs: isaaclab_tasks needs the
# running app, and importing it also registers the Isaac-* gym IDs.
app = AppLauncher(args).app
from isaaclab_tasks.manager_based.classic.ant.ant_env_cfg import AntEnvCfg
from torchrl.envs.libs.isaac_lab import IsaacLabWrapper
cfg = IsaacLabWrapper.add_tiled_camera_config(
    AntEnvCfg(),
    width=320,
    height=240,
    pos=(-7.0, 0.0, 3.0),
    rot=(0.9945, 0.0, 0.1045, 0.0),
)
env = gym.make("Isaac-Ant-v0", cfg=cfg)
env = IsaacLabWrapper(env, from_tiled_camera=True, device=torch.device("cuda:0"))
td = env.reset()
pixels = td["pixels"]  # shape: (num_envs, height, width, 3)
The helper also exposes Isaac Lab’s renderer selection directly. For example, on Isaac Lab versions with the pluggable renderer stack installed, use the Newton Warp renderer when RTX rendering is not available:
cfg = IsaacLabWrapper.add_tiled_camera_config(
    AntEnvCfg(),
    renderer_backend="newton_warp",
    data_type="rgb",
)
Cluster rendering dependencies
Headless camera rendering still needs a working NVIDIA graphics stack inside the container. Minimal CUDA images often omit the EGL/GLVND and Vulkan runtime packages that Isaac Sim uses for headless cameras. On Debian/Ubuntu images, install the generic loader/runtime packages before launching Isaac Lab:
sudo apt-get update
sudo apt-get install -y libegl-dev libglvnd0 libglx0 libvulkan1 vulkan-tools
If IsaacLab reports ERROR_INCOMPATIBLE_DRIVER, cannot create a Vulkan
instance, or GPU Foundation is not initialized, verify the NVIDIA Vulkan
ICD and GL/EGL userspace before debugging TorchRL:
nvidia-smi --query-gpu=driver_version,name --format=csv,noheader | head
ldconfig -p | grep -E 'libEGL_nvidia|libnvidia-eglcore|libGLX_nvidia'
ls /usr/share/glvnd/egl_vendor.d/
ls /usr/share/vulkan/icd.d/
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json vulkaninfo --summary
The NVIDIA userspace libraries must match the host driver. If using distro
NVIDIA packages, install a libnvidia-gl-<driver-version> package matching
the host driver when available. Some package repositories provide a newer patch
release than the host driver; in that case, use a matching driver userspace
bundle and point the process at it:
export LD_LIBRARY_PATH=/path/to/nvidia/lib:${LD_LIBRARY_PATH}
export VK_ICD_FILENAMES=/path/to/nvidia_icd.json
export __GLX_VENDOR_LIBRARY_NAME=nvidia
export XDG_RUNTIME_DIR=/tmp/xdg-runtime-${USER}
mkdir -p "${XDG_RUNTIME_DIR}" && chmod 700 "${XDG_RUNTIME_DIR}"
For evaluator workers that should render on a dedicated physical GPU, expose
only that GPU to the worker and use cuda:0 inside the worker:
python examples/collectors/isaaclab_rnn_ppo_memory.py \
--eval \
--eval-cuda-visible-devices 2 \
--eval-worker-device cuda:0 \
--eval-nvidia-lib-dir /path/to/nvidia/lib \
--eval-vulkan-icd /path/to/nvidia_icd.json \
--eval-xdg-runtime-dir /tmp/xdg-runtime-eval
Collector
Because IsaacLab environments are pre-vectorized (a single gym.make call
creates all parallel environments on the GPU, e.g. 4096 when
scene.num_envs=4096), use a single Collector. There is no need for
ParallelEnv or MultiCollector:
from torchrl.collectors import Collector
collector = Collector(
    create_env_fn=env,
    policy=policy,
    frames_per_batch=40960,  # 10 env steps * 4096 envs
    storing_device="cpu",
    no_cuda_sync=True,  # IMPORTANT for CUDA envs
)
- no_cuda_sync=True: avoids unnecessary CUDA synchronisation that can cause hangs with GPU-native environments.
- storing_device="cpu": moves collected data to CPU for the replay buffer.
2-GPU Async Pipeline
For maximum throughput, use two GPUs with a background collection thread:
- GPU 0 (``sim_device``): IsaacLab simulation + collection policy inference
- GPU 1 (``train_device``): Model training (world model, actor, value gradients)
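import copy
import threading
from tensordict import TensorDict
# Assumes policy, sim_device, replay_buffer, train, total_steps and sync_every
# are defined elsewhere. Deep copy the policy to sim_device for collection and
# build the collector as in the Collector section, with policy=collector_policy.
collector_policy = copy.deepcopy(policy).to(sim_device)
# Background thread for continuous collection
def collect_loop(collector, replay_buffer, stop_event):
    for data in collector:
        replay_buffer.extend(data)
        if stop_event.is_set():
            break
stop_event = threading.Event()
collect_thread = threading.Thread(
    target=collect_loop, args=(collector, replay_buffer, stop_event), daemon=True
)
collect_thread.start()
# Main thread: train on train_device
for optim_step in range(total_steps):
    batch = replay_buffer.sample()
    train(batch)  # all on train_device (cuda:1)
    # Periodic weight sync: training policy -> collector policy
    if optim_step % sync_every == 0:
        weights = TensorDict.from_module(policy)
        collector.update_policy_weights_(weights)
# Stop the background thread once training is done
stop_event.set()
collect_thread.join()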
Key points:
- PyTorch releases the GIL inside most tensor operations and CUDA kernels run asynchronously, so collection on sim_device and training on train_device genuinely overlap.
- You must pass TensorDict.from_module(policy) to update_policy_weights_(), not the module itself.
- Set CUDA_VISIBLE_DEVICES=0,1 to expose two GPUs (IsaacLab uses only GPU 0 by default).
- The pipeline falls back gracefully to a single GPU if only one GPU is available.
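The threading structure of this pipeline can be exercised on CPU without IsaacLab or a second GPU. This is a minimal sketch. `ThreadSafeBuffer`, `FakeCollector` and its `sync_weights` method are hypothetical stand-ins for the TorchRL replay buffer and collector. Weight sync here uses `load_state_dict` instead of `TensorDict.from_module` plus `update_policy_weights_`, so it shows the producer/consumer pattern, not TorchRL's API.

```python
import copy
import random
import threading
import time

import torch


class ThreadSafeBuffer:
    """Hypothetical minimal replay buffer guarded by a lock (not TorchRL's)."""

    def __init__(self):
        self._data = []
        self._lock = threading.Lock()

    def extend(self, batch: torch.Tensor) -> None:
        with self._lock:
            self._data.extend(batch.unbind(0))

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)

    def sample(self, n: int) -> torch.Tensor:
        with self._lock:
            return torch.stack(random.choices(self._data, k=n))


class FakeCollector:
    """Hypothetical collector running its own policy copy on fake observations."""

    def __init__(self, policy: torch.nn.Module, num_envs: int = 8):
        self.policy = policy
        self.num_envs = num_envs
        self._lock = threading.Lock()

    def __iter__(self):
        while True:
            obs = torch.randn(self.num_envs, 3)
            with self._lock, torch.no_grad():
                action = self.policy(obs)
            yield torch.cat([obs, action], dim=-1)

    def sync_weights(self, state_dict) -> None:
        # load_state_dict copies values into the collector's own parameters
        with self._lock:
            self.policy.load_state_dict(state_dict)


def collect_loop(collector, buffer, stop_event):
    for data in collector:
        buffer.extend(data)
        if stop_event.is_set():
            break


policy = torch.nn.Linear(3, 2)                     # "training" policy
collector = FakeCollector(copy.deepcopy(policy))   # separate copy for collection
buffer = ThreadSafeBuffer()
stop_event = threading.Event()
thread = threading.Thread(
    target=collect_loop, args=(collector, buffer, stop_event), daemon=True
)
thread.start()

optimizer = torch.optim.SGD(policy.parameters(), lr=0.1)
sync_every = 5
while len(buffer) == 0:  # wait for the first collected batch
    time.sleep(0.001)
for optim_step in range(20):
    batch = buffer.sample(16)
    obs, target = batch[:, :3], batch[:, 3:]
    loss = (policy(obs) - target).pow(2).mean()  # toy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if optim_step % sync_every == 0:
        collector.sync_weights(policy.state_dict())

collector.sync_weights(policy.state_dict())  # final sync before shutdown
stop_event.set()
thread.join(timeout=5)
```

Both sides keep their own parameter storage. The only shared state is the buffer and the sync call, each protected by a lock.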
RayCollector (alternative)
If you need distributed collection across multiple GPUs/nodes, use
RayCollector:
from torchrl.collectors.distributed import RayCollector
collector = RayCollector(
    [make_env] * num_collectors,
    policy,
    frames_per_batch=8192,
    collector_kwargs={
        "trust_policy": True,
        "no_cuda_sync": True,
    },
)
Replay Buffer
The SliceSampler needs enough sequential data. With
batch_length=50, you need at least 50 time steps per trajectory before
sampling:
init_random_frames >= batch_length * num_envs = 50 * 4096 = 204,800
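The same budget can be computed for other settings with a small helper. `replay_budget` is hypothetical and not part of TorchRL. It treats the warm-up as a lower bound, because episodes that reset early split trajectories and shorten the contiguous segments available to the sampler.

```python
def replay_budget(batch_length: int, num_envs: int, frames_per_batch: int) -> dict:
    """Hypothetical helper: frame counts needed before SliceSampler can sample."""
    if frames_per_batch % num_envs:
        raise ValueError("frames_per_batch should be a multiple of num_envs")
    min_init = batch_length * num_envs
    return {
        # each collection advances every env by this many time steps
        "steps_per_env_per_batch": frames_per_batch // num_envs,
        # warm-up so every env trajectory can hold >= batch_length steps
        "min_init_random_frames": min_init,
        # number of collections needed to reach that warm-up (ceil division)
        "warmup_collections": -(-min_init // frames_per_batch),
    }


budget = replay_budget(batch_length=50, num_envs=4096, frames_per_batch=40960)
print(budget)
# {'steps_per_env_per_batch': 10, 'min_init_random_frames': 204800, 'warmup_collections': 5}
```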
For GPU-resident replay buffers, use
LazyTensorStorage with the target CUDA device.
This avoids CPU→GPU transfer at sample time (but adds it at extend time).
TorchRL-Specific Gotchas
- ``no_cuda_sync=True``: Always set this for collectors with CUDA environments. Without it, collection can hang with no obvious error.
- Installing torchrl in the Isaac container: pass --no-build-isolation --no-deps to pip to avoid conflicts with Isaac’s pre-installed torch/numpy.
- ``TensorDictPrimer`` ``expand_specs``: When adding primers (e.g., state, belief) to a pre-vectorized env, you MUST pass expand_specs=True to TensorDictPrimer. Otherwise the primers’ shape () conflicts with the env’s batch_size (4096,).
- Model-based env spec double-batching: model_based_env.set_specs_from_env(batched_env) copies specs with the batch dims baked in. The model-based env then double-batches actions during sampling (e.g., (4096, 4096, 8) instead of (4096, 8)). Fix: unbatch the model-based env’s specs after copying (here test_env is the batched env):
  model_based_env.set_specs_from_env(test_env)
  if test_env.batch_size:
      idx = (0,) * len(test_env.batch_size)
      model_based_env.__dict__["_output_spec"] = (
          model_based_env.__dict__["_output_spec"][idx]
      )
      model_based_env.__dict__["_input_spec"] = (
          model_based_env.__dict__["_input_spec"][idx]
      )
      model_based_env.empty_cache()
- ``torch.compile`` with TensorDict: Compiling full loss modules crashes because dynamo traces through TensorDict internals. Fix: compile only the individual MLP sub-modules (encoder, decoder, reward_model, value_model) and set torch._dynamo.config.suppress_errors = True. Do NOT compile the RSSM (sequential, shared with the collector) or the loss modules (heavy TensorDict use).
- ``SliceSampler`` with ``strict_length=False``: The sampler may return fewer elements than batch_size, which makes reshape(-1, batch_length) fail. Fix: truncate the sample to a multiple of batch_length:
  sample = replay_buffer.sample()
  numel = sample.numel()
  usable = (numel // batch_length) * batch_length
  if usable < numel:
      sample = sample[:usable]
  sample = sample.reshape(-1, batch_length)
- ``frames_per_batch`` vs ``batch_length``: Each collection adds frames_per_batch / num_envs time steps per env. The SliceSampler needs contiguous sequences of at least batch_length steps within a single trajectory. Ensure frames_per_batch >= batch_length * num_envs for the initial collection, or that init_random_frames >= batch_length * num_envs.
- ``TD_GET_DEFAULTS_TO_NONE``: Set this environment variable to 1 when running inside the Isaac container to ensure correct TensorDict default behavior.
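The shape arithmetic behind the model-based env double-batching gotcha can be seen with plain tensors standing in for specs. This is an analogy, not TorchRL spec code, and it uses 4 envs instead of 4096 to keep it small.

```python
import torch

num_envs, action_dim = 4, 8
env_batch_size = torch.Size([num_envs])

# A spec copied from the batched env already carries the batch dimension.
batched_spec_shape = torch.Size([num_envs, action_dim])

# Sampling "one action per env" prepends the env batch size again: double batch.
double_batched = torch.zeros(*env_batch_size, *batched_spec_shape)
print(double_batched.shape)  # torch.Size([4, 4, 8])

# Indexing with (0,) * len(batch_size) drops the baked-in batch dims, mirroring
# the spec fix above; re-adding the env batch then gives the intended shape.
idx = (0,) * len(env_batch_size)
unbatched_shape = torch.zeros(batched_spec_shape)[idx].shape
fixed = torch.zeros(*env_batch_size, *unbatched_shape)
print(fixed.shape)  # torch.Size([4, 8])
```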