LiberoWrapper#
- torchrl.envs.LiberoWrapper(*args, **kwargs)[source]#
LIBERO environment wrapper.
GitHub: Lifelong-Robot-Learning/LIBERO
Paper: https://arxiv.org/abs/2306.03310 (LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning, Liu et al., 2023)
LIBERO is a benchmark of language-conditioned tabletop manipulation tasks (robosuite/MuJoCo based) widely used to evaluate and fine-tune Vision-Language-Action policies (e.g. OpenVLA, SimpleVLA-RL). This wrapper exposes a LIBERO environment with the canonical VLA TensorDict schema (see the VLA reference page): a
uint8[C, H, W]cameraimage(and optionally awrist_image) plus a proprioceptivestateunderobservation, thelanguage_instructionstring at the root, and a booleansuccessentry. Episodes terminate on success and truncate aftermax_episode_steps; the reward is the binary success indicator.Initial-state control is first-class: LIBERO ships a fixed set of initial simulator states per task (50 for the standard evaluation protocol), and the rollout-grouping mechanism GRPO-style group advantages require (n rollouts per initial state) is built in via
group_repeats(seeMCAdvantage).Note
Rendering is offscreen (no display needed). On headless Linux set
MUJOCO_GL=egl(orosmesa); on macOS the default (cgl) works. One environment instance hosts one MuJoCo simulation: useParallelEnvor anAsyncEnvPoolwith one instance per process for parallel collection. With EGL,LiberoEnvaccepts arender_gpu_device_idargument to pin robosuite rendering to an EGL-visible GPU device.Note
Importing
liberofor the first time normally prompts on stdin for a dataset path. This wrapper pre-seeds the default LIBERO config (honoringLIBERO_CONFIG_PATH) so that headless workers never block on the prompt.- Parameters:
env (
libero.libero.envs.OffScreenRenderEnv) – the environment to wrap.- Keyword Arguments:
instruction (str, optional) – the language instruction. Defaults to the instruction parsed from the task’s BDDL file.
init_states (np.ndarray, optional) – a
[N, D]array of initial simulator states to reset to (as returned by the LIBERO benchmark for each task). Required forinit_state_mode/group_repeats. Defaults toNone(native random resets).init_state_mode (str, optional) – how the initial state is selected at reset when
init_statesis provided. One of"random"(sample uniformly),"cycle"(walk through the init states in order – the fixed-50-trials evaluation protocol) or"fixed"(alwaysinit_state_id). Defaults to"random".init_state_id (int, optional) – the initial state used with
init_state_mode="fixed". Defaults to0.group_repeats (int, optional) – grouped-rollout mode: the same initial state is replayed for
group_repeatsconsecutive episodes (one GRPO group) before the next one is selected, and an integergroup_idobservation entry identifies the group. Defaults toNone.group_id_offset (int, optional) – added to the group ids. Group ids must be unique across concurrently collecting workers for group advantages to be computed correctly: give each worker a disjoint offset (e.g.
worker_idx * 10**6). Defaults to0.group_id_mode (str, optional) – how grouped-rollout ids are stamped.
"episode"uses the local group counter (episode_count // group_repeats)."init_state"uses the selected initial-state id and requiresinit_state_modeto be"cycle"or"fixed". The latter is useful when several parallel workers share the same task and should form GRPO groups by initial state even if their episode lengths differ. Defaults to"episode".camera (str, optional) – the camera rendered under
("observation", "image"). Defaults to"agentview".wrist_camera (str, optional) – if provided (e.g.
"robot0_eye_in_hand"), this camera is exposed under("observation", "wrist_image"). Defaults toNone.from_pixels (bool, optional) – if
True, expose the renderedcameraframe as a rootpixelsentry too (HWCuint8), in addition to the CHW("observation", "image")the policy consumes. This is the torchrl pixels-rendering convention and feedsVideoRecorder(defaultin_keys=["pixels"]) with no extra render cost. Defaults toFalse.flip_image (bool, optional) – rotate the rendered images by 180 degrees. LIBERO renders upside down relative to the standard human-view orientation used by the OpenVLA / SimpleVLA-RL pipelines. Defaults to
True.proprio_keys (tuple of str, optional) – the low-dimensional observation entries concatenated (in order) into
("observation", "state"). Defaults to("robot0_eef_pos", "robot0_eef_quat", "robot0_gripper_qpos").max_episode_steps (int, optional) – truncation horizon, in env steps. Defaults to
512(the SimpleVLA-RL setting).settle_steps (int, optional) – no-op steps (zero delta, gripper open) executed after setting the initial state so the simulation settles, as in the OpenVLA evaluation protocol. Defaults to
10.device (torch.device, optional) – device of the output tensordicts.
- Variables:
available_envs (list of str) – the available task-suite names.
Examples
>>> import os >>> from libero.libero import benchmark, get_libero_path >>> from libero.libero.envs import OffScreenRenderEnv >>> from torchrl.envs import LiberoWrapper >>> suite = benchmark.get_benchmark_dict()["libero_spatial"]() >>> task = suite.get_task(0) >>> env = OffScreenRenderEnv( ... bddl_file_name=os.path.join( ... get_libero_path("bddl_files"), task.problem_folder, task.bddl_file ... ), ... camera_heights=256, ... camera_widths=256, ... ) >>> env = LiberoWrapper(env, instruction=task.language) >>> td = env.reset() >>> td["observation", "image"].shape torch.Size([3, 256, 256])