VmasWrapper
class torchrl.envs.VmasWrapper(*args, **kwargs)
Vmas environment wrapper.
GitHub: https://github.com/proroklab/VectorizedMultiAgentSimulator
Paper: https://arxiv.org/abs/2207.03530
- Parameters:
env (vmas.simulator.environment.environment.Environment) – the vmas environment to wrap.
- Keyword Arguments:
num_envs (int) – Number of vectorized simulation environments. VMAS performs vectorized simulations using PyTorch. This argument indicates the number of vectorized environments that should be simulated in a batch. It will also determine the batch size of the environment.
device (torch.device, optional) – Device for simulation. Defaults to the default device. All the tensors created by VMAS will be placed on this device.
continuous_actions (bool, optional) – Whether to use continuous actions. Defaults to True. If False, actions will be discrete. The number of actions and their size will depend on the chosen scenario. See the VMAS repository for more info.
max_steps (int, optional) – Horizon of the task. Defaults to None (infinite horizon). Each VMAS scenario can be terminating or not. If max_steps is specified, the scenario is also terminated (and the "terminated" flag is set) whenever this horizon is reached. Unlike gym's TimeLimit transform or torchrl's StepCounter, this argument will not set the "truncated" entry in the tensordict.
categorical_actions (bool, optional) – if the environment actions are discrete, whether to transform them to categorical or one-hot. Defaults to True.
group_map (MarlGroupMapType or Dict[str, List[str]], optional) – how to group agents in tensordicts for input/output. By default, if the agent names follow the "<name>_<int>" convention, they will be grouped by "<name>". If they do not follow this convention, they will all be put in a single group named "agents". Otherwise, a group map can be specified or selected from some premade options. See MarlGroupMapType for more info. A short usage sketch follows below.
- Variables:
group_map (Dict[str, List[str]]) – how to group agents in tensordicts for input/output. See MarlGroupMapType for more info.
agent_names (list of str) – names of the agents in the environment
agent_names_to_indices_map (Dict[str, int]) – dictionary mapping agent names to their index in the environment
unbatched_action_spec (TensorSpec) – version of the spec without the vectorized dimension
unbatched_observation_spec (TensorSpec) – version of the spec without the vectorized dimension
unbatched_reward_spec (TensorSpec) – version of the spec without the vectorized dimension
het_specs (bool) – whether the environment has any lazy spec
het_specs_map (Dict[str, bool]) – dictionary mapping each group to a flag indicating whether that group has lazy specs
available_envs (List[str]) – the list of scenarios available to build.
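As a hedged illustration (not from the original docs), the attributes listed above can be inspected on a wrapped environment such as the one built in the sketch earlier:
>>> print(env.group_map)                   # grouping actually used by the wrapper
>>> print(env.agent_names)                 # agent names as reported by VMAS
>>> print(env.agent_names_to_indices_map)  # name -> index in the VMAS environment
>>> print(env.unbatched_action_spec)       # action spec without the leading num_envs dimension
>>> print(env.het_specs, env.het_specs_map)  # lazy-spec flags, globally and per group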
Warning
VMAS returns a single done flag which does not distinguish between the env reaching max_steps and actual termination. If you deem the truncation signal necessary, set max_steps to None and use a StepCounter transform.
Examples
>>> env = VmasWrapper(
...     vmas.make_env(
...         scenario="flocking",
...         num_envs=32,
...         continuous_actions=True,
...         max_steps=200,
...         device="cpu",
...         seed=None,
...         # Scenario kwargs
...         n_agents=5,
...     )
... )
>>> print(env.rollout(10))
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                action: Tensor(shape=torch.Size([32, 10, 5, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                info: TensorDict(
                    fields={
                        agent_collision_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                        agent_distance_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([32, 10, 5]),
                    device=cpu,
                    is_shared=False),
                observation: Tensor(shape=torch.Size([32, 10, 5, 18]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([32, 10, 5]),
            device=cpu,
            is_shared=False),
        done: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        next: TensorDict(
            fields={
                agents: TensorDict(
                    fields={
                        info: TensorDict(
                            fields={
                                agent_collision_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                                agent_distance_rew: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                            batch_size=torch.Size([32, 10, 5]),
                            device=cpu,
                            is_shared=False),
                        observation: Tensor(shape=torch.Size([32, 10, 5, 18]), device=cpu, dtype=torch.float32, is_shared=False),
                        reward: Tensor(shape=torch.Size([32, 10, 5, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([32, 10, 5]),
                    device=cpu,
                    is_shared=False),
                done: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False),
                terminated: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([32, 10]),
            device=cpu,
            is_shared=False),
        terminated: Tensor(shape=torch.Size([32, 10, 1]), device=cpu, dtype=torch.bool, is_shared=False)},
    batch_size=torch.Size([32, 10]),
    device=cpu,
    is_shared=False)
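The warning above suggests recovering a truncation signal through torchrl's StepCounter transform rather than through max_steps. The following is a minimal sketch of that combination, not part of the original example; the scenario, agent count and step budget are illustrative assumptions.
>>> import vmas
>>> from torchrl.envs import StepCounter, TransformedEnv, VmasWrapper
>>> base_env = vmas.make_env(
...     scenario="flocking",
...     num_envs=32,
...     continuous_actions=True,
...     max_steps=None,   # no horizon enforced by VMAS itself
...     device="cpu",
...     # Scenario kwargs
...     n_agents=5,
... )
>>> env = TransformedEnv(VmasWrapper(base_env), StepCounter(max_steps=200))
>>> rollout = env.rollout(10)
>>> print(rollout["next", "truncated"].shape)  # truncation flag written by StepCounter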