MujocoPlaygroundAgentMapping#
- torchrl.envs.MujocoPlaygroundAgentMapping(agents: list[~torchrl.envs.libs.mujoco_playground.MujocoPlaygroundAgentSpec] = <factory>, homogenization_mode: ~typing.Literal['none', 'max', 'concat'] = 'none') None[source]#
Agent mapping for
MujocoPlaygroundWrapper.Defines how to split a single-agent MuJoCo Playground environment into a cooperative multi-agent task by partitioning the observation and action vectors among named agents.
- Parameters:
agents (list of MujocoPlaygroundAgentSpec) – one entry per agent, defining each agent’s observation slice and the action indices it controls.
homogenization_mode (str, optional) –
strategy for unifying heterogeneous observation/action shapes across agents so that a single shared policy can be used.
"none"(default): each agent receives exactly its own observation/action slice; shapes may differ across agents."max": observations are padded tomax_obs_size + n_agents(a one-hot agent-ID prefix is prepended), and actions are padded tomax_action_size. All agents share the same input/output shape.Policy contract (max): each agent’s observation layout is
[one_hot_id (n_agents) | raw_obs (len(observation_indices)) | zero_pad], for a total length ofmax_obs_size + n_agents. The action vector emitted by the policy has lengthmax_action_size; only the firstlen(action_indices)entries are used and the rest are silently dropped."concat": each agent receives the full global observation/action vector with zeros at positions it does not own. All agents share the same input/output shape equal to the full environment dimensions.Policy contract (concat): the observation has length
env.observation_sizewith zeros at positions not owned by the agent. The policy must emit a full-length action vector; only entries at the agent’s ownaction_indicesare applied to the environment and entries at indices owned by other agents are ignored.
Examples
>>> mapping = MujocoPlaygroundAgentMapping( ... agents=[ ... MujocoPlaygroundAgentSpec( ... name="agent_0", ... action_indices=[0, 1, 2], ... observation_indices=[0, 1, 2, 3], ... ), ... MujocoPlaygroundAgentSpec( ... name="agent_1", ... action_indices=[3, 4, 5], ... observation_indices=[4, 5, 6, 7], ... ), ... ], ... homogenization_mode="none", ... )