stateless_init_process_group
- class torchrl.modules.llm.stateless_init_process_group(master_address: str | None, master_port: str | None, rank, world_size, device)
Initializes a stateless process group for distributed communication.
Creates a StatelessProcessGroup instance without relying on the global process group in torch.distributed. This approach is recommended for initializing data-plane communication (NCCL) between external processes (e.g., training processes) and vLLM workers.
- Parameters:
master_address (str | None) – The address of the master node. Defaults to “localhost” if not specified.
master_port (str | None) – The port used by the master node. If not specified, an open port is assigned automatically.
rank (int) – The rank of the current process.
world_size (int) – The total number of processes in the distributed group.
device – The device to use for communication. NCCL communication requires a CUDA device.
- Returns:
A PyNcclCommunicator instance initialized with the created StatelessProcessGroup.
- Return type:
PyNcclCommunicator
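Example:
The following is a minimal sketch of how the arguments might be prepared before the call, not the library's own implementation. The helpers find_open_port, resolve_group_args and init_comm are hypothetical names introduced for illustration; they mirror the defaults documented above ("localhost" for the address, an OS-assigned open port) and add a basic rank check. Only the import path and the positional argument order are taken from the signature above.

```python
import socket


def find_open_port() -> int:
    """Ask the OS for a currently free TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def resolve_group_args(master_address, master_port, rank, world_size):
    """Apply the documented defaults and validate the rank (hypothetical helper)."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    address = master_address if master_address is not None else "localhost"
    # master_port is documented as str | None, so the port is kept as a string.
    port = master_port if master_port is not None else str(find_open_port())
    return address, port, rank, world_size


def init_comm(address, port, rank, world_size, device):
    """Call the documented function; assumes torchrl, vLLM and a CUDA device."""
    # Imported lazily so the helpers above can be used without torchrl installed.
    from torchrl.modules.llm import stateless_init_process_group

    return stateless_init_process_group(address, port, rank, world_size, device)
```

Every process in the group has to use the same address and port, so in a sketch like this one process would resolve them once and share the result with the others, rather than each process picking its own port.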