stateless_init_process_group

torchrl.modules.llm.stateless_init_process_group(master_address: str | None, master_port: str | None, rank, world_size, device)

Initializes a stateless process group for distributed communication.

Creates a StatelessProcessGroup instance without relying on the global process group in torch.distributed. This approach is recommended for initializing data-plane communication (NCCL) between external processes (e.g., training processes) and vLLM workers.

Parameters:
  • master_address (str | None) – The address of the master node. Defaults to “localhost” if not specified.

  • master_port (str | None) – The port used by the master node. If not specified, an open port is assigned automatically.

  • rank (int) – The rank of the current process.

  • world_size (int) – The total number of processes in the distributed group.

  • device – The device to use for communication.

Returns:

A PyNcclCommunicator instance initialized with the created StatelessProcessGroup.

Return type:

PyNcclCommunicator
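
The following is a minimal sketch of how the parameters above might be resolved and passed in, not the library's own implementation. The helpers find_open_port, resolve_rendezvous and connect are hypothetical names introduced here for illustration; only the import path and the argument order come from the signature documented above. Actually calling connect assumes torchrl, vLLM and an NCCL-capable GPU are available.

```python
import socket


def find_open_port() -> int:
    """Ask the OS for a free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))
        return sock.getsockname()[1]


def resolve_rendezvous(master_address=None, master_port=None):
    """Hypothetical helper that mirrors the documented defaults.

    Falls back to "localhost" and to an OS-assigned open port when the
    corresponding argument is None. The port is returned as a string to
    match the documented `str | None` type.
    """
    address = master_address if master_address is not None else "localhost"
    port = str(master_port) if master_port is not None else str(find_open_port())
    return address, port


def connect(rank, world_size, device, master_address=None, master_port=None):
    """Hypothetical wrapper: build the communicator for one process."""
    # Imported lazily so the helpers above stay usable without torchrl installed.
    from torchrl.modules.llm import stateless_init_process_group

    address, port = resolve_rendezvous(master_address, master_port)
    # Argument order follows the documented signature.
    return stateless_init_process_group(address, port, rank, world_size, device)
```

Every participating process has to rendezvous at the same address and port, so in a multi-process setup it is probably safer to call resolve_rendezvous once, share the resulting pair with all ranks, and pass it explicitly rather than letting each process pick its own port.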
