SGLangCollectiveTransport

class torchrl.weight_update.llm.SGLangCollectiveTransport(server_url: str, master_address: str, master_port: int, rank: int, world_size: int, device: device | str | int | None = None, timeout: float = 300.0)[source]

Transport for SGLang using NCCL collective communication.

This transport coordinates with SGLang servers via HTTP and performs weight transfer via NCCL broadcast.

Parameters:
  • server_url – URL of the SGLang server.

  • master_address – Address for NCCL initialization.

  • master_port – Port for NCCL initialization.

  • rank – Rank of this process (0 for trainer).

  • world_size – Total number of processes.

  • device – Device to use for communication.

  • timeout – HTTP request timeout in seconds.
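
A minimal construction sketch. All concrete values below (server URL, master address and port, world size) are placeholders, assuming a single trainer at rank 0 plus one SGLang inference worker:

from torchrl.weight_update.llm import SGLangCollectiveTransport

# Placeholder endpoints and sizing: one trainer (rank 0) + one SGLang worker.
transport = SGLangCollectiveTransport(
    server_url="http://localhost:30000",  # hypothetical SGLang server URL
    master_address="localhost",           # hypothetical NCCL master address
    master_port=29500,                    # hypothetical NCCL master port
    rank=0,                               # this process is the trainer
    world_size=2,                         # trainer + one inference worker
    device="cuda:0",
    timeout=300.0,
)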

check_connection() bool[source]

Check whether the NCCL communication group is initialized.

init_all_workers_group(model_metadata: dict[str, tuple[dtype, Size]]) None[source]

Initialize the NCCL communication group.

For the trainer (rank 0), this:

  1. Signals the SGLang server via HTTP to join the NCCL group
  2. Initializes the trainer’s NCCL communicator

Parameters:
  • model_metadata – Dict mapping parameter names to (dtype, shape) tuples.
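
A sketch of deriving the metadata from a trainer-side model and initializing the group; model here is a hypothetical stand-in for the real policy network, and transport is the instance constructed above:

import torch.nn as nn

# Hypothetical trainer-side model standing in for the real policy network.
model = nn.Linear(16, 4)

# Map each parameter name to its (dtype, torch.Size) pair, as expected above.
model_metadata = {
    name: (tensor.dtype, tensor.shape)
    for name, tensor in model.state_dict().items()
}
transport.init_all_workers_group(model_metadata)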

send_weights(model_id: str, weights: dict[str, Tensor]) None[source]

Broadcast weights to the SGLang server via NCCL.

Parameters:
  • model_id – Identifier for the model (for logging).

  • weights – Dict mapping parameter names to tensors.
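
A sketch of broadcasting the current trainer weights, guarding on check_connection() first; model and transport continue the hypothetical sketches above, and the "policy" label is an arbitrary identifier used only for logging:

# Broadcast the trainer's current weights once the NCCL group is up.
if transport.check_connection():
    transport.send_weights(
        model_id="policy",           # arbitrary label, used only for logging
        weights=model.state_dict(),  # parameter name -> tensor
    )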
