VLLMWeightReceiver¶
- class torchrl.weight_update.llm.VLLMWeightReceiver(scheme: VLLMWeightSyncScheme, vllm_engine)[source]¶
Receives weights in a vLLM worker using collective communication.
RPC + Collective Implementation
This class implements both layers:
RPC Layer: currently uses Ray for coordination

- init() locates the trainer actor via ray.get_actor()
- fetches the model metadata via a Ray remote call
- signals readiness to participate in the collective

Collective Layer: participates in the NCCL broadcast

- receives weights via collective operations
- the vLLM engine applies the weights internally during the broadcast
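For illustration, a minimal sketch of the receiver-side flow under the default Ray coordination, assuming a trainer actor registered under the name "trainer" that exposes a get_model_metadata() remote method (both names are illustrative, not part of the documented API):

    import ray
    from torchrl.weight_update.llm import VLLMWeightSyncScheme, VLLMWeightReceiver

    # RPC layer: locate the trainer actor and fetch the model metadata.
    trainer = ray.get_actor("trainer")                       # illustrative actor name
    metadata = ray.get(trainer.get_model_metadata.remote())  # illustrative remote method

    # Collective layer: join the broadcast group; subsequent weight updates
    # arrive through NCCL collectives and are applied by the vLLM engine.
    scheme = VLLMWeightSyncScheme()  # constructor arguments omitted
    receiver = VLLMWeightReceiver(scheme, vllm_engine)  # vllm_engine created elsewhere
    receiver.init_all_workers_group(metadata)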
Extending RPC Backends
To use a different RPC backend:
    class TorchRPCVLLMReceiver(VLLMWeightReceiver):
        def init(self):
            # Custom RPC: fetch the model metadata from the trainer
            # (get_metadata is a user-provided function available on the trainer).
            metadata = torch.distributed.rpc.rpc_sync("trainer", get_metadata)
            # Then init the collective (unchanged)
            self.init_all_workers_group(metadata)
Note
The RPC and collective layers are loosely coupled. The RPC layer ensures all ranks are ready before the collective starts, but the actual data transfer is independent of the RPC mechanism.
- apply_weights(weights: Any) → None[source]¶
Apply weights to vLLM engine.
Note: For vLLM, weights are applied automatically during the collective broadcast operation. This method is a no-op but kept for API consistency.
- init_all_workers_group(model_metadata: dict[str, tuple[torch.dtype, torch.Size]])[source]¶
Initialize the collective communication group.
- Parameters:
model_metadata – Dict mapping param names to (dtype, shape) tuples.
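As an illustration, the trainer side can build this dict from a model's state_dict; the helper below is a sketch, not part of the library:

    import torch

    def build_model_metadata(model: torch.nn.Module) -> dict[str, tuple[torch.dtype, torch.Size]]:
        # One (dtype, shape) entry per parameter/buffer name, matching the
        # format expected by init_all_workers_group().
        return {name: (t.dtype, t.shape) for name, t in model.state_dict().items()}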
- poll_and_apply(timeout: float = 0.1) → bool[source]¶
Poll for and apply weights.
- Returns:
Always False; vLLM uses push-based updates via collectives, not polling.
- receive(timeout: float = 0.001) → bool¶
Check for and apply new weights (non-blocking).
This method is called in the worker’s main loop to check if new weights have been sent. If weights are available, they are applied to the registered model immediately.
- Parameters:
timeout – Maximum time to wait for weights (seconds). Use 0 for immediate return.
- Returns:
True if weights were received and applied, False if no weights were available.
Note: For SharedMemWeightSyncScheme, this always returns False since workers automatically see updates via shared memory.
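A sketch of a generic worker loop built on this contract; the receiver construction and stop condition are assumed to exist elsewhere, and for vLLM specifically the push-based collective path above makes such polling unnecessary:

    import time

    def worker_loop(receiver, should_stop):
        # Poll for pushed weights; receive() applies them to the registered model.
        while not should_stop():
            updated = receiver.receive(timeout=0.001)
            if not updated:
                # Nothing arrived; back off briefly to avoid a busy loop.
                time.sleep(0.01)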