DistributedWeightSyncScheme

class torchrl.weight_update.DistributedWeightSyncScheme(backend: str = 'gloo', sync: bool = True)

Weight synchronization for torch.distributed.

This scheme uses torch.distributed primitives (send/recv) to synchronize weights across distributed workers. Each worker gets its own transport, following the same pattern as multiprocess collectors.

Parameters:

backend (str) – The torch.distributed backend to use (“gloo”, “nccl”, etc.). Defaults to “gloo”.
sync (bool) – Whether weight updates are synchronous. Defaults to True.

create_receiver() → WeightReceiver

Create a receiver for this scheme (legacy).

Returns: WeightReceiver instance configured for this scheme.

create_sender() → WeightSender

Create a sender for this scheme (legacy).

Returns: WeightSender instance configured for this scheme.

create_transport(pipe_or_context: Any) → TransportBackend

Create a distributed transport for a specific worker.

Parameters: pipe_or_context – A tuple of (store, rank) for the worker.
Returns: DistributedTransport configured for this specific worker.
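
As a rough illustration of the one-transport-per-worker pattern, the sketch below builds one object per (store, rank) tuple. `FakeTransport` and the local `create_transport` function are hypothetical stand-ins, not torchrl classes, and the choice of worker ranks starting at 1 is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass
class FakeTransport:
    """Hypothetical stand-in for a per-worker transport."""
    store: object
    rank: int


def create_transport(pipe_or_context):
    # Mirrors the documented input shape: a (store, rank) tuple.
    store, rank = pipe_or_context
    return FakeTransport(store=store, rank=rank)


shared_store = {}  # placeholder for a real distributed store
# Assumption for this sketch: workers occupy ranks 1..3.
transports = [create_transport((shared_store, rank)) for rank in range(1, 4)]
ranks = [t.rank for t in transports]
```

Each worker ends up with its own transport object, while all of them refer to the same store.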

get_receiver() → WeightReceiver

Get the receiver instance.

Returns: Receiver instance for receiving weights in this worker.
Raises: RuntimeError – If init_on_worker() hasn’t been called yet.
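
The contract described here (a getter that raises until the matching init method has run) can be sketched with a stand-in class. `LazyScheme` and its `_receiver` attribute are hypothetical and not part of torchrl; only the raise-before-init behavior is taken from the documentation.

```python
class LazyScheme:
    """Hypothetical stand-in illustrating the get-after-init contract."""

    def __init__(self):
        self._receiver = None  # populated by init_on_worker()

    def init_on_worker(self, model_id, **kwargs):
        # A real scheme would build a WeightReceiver; a dict stands in for it.
        self._receiver = {"model_id": model_id, **kwargs}

    def get_receiver(self):
        if self._receiver is None:
            raise RuntimeError(
                "init_on_worker() must be called before get_receiver()"
            )
        return self._receiver


scheme = LazyScheme()
try:
    scheme.get_receiver()
    raised = False
except RuntimeError:
    raised = True

scheme.init_on_worker("policy")
receiver = scheme.get_receiver()
```

Calling the getter only after initialization avoids the RuntimeError; get_sender() and init_on_sender() are documented as following the same rule on the sender side.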

get_sender() → WeightSender

Get the sender instance.

Returns: Sender instance for sending weights to workers.
Raises: RuntimeError – If init_on_sender() hasn’t been called yet.

init_on_sender(model_id: str, context: Any = None, **kwargs) → None

Initialize on the main process (sender side).

This method is called once in the collector’s _run_processes() method, after workers have been started and are ready to receive messages.

Parameters:

model_id – Identifier for the model being synchronized
context – Optional context object (e.g., collector) providing:
    - .pipes: list[mp.Connection]
    - .get_model(model_id: str) -> nn.Module
    - .get_cached_weights(model_id: str) -> TensorDict | None
    - .num_workers: int
**kwargs – Alternative to context (pipes, num_workers, model, cached_weights, etc.)
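
The two ways of supplying sender-side inputs (a context object or keyword arguments) can be pictured with the sketch below. `resolve_sender_args` is a hypothetical helper, not a torchrl function, and giving the context precedence over kwargs is an assumption of this example rather than documented behavior.

```python
from types import SimpleNamespace


def resolve_sender_args(context=None, **kwargs):
    """Hypothetical helper: read sender-side inputs from context or kwargs."""
    if context is not None:
        # Assumption: a context, when given, takes precedence over kwargs.
        return {"pipes": context.pipes, "num_workers": context.num_workers}
    pipes = kwargs.get("pipes", [])
    return {"pipes": pipes, "num_workers": kwargs.get("num_workers", len(pipes))}


# Strings stand in for real mp.Connection objects.
ctx = SimpleNamespace(pipes=["pipe0", "pipe1"], num_workers=2)
from_context = resolve_sender_args(context=ctx)
from_kwargs = resolve_sender_args(pipes=["pipe0", "pipe1"], num_workers=2)
```

Both call styles yield the same resolved inputs, which is the sense in which kwargs are described as an alternative to context.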

init_on_worker(model_id: str, context: Any = None, **kwargs) → None

Initialize on worker process (receiver side).

This method is called once in each worker’s initialization.

Parameters:

model_id – Identifier for the model being synchronized
context – Optional context object (e.g., inner collector) providing:
    - .pipe: mp.Connection
    - .get_model(model_id: str) -> nn.Module
**kwargs – Alternative to context (pipe, model, etc.)

prepare_weights(weights: Any, model_id: str, strategy: WeightStrategy, context: Any = None) → Any

Prepare weights for sending.

This method handles weight extraction, conversion, and any scheme-specific preparation (e.g., cache lookups for SharedMemWeightSyncScheme).

Parameters:

weights – Raw weights input (can be None, nn.Module, TensorDict, dict, str reference, etc.)
model_id – The model identifier (e.g., “policy”)
strategy – WeightStrategy for extracting/converting weights
context – Optional context (e.g., collector) for model resolution

Returns:

Prepared weights ready to send via transport
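
To make the range of accepted inputs more concrete, the sketch below normalizes a few of them (None, a string reference, a module-like object, a plain dict) into a dict. `normalize_weights` and `TinyModule` are hypothetical, the resolution rules are assumptions, and the `strategy` argument is left out because the WeightStrategy interface is not described in this section.

```python
from types import SimpleNamespace


class TinyModule:
    """Hypothetical module-like object exposing state_dict()."""

    def state_dict(self):
        return {"weight": [1.0, 2.0], "bias": [0.0]}


def normalize_weights(weights, model_id, context=None):
    """Hypothetical sketch: turn the accepted input types into a plain dict."""
    if weights is None or isinstance(weights, str):
        # None or a string reference: resolve the model through the context.
        if context is None:
            raise ValueError("a context is needed to resolve the model")
        key = weights if isinstance(weights, str) else model_id
        weights = context.get_model(key)
    if hasattr(weights, "state_dict"):
        # Module-like object: extract its parameters.
        return dict(weights.state_dict())
    return dict(weights)


module = TinyModule()
ctx = SimpleNamespace(get_model=lambda model_id: module)

from_none = normalize_weights(None, "policy", context=ctx)
from_ref = normalize_weights("policy", "policy", context=ctx)
from_module = normalize_weights(module, "policy")
from_dict = normalize_weights({"weight": [1.0, 2.0], "bias": [0.0]}, "policy")
```

All four inputs resolve to the same mapping, which is then what a transport would send.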
