Experimental Object Oriented Distributed API#

Created On: Jul 09, 2025 | Last Updated On: Jul 30, 2025

This is an experimental API for PyTorch Distributed. It is under active development and may change or be removed entirely.

It is intended as a proving ground for more flexible, object-oriented distributed APIs.

class torch.distributed._dist2.ProcessGroup#

Bases: pybind11_object

A ProcessGroup is a communication primitive that allows for collective operations across a group of processes.

This is a base class that provides the interface for all ProcessGroups. It is not meant to be used directly, but rather extended by subclasses.

class BackendType#

Bases: pybind11_object

The type of the backend used for the process group.

Members:

UNDEFINED

GLOO

NCCL

XCCL

UCC

MPI

CUSTOM

CUSTOM = <BackendType.CUSTOM: 6>#

GLOO = <BackendType.GLOO: 1>#

MPI = <BackendType.MPI: 4>#

NCCL = <BackendType.NCCL: 2>#

UCC = <BackendType.UCC: 3>#

UNDEFINED = <BackendType.UNDEFINED: 0>#

XCCL = <BackendType.XCCL: 5>#

property name#

property value#
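
As a rough illustration of the name and value properties, the following stand-in copies the member values listed above into a Python enum.IntEnum. It is a sketch for demonstration only: the real BackendType is a pybind11 enum rather than an IntEnum, and the class name BackendTypeModel is hypothetical.

```python
from enum import IntEnum


class BackendTypeModel(IntEnum):
    """Stand-in that copies the documented BackendType member values."""

    UNDEFINED = 0
    GLOO = 1
    NCCL = 2
    UCC = 3
    MPI = 4
    XCCL = 5
    CUSTOM = 6


backend = BackendTypeModel.NCCL
# .name is the member's identifier, .value is its integer code.
print(backend.name, backend.value)  # NCCL 2
```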

CUSTOM = <BackendType.CUSTOM: 6>#

GLOO = <BackendType.GLOO: 1>#

MPI = <BackendType.MPI: 4>#

NCCL = <BackendType.NCCL: 2>#

UCC = <BackendType.UCC: 3>#

UNDEFINED = <BackendType.UNDEFINED: 0>#

XCCL = <BackendType.XCCL: 5>#

abort(self: torch._C._distributed_c10d.ProcessGroup) → None#: Aborts all operations and connections, if the backend supports it.

allgather(*args, **kwargs)#

Overloaded function.

allgather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = AllgatherOptions()) -> c10d::Work

Allgathers the input tensors from all processes across the process group.

See torch.distributed.all_gather() for more details.

allgather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensor: torch.Tensor, timeout: datetime.timedelta | None = None) -> c10d::Work

Allgathers the input tensors from all processes across the process group.

See torch.distributed.all_gather() for more details.

allgather_coalesced(self: torch._C._distributed_c10d.ProcessGroup, output_lists: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_list: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = AllgatherOptions()) → c10d::Work#: Allgathers the input tensors from all processes across the process group.

See torch.distributed.all_gather() for more details.

allgather_into_tensor_coalesced(self: torch._C._distributed_c10d.ProcessGroup, outputs: collections.abc.Sequence[torch.Tensor], inputs: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllgatherOptions = AllgatherOptions()) → c10d::Work#: Allgathers the input tensors from all processes across the process group.

See torch.distributed.all_gather() for more details.

allreduce(*args, **kwargs)#

Overloaded function.

allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllreduceOptions = AllreduceOptions()) -> c10d::Work

Allreduces the provided tensors across all processes in the process group.

See torch.distributed.all_reduce() for more details.

allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work

Allreduces the provided tensors across all processes in the process group.

See torch.distributed.all_reduce() for more details.

allreduce(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work

Allreduces the provided tensor across all processes in the process group.

See torch.distributed.all_reduce() for more details.
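
To make the semantics concrete, here is a small pure-Python model of an allreduce. It does not call PyTorch: the helper simulate_allreduce and the use of plain lists to stand in for per-rank tensors are assumptions made for illustration. After the call, every rank holds the same element-wise reduction.

```python
import operator
from functools import reduce


def simulate_allreduce(tensors_by_rank, op=operator.add):
    """Model allreduce: reduce element-wise across ranks, then hand every rank the result.

    tensors_by_rank[r] is a plain list standing in for the tensor held by rank r.
    """
    # zip(*...) groups the i-th element of every rank together.
    reduced = [reduce(op, column) for column in zip(*tensors_by_rank)]
    # Every rank ends up with its own copy of the reduced values.
    return [list(reduced) for _ in tensors_by_rank]


per_rank = [[1, 2], [10, 20], [100, 200]]  # three ranks, two elements each
result = simulate_allreduce(per_rank)
print(result[0])  # [111, 222]
```

Passing op=max to the same helper models a MAX reduction instead of the default SUM.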

allreduce_coalesced(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllreduceCoalescedOptions = AllreduceCoalescedOptions()) → c10d::Work#: Allreduces the provided tensors across all processes in the process group.

See torch.distributed.all_reduce() for more details.

alltoall(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.AllToAllOptions = AllToAllOptions()) → c10d::Work#: Alltoalls the input tensors from all processes across the process group.

See torch.distributed.all_to_all() for more details.

alltoall_base(*args, **kwargs)#

Overloaded function.

alltoall_base(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: torch.Tensor, output_split_sizes: collections.abc.Sequence[typing.SupportsInt], input_split_sizes: collections.abc.Sequence[typing.SupportsInt], opts: torch._C._distributed_c10d.AllToAllOptions = AllToAllOptions()) -> c10d::Work

Alltoalls the input tensors from all processes across the process group.

See torch.distributed.all_to_all() for more details.

alltoall_base(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: torch.Tensor, output_split_sizes: collections.abc.Sequence[typing.SupportsInt], input_split_sizes: collections.abc.Sequence[typing.SupportsInt], timeout: datetime.timedelta | None = None) -> c10d::Work

Alltoalls the input tensors from all processes across the process group.

See torch.distributed.all_to_all() for more details.
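
The split-size arguments are easiest to understand with a worked case. The sketch below models alltoall_base on flat Python lists instead of tensors. The helper simulate_alltoall_base is hypothetical, and the model assumes the same chunking rule as torch.distributed.all_to_all_single: each rank cuts its input into consecutive chunks, sends chunk d to rank d, and concatenates what it receives in source-rank order.

```python
def simulate_alltoall_base(inputs_by_rank, input_split_sizes_by_rank):
    """Model alltoall_base on flat lists.

    inputs_by_rank[r] is rank r's flat input; input_split_sizes_by_rank[r][d]
    is how many of its elements rank r sends to rank d.
    """
    world_size = len(inputs_by_rank)
    outputs = [[] for _ in range(world_size)]
    # Visiting sources in rank order keeps each output ordered by source rank.
    for data, splits in zip(inputs_by_rank, input_split_sizes_by_rank):
        if len(splits) != world_size or sum(splits) != len(data):
            raise ValueError("split sizes need one entry per rank and must cover the input")
        offset = 0
        for dst, size in enumerate(splits):
            outputs[dst].extend(data[offset:offset + size])
            offset += size
    return outputs


inputs = [[0, 1, 2], [10, 11, 12, 13]]  # two ranks
input_splits = [[1, 2], [3, 1]]         # rank 0 sends 1 and 2 elements, rank 1 sends 3 and 1
outputs = simulate_alltoall_base(inputs, input_splits)
print(outputs)  # [[0, 10, 11, 12], [1, 2, 13]]
```

In this model the output split sizes seen by rank r are the r-th entries of every rank's input split sizes: [1, 3] for rank 0 and [2, 1] for rank 1.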

barrier(*args, **kwargs)#

Overloaded function.

barrier(self: torch._C._distributed_c10d.ProcessGroup, opts: torch._C._distributed_c10d.BarrierOptions = BarrierOptions()) -> c10d::Work

Blocks until all processes in the group enter the call, and then all leave the call together.

See torch.distributed.barrier() for more details.

barrier(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta | None = None) -> c10d::Work

Blocks until all processes in the group enter the call, and then all leave the call together.

See torch.distributed.barrier() for more details.

property bound_device_id#

boxed(self: torch._C._distributed_c10d.ProcessGroup) → object#

broadcast(*args, **kwargs)#

Overloaded function.

broadcast(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.BroadcastOptions = BroadcastOptions()) -> c10d::Work

Broadcasts the tensor to all processes in the process group.

See torch.distributed.broadcast() for more details.

broadcast(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work

Broadcasts the tensor to all processes in the process group.

See torch.distributed.broadcast() for more details.

gather(*args, **kwargs)#

Overloaded function.

gather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], input_tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.GatherOptions = GatherOptions()) -> c10d::Work

Gathers the input tensors from all processes across the process group.

See torch.distributed.gather() for more details.

gather(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensor: torch.Tensor, root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work

Gathers the input tensors from all processes across the process group.

See torch.distributed.gather() for more details.

get_group_store(self: torch._C._distributed_c10d.ProcessGroup) → torch._C._distributed_c10d.Store#: Get the store of this process group.

property group_desc#: Gets the description of this process group.

property group_name#: Gets the name of this process group. The name is unique across the cluster.

merge_remote_group(self: torch._C._distributed_c10d.ProcessGroup, store: torch._C._distributed_c10d.Store, size: SupportsInt, timeout: datetime.timedelta = datetime.timedelta(seconds=1800), group_name: str | None = None, group_desc: str | None = None) → torch._C._distributed_c10d.ProcessGroup#

monitored_barrier(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta | None = None, wait_all_ranks: bool = False) → None#

Blocks until all processes in the group enter the call, and then all leave the call together.

See torch.distributed.monitored_barrier() for more details.

name(self: torch._C._distributed_c10d.ProcessGroup) → str#: Get the name of this process group.

rank(self: torch._C._distributed_c10d.ProcessGroup) → int#: Get the rank of the calling process within this process group.

recv(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], srcRank: SupportsInt, tag: SupportsInt) → c10d::Work#: Receives the tensor from the specified rank.

See torch.distributed.recv() for more details.

recv_anysource(self: torch._C._distributed_c10d.ProcessGroup, arg0: collections.abc.Sequence[torch.Tensor], arg1: SupportsInt) → c10d::Work#: Receives the tensor from any source.

See torch.distributed.recv() for more details.

reduce(*args, **kwargs)#

Overloaded function.

reduce(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.ReduceOptions = ReduceOptions()) -> c10d::Work

Reduces the provided tensors across all processes in the process group.

See torch.distributed.reduce() for more details.

reduce(self: torch._C._distributed_c10d.ProcessGroup, tensor: torch.Tensor, root: typing.SupportsInt, op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work

Reduces the provided tensor across all processes in the process group.

See torch.distributed.reduce() for more details.

reduce_scatter(*args, **kwargs)#

Overloaded function.

reduce_scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], opts: torch._C._distributed_c10d.ReduceScatterOptions = ReduceScatterOptions()) -> c10d::Work

Reduces and scatters the input tensors from all processes across the process group.

See torch.distributed.reduce_scatter() for more details.

reduce_scatter(self: torch._C._distributed_c10d.ProcessGroup, output: torch.Tensor, input: collections.abc.Sequence[torch.Tensor], op: torch._C._distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, timeout: datetime.timedelta | None = None) -> c10d::Work

Reduces and scatters the input tensors from all processes across the process group.

See torch.distributed.reduce_scatter() for more details.
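
The following pure-Python model sketches what reduce_scatter computes with the default SUM reduction. It does not use PyTorch: simulate_reduce_scatter is a hypothetical helper, and nested lists stand in for the per-rank input tensors. Each rank supplies one chunk per destination rank, and rank d receives the element-wise sum of every chunk addressed to it.

```python
def simulate_reduce_scatter(inputs_by_rank):
    """Model reduce_scatter with a SUM reduction.

    inputs_by_rank[r][d] is the chunk that rank r contributes toward rank d.
    The return value holds one reduced chunk per destination rank.
    """
    world_size = len(inputs_by_rank)
    outputs = []
    for dst in range(world_size):
        # Collect the chunk every source rank addressed to this destination.
        chunks = [inputs_by_rank[src][dst] for src in range(world_size)]
        # Sum those chunks element by element.
        outputs.append([sum(column) for column in zip(*chunks)])
    return outputs


inputs = [
    [[1, 2], [3, 4]],      # rank 0: chunk for rank 0, chunk for rank 1
    [[10, 20], [30, 40]],  # rank 1: chunk for rank 0, chunk for rank 1
]
outputs = simulate_reduce_scatter(inputs)
print(outputs)  # [[11, 22], [33, 44]]
```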

reduce_scatter_tensor_coalesced(self: torch._C._distributed_c10d.ProcessGroup, outputs: collections.abc.Sequence[torch.Tensor], inputs: collections.abc.Sequence[torch.Tensor], opts: torch._C._distributed_c10d.ReduceScatterOptions = ReduceScatterOptions()) → c10d::Work#: Reduces and scatters the input tensors from all processes across the process group.

See torch.distributed.reduce_scatter() for more details.

scatter(*args, **kwargs)#

Overloaded function.

scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensors: collections.abc.Sequence[torch.Tensor], input_tensors: collections.abc.Sequence[collections.abc.Sequence[torch.Tensor]], opts: torch._C._distributed_c10d.ScatterOptions = ScatterOptions()) -> c10d::Work

Scatters the input tensors from the root process to all processes in the process group.

See torch.distributed.scatter() for more details.

scatter(self: torch._C._distributed_c10d.ProcessGroup, output_tensor: torch.Tensor, input_tensors: collections.abc.Sequence[torch.Tensor], root: typing.SupportsInt, timeout: datetime.timedelta | None = None) -> c10d::Work

Scatters the input tensors from the root process to all processes in the process group.

See torch.distributed.scatter() for more details.

send(self: torch._C._distributed_c10d.ProcessGroup, tensors: collections.abc.Sequence[torch.Tensor], dstRank: SupportsInt, tag: SupportsInt) → c10d::Work#: Sends the tensor to the specified rank.

See torch.distributed.send() for more details.

set_timeout(self: torch._C._distributed_c10d.ProcessGroup, timeout: datetime.timedelta) → None#: Sets the default timeout for all future operations.

shutdown(self: torch._C._distributed_c10d.ProcessGroup) → None#: Shuts down the process group.

size(self: torch._C._distributed_c10d.ProcessGroup) → int#: Get the size of this process group.

split_group(self: torch._C._distributed_c10d.ProcessGroup, ranks: collections.abc.Sequence[typing.SupportsInt], timeout: datetime.timedelta | None = None, opts: c10d::Backend::Options | None = None, group_name: str | None = None, group_desc: str | None = None) → torch._C._distributed_c10d.ProcessGroup#

static unbox(arg0: object) → torch._C._distributed_c10d.ProcessGroup#

class torch.distributed._dist2.ProcessGroupFactory(*args, **kwargs)[source]#

Bases: Protocol

Protocol for process group factories.

torch.distributed._dist2.current_process_group()[source]#

Get the current process group. The current group is tracked per thread.

Returns: The current process group.
Return type: ProcessGroup

torch.distributed._dist2.new_group(backend, timeout, device, **kwargs)[source]#

Create a new process group with the given backend and options. This group is independent and will not be globally registered and thus not usable via the standard torch.distributed.* APIs.

Parameters

backend (str) – The backend to use for the process group.
timeout (timedelta) – The timeout for collective operations.
device (Union[str, device]) – The device to use for the process group.
**kwargs (object) – All remaining arguments are passed to the backend constructor. See the backend specific documentation for details.

Returns

A new process group.

Return type

ProcessGroup
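
A minimal usage sketch follows, assuming a PyTorch build that ships torch.distributed._dist2 with Gloo support. Several details are assumptions rather than documented behavior: that "gloo" is a registered backend name, that new_group() discovers its peers through the usual MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE environment variables, and that the returned Work object is awaited with wait(). The function name run_single_process_allreduce is hypothetical. Because the API is experimental, treat this as a starting point rather than a reference.

```python
import os
from datetime import timedelta


def run_single_process_allreduce():
    """Create a one-process group and all-reduce a tensor on it."""
    # Imported lazily so this module can be loaded without PyTorch installed.
    import torch
    from torch.distributed import _dist2

    # Assumption: rendezvous reads these variables. In a multi-process job
    # they are usually set by a launcher such as torchrun.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    pg = _dist2.new_group(
        backend="gloo",  # assumed to be a registered backend name
        timeout=timedelta(seconds=60),
        device="cpu",
    )

    tensor = torch.ones(4)
    work = pg.allreduce([tensor])  # list overload with default options
    work.wait()                    # block until the collective completes
    return tensor


if __name__ == "__main__":
    print(run_single_process_allreduce())
```

The group returned here is independent of the global default group, so the collective is invoked as a method on pg rather than through torch.distributed.all_reduce().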

torch.distributed._dist2.process_group(pg)[source]#

Context manager that sets the current process group for the calling thread.

Parameters: pg (ProcessGroup) – The process group to use.
Return type: Generator[None, None, None]
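
The pattern of a thread-local "current" object managed by a context manager can be sketched with the standard library alone. This is not the PyTorch implementation: use_group and current_group are hypothetical stand-ins for process_group() and current_process_group(), strings stand in for ProcessGroup objects, and returning None when no group is active is a choice made for this sketch.

```python
import threading
from contextlib import contextmanager

_state = threading.local()  # each thread gets its own attributes


def current_group():
    """Return the group active in the calling thread, or None."""
    return getattr(_state, "group", None)


@contextmanager
def use_group(group):
    """Make `group` current for this thread, restoring the previous one on exit."""
    previous = current_group()
    _state.group = group
    try:
        yield
    finally:
        _state.group = previous


seen_in_other_thread = []
with use_group("pg_a"):
    inside = current_group()
    # A different thread does not observe the group set here.
    t = threading.Thread(target=lambda: seen_in_other_thread.append(current_group()))
    t.start()
    t.join()
after = current_group()
print(inside, seen_in_other_thread, after)  # pg_a [None] None
```

Restoring the previous value in the finally block is what makes nested with blocks behave correctly.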

torch.distributed._dist2.register_backend(name, func)[source]#

Register a new process group backend.

Parameters

name (str) – The name of the backend.
func (ProcessGroupFactory) – The function to create the process group.
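
The relationship between a backend name, its factory, and group creation can be sketched as a small registry. Everything below is hypothetical and independent of PyTorch: GroupFactory, register_factory and create_group only mimic the roles of ProcessGroupFactory, register_backend() and new_group(), the factory's parameter list is an assumption, and rejecting duplicate names is a choice made for this sketch.

```python
from datetime import timedelta
from typing import Any, Dict, Protocol


class GroupFactory(Protocol):
    """Hypothetical factory shape: builds a group object from the given options."""

    def __call__(self, timeout: timedelta, device: str, **kwargs: Any) -> Any: ...


_FACTORIES: Dict[str, GroupFactory] = {}


def register_factory(name: str, func: GroupFactory) -> None:
    """Associate a backend name with the factory that builds its groups."""
    if name in _FACTORIES:
        raise ValueError(f"backend {name!r} is already registered")
    _FACTORIES[name] = func


def create_group(backend: str, timeout: timedelta, device: str, **kwargs: Any) -> Any:
    """Look up the factory by name and forward the remaining options to it."""
    if backend not in _FACTORIES:
        raise ValueError(f"unknown backend {backend!r}")
    return _FACTORIES[backend](timeout=timeout, device=device, **kwargs)


def fake_factory(timeout: timedelta, device: str, **kwargs: Any) -> dict:
    # A real factory would construct a ProcessGroup; this one returns a dict.
    return {"timeout": timeout, "device": device, "options": kwargs}


register_factory("fake", fake_factory)
group = create_group("fake", timeout=timedelta(seconds=30), device="cpu", retries=3)
print(group["options"])  # {'retries': 3}
```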
