torch.cuda
This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but utilize GPUs for computation.
It is lazily initialized, so you can always import it, and use is_available() to determine if your system supports CUDA.
CUDA semantics has more details about working with CUDA.
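A minimal sketch of this check, on a machine that may or may not have a GPU:

    import torch

    # torch.cuda initializes lazily, so importing it and calling is_available()
    # is safe even on machines without CUDA.
    if torch.cuda.is_available():
        x = torch.cuda.FloatTensor(3).zero_()  # allocated on the current GPU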
- class torch.cuda.device(idx)
  Context-manager that changes the selected device.
  Parameters: idx (int) – device index to select. It’s a no-op if this argument is negative.
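  For example, a sketch assuming a machine with at least two GPUs:

    import torch

    # Allocate on GPU 1, then restore the previously selected device on exit.
    with torch.cuda.device(1):
        y = torch.cuda.FloatTensor(10).fill_(1)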
- class torch.cuda.device_of(obj)
  Context-manager that changes the current device to that of the given object.
  You can use both tensors and storages as arguments. If a given object is not allocated on a GPU, this is a no-op.
  Parameters: obj (Tensor or Storage) – object allocated on the selected device.
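  A sketch of using it to allocate on the same device as an existing tensor, assuming at least one GPU:

    import torch

    x = torch.cuda.FloatTensor(4)  # lives on the current device
    with torch.cuda.device_of(x):
        # new CUDA tensors created here land on the same device as x
        y = torch.cuda.FloatTensor(4).zero_()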
- torch.cuda.set_device(device)
  Sets the current device.
  Usage of this function is discouraged in favor of device. In most cases it’s better to use the CUDA_VISIBLE_DEVICES environment variable.
  Parameters: device (int) – selected device. This function is a no-op if this argument is negative.
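  For illustration, a sketch assuming a two-GPU machine:

    import torch

    # Roughly equivalent in effect to launching with CUDA_VISIBLE_DEVICES=1,
    # but done programmatically; the device context-manager is generally preferred.
    torch.cuda.set_device(1)
    x = torch.cuda.FloatTensor(2)  # allocated on GPU 1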
- torch.cuda.stream(stream)
  Context-manager that selects a given stream.
  All CUDA kernels queued within its context will be enqueued on the selected stream.
  Parameters: stream (Stream) – selected stream. This manager is a no-op if it’s None.
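  A sketch of queuing work on a non-default stream:

    import torch

    s = torch.cuda.Stream()  # a new stream on the current device
    with torch.cuda.stream(s):
        # kernels launched here are enqueued on s rather than the default stream
        y = torch.cuda.FloatTensor(100).normal_()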
- torch.cuda.synchronize()
  Waits for all kernels in all streams on the current device to complete.
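  Since kernel launches are asynchronous, a common use is accurate timing; a sketch:

    import time
    import torch

    x = torch.cuda.FloatTensor(1000, 1000).normal_()
    start = time.time()
    y = x.mm(x)               # launch returns before the GPU finishes
    torch.cuda.synchronize()  # block until all queued kernels complete
    elapsed = time.time() - start  # now includes the actual GPU work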
Communication collectives
- torch.cuda.comm.broadcast(tensor, devices)
  Broadcasts a tensor to a number of GPUs.
  Parameters:
  - tensor (Tensor) – tensor to broadcast.
  - devices (Iterable) – an iterable of devices among which to broadcast. Note that it should be like (src, dst1, dst2, ...), the first element of which is the source device to broadcast from.
  Returns: A tuple containing copies of the tensor, placed on devices corresponding to indices from devices.
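  A sketch assuming a two-GPU machine, broadcasting from GPU 0:

    import torch

    t = torch.cuda.FloatTensor(5).fill_(3)         # source tensor on GPU 0
    copies = torch.cuda.comm.broadcast(t, (0, 1))  # (src, dst1) device indices
    # copies[0] is on GPU 0, copies[1] is an identical copy on GPU 1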
- torch.cuda.comm.reduce_add(inputs, destination=None)
  Sums tensors from multiple GPUs.
  All inputs should have matching shapes.
  Parameters:
  - inputs (Iterable[Tensor]) – an iterable of tensors to add.
  - destination (int, optional) – a device on which the output will be placed (default: current device).
  Returns: A tensor containing an elementwise sum of all inputs, placed on the destination device.
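  For example, a sketch assuming two GPUs:

    import torch

    a = torch.cuda.FloatTensor(3).fill_(1)      # on GPU 0
    with torch.cuda.device(1):
        b = torch.cuda.FloatTensor(3).fill_(2)  # on GPU 1
    # elementwise sum placed on GPU 0: [3, 3, 3]
    total = torch.cuda.comm.reduce_add([a, b], destination=0)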
- torch.cuda.comm.scatter(tensor, devices, chunk_sizes=None, dim=0, streams=None)
  Scatters tensor across multiple GPUs.
  Parameters:
  - tensor (Tensor) – tensor to scatter.
  - devices (Iterable[int]) – iterable of ints, specifying among which devices the tensor should be scattered.
  - chunk_sizes (Iterable[int], optional) – sizes of chunks to be placed on each device. It should match devices in length and sum to tensor.size(dim). If not specified, the tensor will be divided into equal chunks.
  - dim (int, optional) – a dimension along which to chunk the tensor.
  Returns: A tuple containing chunks of the tensor, spread across given devices.
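  A sketch assuming a two-GPU machine, with uneven chunk sizes:

    import torch

    x = torch.cuda.FloatTensor(8, 4).normal_()
    # split along dim 0 into a 3-row chunk for GPU 0 and a 5-row chunk for GPU 1
    chunks = torch.cuda.comm.scatter(x, devices=(0, 1), chunk_sizes=(3, 5), dim=0)
    # chunks[0] is 3x4 on GPU 0, chunks[1] is 5x4 on GPU 1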
Streams and events
- class torch.cuda.Stream
  Wrapper around a CUDA stream.
  - query()
    Checks if all the work submitted has been completed.
    Returns: A boolean indicating if all kernels in this stream are completed.
  - record_event(event=None)
    Records an event.
    Parameters: event (Event, optional) – event to record. If not given, a new one will be allocated.
    Returns: Recorded event.
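  A sketch tying these together, assuming at least one GPU:

    import torch

    s = torch.cuda.Stream()
    with torch.cuda.stream(s):
        y = torch.cuda.FloatTensor(100).exp_()
        ev = s.record_event()  # marks this point in the stream
    print(s.query())  # False while the kernel is still running, True once done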
NVIDIA Tools Extension (NVTX)
- torch.cuda.nvtx.mark(msg)
  Describes an instantaneous event that occurred at some point.
  Parameters: msg (string) – ASCII message to associate with the event.
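  For instance, a sketch of dropping a marker that NVIDIA profilers consuming NVTX ranges (e.g. nvprof or the Visual Profiler) can display:

    import torch

    torch.cuda.nvtx.mark("finished warmup")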