Device Management

Created On: Nov 14, 2025 | Last Updated On: Dec 09, 2025

Background

Device management covers basics such as querying how many devices are available and switching between them. Accelerator backends wrap their device‑runtime APIs and expose them to PyTorch.

Design

Accelerator vendors should implement these core functions:

Function name	Description	Application scenarios
`device_count()`	Query the total number of available devices in the system	Application initialization; multi-device workload distribution; validating device indices before use
`current_device()`	Get the currently active device for the calling thread	Debugging and logging; determining tensor placement; guard implementations
`set_device()`	Change the active device for subsequent operations	Switching context between devices; initializing device-specific resources; multi-GPU training loops
`exchange_device()`	Atomically switch to a device and return the previous one	Implementing device guards; temporarily switching device context; RAII-based device management
`maybe_exchange_device()`	Exchange the device only when the index is valid; −1 is also accepted	Safe device switching with optional indices; guard implementations with nullable device values

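These functions are the building blocks for streams, events, and memory management, so each one should validate its inputs (for example, reject out-of-range device indices) and report device-runtime failures as errors instead of silently ignoring them.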
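To make the semantics of the five functions concrete, here is a minimal Python model of them over a simulated runtime. `SimulatedRuntime` is hypothetical and not part of PyTorch or OpenReg. It keeps the current device per thread, and it treats −1 passed to `maybe_exchange_device` as "no device requested" (return −1, do not switch), which is an assumption about that case rather than documented behavior.

```python
import threading


class SimulatedRuntime:
    """Hypothetical in-memory stand-in for a vendor device runtime."""

    def __init__(self, num_devices: int) -> None:
        self._num_devices = num_devices
        self._local = threading.local()  # the current device is per thread

    def device_count(self) -> int:
        return self._num_devices

    def current_device(self) -> int:
        # A thread that never selected a device starts on device 0.
        return getattr(self._local, "device", 0)

    def _check_index(self, index: int) -> None:
        if not 0 <= index < self._num_devices:
            raise ValueError(
                f"device index {index} out of range [0, {self._num_devices})"
            )

    def set_device(self, index: int) -> None:
        self._check_index(index)
        self._local.device = index

    def exchange_device(self, index: int) -> int:
        # Switch to `index` and return the previous device. Because the state
        # is per thread, no other thread can interleave between read and write.
        previous = self.current_device()
        if index != previous:
            self.set_device(index)
        return previous

    def maybe_exchange_device(self, index: int) -> int:
        # Assumption: -1 means "no device requested", so nothing changes.
        if index == -1:
            return -1
        return self.exchange_device(index)
```

With `rt = SimulatedRuntime(2)`, calling `rt.exchange_device(1)` returns 0 and leaves the calling thread on device 1, while a freshly started thread still sees device 0.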

Implementation

This section walks through device management using set_device as the example. The implementation has three layers:

- C++ wrappers around the device runtime
- Python bindings that expose the C++ functions
- User-friendly Python APIs

The examples use OpenReg (Open Registration), a PyTorch integration example for out-of-tree accelerator backends. Its implementation (OpenRegFunctions.h/cpp) shows how to wrap a third-party runtime cleanly, and these functions are reused across the backend: in streams, events, generators, and the Python bindings.

C++ side

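Wrap the device-runtime API and add error handling. In the pattern below, SetDevice skips the runtime call when the requested device is already active, and the exported set_device validates the index with check_device_index before checking SetDevice's error code with OPENREG_CHECK: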

orError_t SetDevice(DeviceIndex device) {
  int cur_device = -1;
  OPENREG_CHECK(orGetDevice(&cur_device));
  if (device == cur_device) {
    return orSuccess;
  }
  return orSetDevice(device);
}

OPENREG_EXPORT void set_device(DeviceIndex device) {
  check_device_index(device);
  OPENREG_CHECK(SetDevice(device));
}

Bindings

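Expose the C++ functions to Python through the CPython C API: each wrapper checks and converts its argument, calls into C++, and is registered in a PyMethodDef table: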

PyObject* _setDevice(PyObject* self, PyObject* arg) {
  HANDLE_TH_ERRORS
  TORCH_CHECK(THPUtils_checkLong(arg), "invalid argument to setDevice");
  auto device = THPUtils_unpackDeviceIndex(arg);
  torch::utils::device_lazy_init(at::kPrivateUse1);
  c10::openreg::set_device(device);

  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

static PyMethodDef methods[] = {
    {"_init", _initExtension, METH_NOARGS, nullptr},
    {"_isInBadFork", _isInBadFork, METH_NOARGS, nullptr},
    {"_get_default_generator", _getDefaultGenerator, METH_O, nullptr},
    {"_get_device", _getDevice, METH_NOARGS, nullptr},
    {"_set_device", _setDevice, METH_O, nullptr},
    {"_exchangeDevice", _exchangeDevice, METH_O, nullptr},
    {"_get_device_count", _getDeviceCount, METH_NOARGS, nullptr},
    {nullptr, nullptr, 0, nullptr}};

Python side

Wrap the C++ bindings with user-friendly Python functions:

import torch_openreg._C


def set_device(device) -> None:
    # Negative indices are ignored rather than forwarded to the runtime.
    if device >= 0:
        torch_openreg._C._set_device(device)

The device-management bindings map to Python as follows:

C++ binding function	Python binding (torch_openreg._C)	Python user API	Description
`_getDeviceCount`	`torch_openreg._C._get_device_count()`	`torch.openreg.device_count()`	Returns the total number of devices
`_getDevice`	`torch_openreg._C._get_device()`	`torch.openreg.current_device()`	Returns the index of the currently active device
`_setDevice`	`torch_openreg._C._set_device(idx)`	`torch.openreg.set_device(idx)`	Sets the active device
`_exchangeDevice`	`torch_openreg._C._exchangeDevice(idx)`	N/A (internal use only)	Atomically swaps the device and returns the previous one

Guard

Device guards switch devices automatically and restore the previous device even when an exception is thrown. They work like C++ lock guards: they switch to the target device on construction and restore the previous device on destruction.

Implement DeviceGuardImplInterface to integrate with PyTorch’s guard system:

  /**
   * Return the type of device managed by this guard implementation.
   */
  DeviceType type() const override {
    return static_type;
  }
  /**
   * Set the current device to device d, and return the previous Device.
   */
  Device exchangeDevice(Device d) const override {
    TORCH_CHECK(d.is_privateuseone(), "Expected a PrivateUse1 device, but got ", d);

    auto old_device_index = ExchangeDevice(d.index());
    return Device(static_type, old_device_index);
  }

  /**
   * Get the current device.
   */
  Device getDevice() const override {
    int device_index = current_device();
    return c10::Device(static_type, device_index);
  }

  /**
   * Get the device capability for a given device.
   * OpenReg exposes two identical devices, so every device reports the same default capability.
   */
  DeviceCapability getDeviceCapability(Device /*unused*/) const override {
    return DeviceCapability();
  }

  /**
   * Set the current device to c10::Device.
   */
  void setDevice(Device d) const override {
    TORCH_CHECK(d.is_privateuseone(), "Expected a PrivateUse1 device, but got ", d);

    set_device(d.index());
  }

  /**
   * Set the current device to device d, without checking for errors
   * (so, e.g., this can be called from a destructor).
   */
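  void uncheckedSetDevice(Device d) const noexcept override {
    // Call the runtime directly: set_device() can throw (index check,
    // OPENREG_CHECK), and an exception escaping a noexcept function
    // terminates the process. Errors are deliberately ignored here.
    (void)orSetDevice(d.index());
  }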

  /**
   * Get the number of devices.
   *
   * WARNING: This is REQUIRED to not raise an exception.
   * If there is some sort of problem, e.g., driver error,
   * you should report that there are zero available devices.
   */
  DeviceIndex deviceCount() const noexcept override {
    return device_count();
  }

  /**
   * Wait (by blocking the calling thread) until all the work has
   * completed running on the device.
   */
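  void synchronizeDevice(const DeviceIndex device_index) const override {
    // Synchronize the requested device rather than whichever device is
    // current, and restore the caller's device before reporting any error.
    auto orig_device = ExchangeDevice(device_index);
    auto err = orDeviceSynchronize();
    ExchangeDevice(orig_device);
    OPENREG_CHECK(err);
  }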

Once this implementation is registered for the PrivateUse1 device type (for example with the C10_REGISTER_GUARD_IMPL macro), standard PyTorch device guards such as c10::DeviceGuard work with the custom backend.
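The `deviceCount()` contract in the listing (never throw; report zero devices when something goes wrong) can be sketched in Python as a wrapper around a driver query. `make_device_count` and the query functions are hypothetical, and caching the first result reflects an assumption of this sketch that the device count is fixed for the life of the process.

```python
import functools
import warnings
from typing import Callable


def make_device_count(query: Callable[[], int]) -> Callable[[], int]:
    """Wrap a driver query so that it never raises; the first result is cached."""

    @functools.lru_cache(maxsize=None)
    def device_count() -> int:
        try:
            return query()
        except Exception as exc:  # any driver failure means "no devices"
            warnings.warn(f"device init failed: {exc}")
            return 0

    return device_count
```

A failure is therefore cached as zero devices as well; a backend that wants to recover from transient driver errors would have to retry instead, which this sketch does not do.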