---
myst:
  html_meta:
    description: Stable ABI utilities in PyTorch C++ - version checking and compatibility helpers.
    keywords: PyTorch, C++, stable ABI, utilities, version, compatibility
---
# Utilities
The stable API provides utility functions and types for device and stream
management, CUDA error checking, header-only access to core types, and
parallelization.
## DeviceGuard Class
```{doxygenclass} torch::stable::accelerator::DeviceGuard
:members:
:undoc-members:
```
```{doxygenfunction} torch::stable::accelerator::getCurrentDeviceIndex
```
**Example:**
```cpp
{
  torch::stable::accelerator::DeviceGuard guard(1);
  // Operations here run on device 1
}
// Previous device is restored
```
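The scoping behavior in the example above follows the usual RAII guard pattern. The block below is a minimal, PyTorch-free sketch of that pattern, not the actual `DeviceGuard` implementation: `ScopedDevice`, `g_current_device`, and `device_inside_scope` are hypothetical names, and a plain global integer stands in for the accelerator runtime's current device.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's notion of "current device".
// A real guard would query and set the device through the accelerator runtime.
static int g_current_device = 0;

// Minimal RAII guard: switch devices on construction, restore on destruction.
class ScopedDevice {
 public:
  explicit ScopedDevice(int index) : previous_(g_current_device) {
    assert(index >= 0 && "device index must be non-negative");
    g_current_device = index;
  }
  ~ScopedDevice() { g_current_device = previous_; }

  // Non-copyable so the restore runs exactly once per guard.
  ScopedDevice(const ScopedDevice&) = delete;
  ScopedDevice& operator=(const ScopedDevice&) = delete;

 private:
  int previous_;  // device that was current when the guard was created
};

// Returns the device index observed while the guard is alive.
int device_inside_scope(int index) {
  ScopedDevice guard(index);
  return g_current_device;
}
```

Because the restore happens in the destructor, the previous device comes back on every exit path from the scope, including early returns and exceptions.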
## Stream
```{doxygenclass} torch::stable::accelerator::Stream
:members:
:undoc-members:
```
## Stream Utilities
For CUDA stream access, we currently recommend the ABI-stable C shim API. A
future release is expected to add a more ergonomic wrapper.
### Getting the Current CUDA Stream
To obtain the current `cudaStream_t` for use in CUDA kernels:
```cpp
#include
#include
// For now, we rely on the ABI stable C shim API to get the current CUDA stream.
void* stream_ptr = nullptr;
TORCH_ERROR_CODE_CHECK(
    aoti_torch_get_current_cuda_stream(tensor.get_device_index(), &stream_ptr));
cudaStream_t stream = static_cast<cudaStream_t>(stream_ptr);
// Now you can use 'stream' in your CUDA kernel launches
// (grid and block are placeholders for your launch configuration)
my_kernel<<<grid, block, 0, stream>>>(args...);
```
```{note}
The `TORCH_ERROR_CODE_CHECK` macro is required when using C shim APIs
to properly check error codes and throw appropriate exceptions.
```
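The idea behind such a macro, turning a C-style error code into a C++ exception, can be sketched without any PyTorch headers. Everything in the block below is hypothetical and for illustration only: `MY_ERROR_CODE_CHECK`, `get_stream_handle`, `ErrorCode`, and the convention that 0 means success are assumptions of this sketch, not the definitions used by the C shim.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical error-code convention for this sketch: 0 means success.
using ErrorCode = std::int32_t;
constexpr ErrorCode kSuccess = 0;
constexpr ErrorCode kFailure = 1;

// Hypothetical C-style function: the result comes back through an
// out-parameter and failure is reported only via the return value.
inline ErrorCode get_stream_handle(int device_index, void** out) {
  assert(out != nullptr);
  static int dummy_stream = 0;  // stands in for a real stream object
  if (device_index < 0) {
    return kFailure;
  }
  *out = &dummy_stream;
  return kSuccess;
}

// Converts a nonzero error code into an exception naming the failed call.
#define MY_ERROR_CODE_CHECK(call)                                \
  do {                                                           \
    if ((call) != kSuccess) {                                    \
      throw std::runtime_error(std::string(#call) + " failed");  \
    }                                                            \
  } while (0)

// Callers get either a valid handle or an exception, never a silent failure.
inline void* checked_stream_handle(int device_index) {
  void* handle = nullptr;
  MY_ERROR_CODE_CHECK(get_stream_handle(device_index, &handle));
  return handle;
}
```

Without the wrapping macro, a failed call would leave `handle` as `nullptr` and the error would surface later, far from its cause.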
## CUDA Error Checking Macros
These macros provide stable ABI equivalents for CUDA error checking.
They wrap CUDA API calls and kernel launches, providing detailed error
messages using PyTorch's error formatting.
### STD_CUDA_CHECK
````{c:macro} STD_CUDA_CHECK(EXPR)
Checks the result of a CUDA API call and throws an exception on error.
Users of this macro are expected to include `cuda_runtime.h`.

**Example:**

```cpp
STD_CUDA_CHECK(cudaMalloc(&ptr, size));
STD_CUDA_CHECK(cudaMemcpy(dst, src, size, cudaMemcpyDeviceToHost));
```

Minimum compatible version: PyTorch 2.10.
````
### STD_CUDA_KERNEL_LAUNCH_CHECK
````{c:macro} STD_CUDA_KERNEL_LAUNCH_CHECK()
Checks for errors from the most recent CUDA kernel launch. Equivalent to
`STD_CUDA_CHECK(cudaGetLastError())`.

**Example:**

```cpp
// grid and block are placeholders for your launch configuration
my_kernel<<<grid, block>>>(args...);
STD_CUDA_KERNEL_LAUNCH_CHECK();
```

Minimum compatible version: PyTorch 2.10.
````
## Header-Only Utilities
The `torch::headeronly` namespace provides header-only versions of common
PyTorch types and utilities. These can be used without linking against libtorch
at all! This portability makes them ideal for maintaining binary compatibility
across PyTorch versions.
### Error Checking
`STD_TORCH_CHECK` is a header-only macro for runtime assertions:
```cpp
#include
STD_TORCH_CHECK(condition, "Error message with ", variable, " interpolation");
```
Wherever you previously used `TORCH_CHECK`, you can substitute `STD_TORCH_CHECK`
to remove the need to link against libtorch. The only difference is the exception
type when the condition check fails: `TORCH_CHECK` throws the richer `c10::Error`,
while `STD_TORCH_CHECK` throws a `std::runtime_error`.
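To make the behavior concrete, here is a rough, self-contained sketch of a check macro that concatenates its message arguments and throws `std::runtime_error`. It is not the real `STD_TORCH_CHECK`: `MY_CHECK`, `append_all`, `checked_divide`, `failure_message`, and `run_examples` are hypothetical names, and the message format produced by the real macro may differ.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Base case: nothing left to append.
inline void append_all(std::ostringstream&) {}

// Streams each message argument into the buffer, in order.
template <typename T, typename... Rest>
void append_all(std::ostringstream& os, const T& first, const Rest&... rest) {
  os << first;
  append_all(os, rest...);
}

// Hypothetical macro for illustration; requires at least one message argument.
#define MY_CHECK(cond, ...)                         \
  do {                                              \
    if (!(cond)) {                                  \
      std::ostringstream my_check_os;               \
      append_all(my_check_os, __VA_ARGS__);         \
      throw std::runtime_error(my_check_os.str());  \
    }                                               \
  } while (0)

inline int checked_divide(int a, int b) {
  MY_CHECK(b != 0, "cannot divide ", a, " by zero");
  return a / b;
}

// Returns the message carried by the exception, or "" if nothing was thrown.
inline std::string failure_message(int a, int b) {
  try {
    checked_divide(a, b);
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return "";
}

// Small usage example exercising both the passing and failing paths.
inline void run_examples() {
  assert(checked_divide(6, 3) == 2);
  assert(failure_message(7, 0) == "cannot divide 7 by zero");
}
```

Callers that previously caught `c10::Error` would need to catch `std::runtime_error` (or `std::exception`) instead after migrating.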
### Core Types
The following `c10::` types are available as header-only versions under
`torch::headeronly::`:
- `torch::headeronly::ScalarType` - Tensor data types (Float, Double, Int, etc.)
- `torch::headeronly::DeviceType` - Device types (CPU, CUDA, etc.)
- `torch::headeronly::MemoryFormat` - Memory layout formats (Contiguous, ChannelsLast, etc.)
- `torch::headeronly::Layout` - Tensor layouts (Strided, Sparse, etc.)
```cpp
#include
#include
#include
#include
auto dtype = torch::headeronly::ScalarType::Float;
auto device_type = torch::headeronly::DeviceType::CUDA;
auto memory_format = torch::headeronly::MemoryFormat::Contiguous;
auto layout = torch::headeronly::Layout::Strided;
```
### TensorAccessor
`TensorAccessor` provides efficient, bounds-checked access to tensor data.
You can construct one from a stable tensor's data pointer, sizes, and strides:
```cpp
#include
// Create a TensorAccessor for a 2D float tensor
auto sizes = tensor.sizes();
auto strides = tensor.strides();
torch::headeronly::TensorAccessor<float, 2> accessor(
    static_cast<float*>(tensor.mutable_data_ptr()),
    sizes.data(),
    strides.data());
// Access elements
float value = accessor[i][j];
```
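The indexing arithmetic behind a strided accessor can be illustrated with a simplified stand-in. `Accessor2D`, `read_contiguous`, and `read_transposed` below are hypothetical names for this sketch only, and the `assert`-based bounds checks are a choice made here rather than a description of how `TensorAccessor` checks indices.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified 2D accessor for illustration only; the real
// TensorAccessor is a different, more general class.
template <typename T>
class Accessor2D {
 public:
  Accessor2D(T* data, const std::int64_t* sizes, const std::int64_t* strides)
      : data_(data), sizes_(sizes), strides_(strides) {}

  // Element (i, j) lives at offset i * strides[0] + j * strides[1].
  T& at(std::int64_t i, std::int64_t j) const {
    assert(i >= 0 && i < sizes_[0]);
    assert(j >= 0 && j < sizes_[1]);
    return data_[i * strides_[0] + j * strides_[1]];
  }

 private:
  T* data_;
  const std::int64_t* sizes_;
  const std::int64_t* strides_;
};

// Reads element (i, j) of a contiguous 2x3 buffer holding 1..6.
inline float read_contiguous(std::int64_t i, std::int64_t j) {
  float buffer[6] = {1, 2, 3, 4, 5, 6};
  const std::int64_t sizes[2] = {2, 3};
  const std::int64_t strides[2] = {3, 1};
  return Accessor2D<float>(buffer, sizes, strides).at(i, j);
}

// Reads element (i, j) of the 3x2 transposed view of the same data:
// only sizes and strides change, the buffer layout stays the same.
inline float read_transposed(std::int64_t i, std::int64_t j) {
  float buffer[6] = {1, 2, 3, 4, 5, 6};
  const std::int64_t sizes[2] = {3, 2};
  const std::int64_t strides[2] = {1, 3};
  return Accessor2D<float>(buffer, sizes, strides).at(i, j);
}
```

Passing sizes and strides separately from the data pointer is what lets one buffer be viewed with different shapes without copying.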
### Dispatch Macros
Header-only dispatch macros (THO = Torch Header Only) are available for
dtype dispatching:
```cpp
#include
THO_DISPATCH_V2(
    tensor.scalar_type(), // will be resolved as scalar_t
    "my_kernel",
    AT_WRAP(([&]() {
      // code to specialize with scalar_t
      // scalar_t is the resolved C++ type (e.g. float, double)
      auto* data = static_cast<scalar_t*>(tensor.mutable_data_ptr());
      Scalar s(*data);
    })),
    AT_EXPAND(AT_ALL_TYPES),
    AT_EXPAND(AT_COMPLEX_TYPES),
    torch::headeronly::ScalarType::Half
    // add as many type arguments as needed
);
```
`THO_DISPATCH_V2` works the same way as `AT_DISPATCH_V2` (see
`ATen/Dispatch_v2.h`) but does not require linking against libtorch.
As a result, whereas `AT_DISPATCH_V2` would have thrown `c10::NotImplementedError`
for unimplemented paths, `THO_DISPATCH_V2` will throw `std::runtime_error`.
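For readers new to dtype dispatch, the following self-contained sketch shows the general technique: a runtime dtype tag selects a `case` that binds `scalar_t` to a concrete C++ type before running the same lambda body, and an unhandled tag throws `std::runtime_error`. It is not how `THO_DISPATCH_V2` is implemented; `DType`, `MY_DISPATCH_CASE`, `MY_DISPATCH_CASE_NUMERIC`, and `sum_as_double` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical runtime dtype tag; stands in for a ScalarType enum.
enum class DType { Float, Double, Int32, Bool };

// One case: bind scalar_t to a concrete type, then invoke the body.
#define MY_DISPATCH_CASE(enum_value, cpp_type, ...) \
  case enum_value: {                                \
    using scalar_t = cpp_type;                      \
    return __VA_ARGS__();                           \
  }

// A reusable collection of cases, each receiving the same body.
#define MY_DISPATCH_CASE_NUMERIC(...)                   \
  MY_DISPATCH_CASE(DType::Float, float, __VA_ARGS__)    \
  MY_DISPATCH_CASE(DType::Double, double, __VA_ARGS__)  \
  MY_DISPATCH_CASE(DType::Int32, std::int32_t, __VA_ARGS__)

// Sums n elements of an untyped buffer whose element type is given by dtype.
inline double sum_as_double(const void* data, std::size_t n, DType dtype) {
  assert(data != nullptr || n == 0);
  switch (dtype) {
    MY_DISPATCH_CASE_NUMERIC([&]() -> double {
      // scalar_t is float, double, or std::int32_t depending on the case.
      const scalar_t* typed = static_cast<const scalar_t*>(data);
      double total = 0.0;
      for (std::size_t k = 0; k < n; ++k) {
        total += static_cast<double>(typed[k]);
      }
      return total;
    })
    default:
      // Unhandled dtypes surface as std::runtime_error in this sketch.
      throw std::runtime_error("sum_as_double: unsupported dtype");
  }
}
```

The lambda body is written once but compiled once per listed type, which is why the set of types passed to the dispatch macro directly controls which specializations exist.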
For ease of use, we've also migrated the following `AT_*` macros, which represent
collections of types, to be header-only so that they have no dependency on libtorch:
- `AT_FLOATING_TYPES`
- `AT_INTEGRAL_TYPES`
- `AT_INTEGRAL_TYPES_V2`
- `AT_ALL_TYPES`
- `AT_COMPLEX_TYPES`
- `AT_ALL_TYPES_AND_COMPLEX`
- `AT_FLOAT8_TYPES`
- `AT_BAREBONES_UNSIGNED_TYPES`
- `AT_QINT_TYPES`
If your extension uses our older AT_DISPATCH version 1 infrastructure,
you can also migrate to a header-only libtorch-free world without upgrading
everything to version 2.
`THO_DISPATCH_SWITCH` and `THO_DISPATCH_CASE` are the header-only
equivalents of `AT_DISPATCH_SWITCH` and `AT_DISPATCH_CASE`. As with
`THO_DISPATCH_V2`, the only user-visible difference is the exception type on an
unhandled dtype: the `AT_` version throws a `c10::NotImplementedError` and the
`THO_` version throws a `std::runtime_error`.
The migration is pretty mechanical:
- `AT_DISPATCH_SWITCH` → `THO_DISPATCH_SWITCH`
- `AT_DISPATCH_CASE` → `THO_DISPATCH_CASE`
- `AT_PRIVATE_CASE_TYPE_USING_HINT` → `THO_PRIVATE_CASE_TYPE_USING_HINT`
- `at::ScalarType::X` → `torch::headeronly::ScalarType::X`
```cpp
// ---- Before (requires linking against libtorch) ----
#include
#define MY_DISPATCH_CASE_FLOATING_TYPES(...) \
  AT_DISPATCH_CASE(at::ScalarType::Float, __VA_ARGS__) \
  AT_DISPATCH_CASE(at::ScalarType::Half, __VA_ARGS__) \
  AT_DISPATCH_CASE(at::ScalarType::BFloat16, __VA_ARGS__)

#define MY_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) \
  AT_DISPATCH_SWITCH(TYPE, NAME, \
      MY_DISPATCH_CASE_FLOATING_TYPES(__VA_ARGS__))
```
```cpp
// ---- After (header-only, no libtorch dependency) ----
#include
#define MY_DISPATCH_CASE_FLOATING_TYPES(...) \
  THO_DISPATCH_CASE(torch::headeronly::ScalarType::Float, __VA_ARGS__) \
  THO_DISPATCH_CASE(torch::headeronly::ScalarType::Half, __VA_ARGS__) \
  THO_DISPATCH_CASE(torch::headeronly::ScalarType::BFloat16, __VA_ARGS__)

#define MY_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) \
  THO_DISPATCH_SWITCH(TYPE, NAME, \
      MY_DISPATCH_CASE_FLOATING_TYPES(__VA_ARGS__))
```
For the complete list of header-only APIs, see `torch/header_only_apis.txt`
in the PyTorch source tree.
## Parallelization Utilities
```{doxygenfunction} torch::stable::parallel_for
```
```{doxygenfunction} torch::stable::get_num_threads
```