---
myst:
html_meta:
description: Tensor accessors in PyTorch C++ — efficient element-wise access to tensor data without overhead.
keywords: PyTorch, C++, tensor accessor, packed_accessor, data access
---
# Tensor Accessors
For element-wise operations in custom kernels, use *accessors*: they validate a
tensor's type and dimensionality once, then expose raw element access without
per-element dispatch overhead.
## CPU Accessors
```cpp
#include <torch/torch.h>

torch::Tensor foo = torch::rand({12, 12});

// Assert foo is 2-dimensional and holds floats; the check happens once here.
auto foo_a = foo.accessor<float, 2>();
float trace = 0;
for (int64_t i = 0; i < foo_a.size(0); i++) {
  // Indexing the accessor is plain pointer arithmetic, no dispatch.
  trace += foo_a[i][i];
}
```
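For contrast, here is a sketch of the slow path the accessor avoids: every
`foo[i][i]` constructs a temporary `Tensor` and goes through dispatch, and
`.item<float>()` extracts the scalar.

```cpp
// Slow path: each element access builds a Tensor and dispatches.
float trace = 0;
for (int64_t i = 0; i < foo.size(0); i++) {
  trace += foo[i][i].item<float>();
}
```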
## CUDA Packed Accessors
For CUDA kernels, use *packed accessors*, which copy the tensor's sizes and
strides by value instead of holding a pointer to them, so they can be passed to
and used inside device code:
```cpp
#include <ATen/cuda/Atomic.cuh>  // provides gpuAtomicAdd

__global__ void trace_kernel(torch::PackedTensorAccessor64<float, 2> foo,
                             float* trace) {
  int i = threadIdx.x;
  gpuAtomicAdd(trace, foo[i][i]);
}

torch::Tensor foo = torch::rand({12, 12}).cuda();
// Assert foo is 2-dimensional and holds floats; sizes and strides are copied by value.
auto foo_a = foo.packed_accessor64<float, 2>();
// Accumulate in device memory; the kernel cannot write to a host-side float.
torch::Tensor trace = torch::zeros({1}, foo.options());
trace_kernel<<<1, 12>>>(foo_a, trace.data_ptr<float>());
```
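Once the kernel has finished, the scalar result can be read back on the host with
`trace.item<float>()`, which synchronizes with the device before copying.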
```{tip}
Use `PackedTensorAccessor32` and `packed_accessor32` for 32-bit indexing,
which is faster on CUDA but may overflow for large tensors.
```
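As a minimal sketch of the 32-bit variant (assuming the tensor fits within 32-bit
indexing), the kernel and host call take the same template arguments; the optional
`torch::RestrictPtrTraits` parameter marks the underlying pointer as `__restrict__`:

```cpp
__global__ void trace_kernel32(
    torch::PackedTensorAccessor32<float, 2, torch::RestrictPtrTraits> foo,
    float* trace) {
  int i = threadIdx.x;
  gpuAtomicAdd(trace, foo[i][i]);
}

// Host side: request the matching 32-bit, __restrict__ accessor.
auto foo_a = foo.packed_accessor32<float, 2, torch::RestrictPtrTraits>();
```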