Eager

When does fragmentation occur in the CUDA caching allocator?

Edward Yang (@ezyang) · June 1, 2026
eager · cuda · memory

Disclosure. This post was drafted by Claude (Anthropic’s coding assistant) with editing from ezyang.

In an ideal world, users of CUDA memory in PyTorch programs should be able to abstract the allocator’s behavior as follows: there is a fixed amount of GPU memory; whenever you allocate, the available memory goes down, and whenever you free, it goes back up. Unfortunately, the internal implementation of the CUDA caching allocator means that certain allocation patterns can give rise to fragmentation, where even though there is “technically” enough free space to store a requested allocation, the CUDA caching allocator is unable to actually serve the request. There are many modern use cases …
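To make the gap between "enough free space" and "able to serve the request" concrete, here is a minimal toy sketch in plain Python. It is not the CUDA caching allocator and does not use PyTorch: the `pool` list, the unit-sized slots, and the `largest_free_run` helper are all hypothetical, chosen only to show how free memory can be scattered into pieces that are each too small.

```python
def largest_free_run(slots):
    """Return the length of the longest contiguous run of free (None) slots."""
    best = run = 0
    for s in slots:
        run = run + 1 if s is None else 0
        best = max(best, run)
    return best


# Toy 8-unit memory pool (an assumption for illustration only):
# None means the unit is free, a string names the live allocation holding it.
pool = ["a", None, "b", None, "c", None, "d", None]

total_free = sum(1 for s in pool if s is None)  # units free in aggregate
largest = largest_free_run(pool)                # biggest single contiguous hole
request = 3                                     # units needed contiguously

# A contiguous request can only be served from a single hole.
can_serve = largest >= request

print(f"total free = {total_free}, largest contiguous = {largest}, "
      f"request of {request} served: {can_serve}")
```

In this toy pool, half of the memory is free in aggregate, yet a request for three contiguous units cannot be placed because no single hole is larger than one unit. That is the sense of fragmentation described above, under the simplifying assumption that allocations must be contiguous and cannot be moved.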

