graph
- class torch.cuda.graph(cuda_graph, pool=None, stream=None, capture_error_mode='global', enable_annotations=False, check_input_liveness=False)
Context manager that captures CUDA work into a torch.cuda.CUDAGraph object for later replay.
See CUDA Graphs for a general introduction, detailed use, and constraints.
- Parameters:
cuda_graph (torch.cuda.CUDAGraph) – Graph object used for capture.
pool (optional) – Opaque token (returned by a call to graph_pool_handle() or other_Graph_instance.pool()) hinting that this graph’s capture may share memory from the specified pool. See Graph memory management.
stream (torch.cuda.Stream, optional) – If supplied, will be set as the current stream in the context. If not supplied, graph sets its own internal side stream as the current stream in the context.
capture_error_mode (str, optional) – Specifies the cudaStreamCaptureMode for the graph capture stream. Can be “global”, “thread_local”, or “relaxed”. During CUDA graph capture, some actions, such as cudaMalloc, may be unsafe. “global” will error on actions in other threads, “thread_local” will only error for actions in the current thread, and “relaxed” will not error on these actions. Do NOT change this setting unless you’re familiar with cudaStreamCaptureMode.
enable_annotations (bool, optional) – If True, enables kernel annotation recording on entry and automatically calls resolve_pending_annotations() before the capture ends. Annotations are not cleared on exit so that multiple graphs in the same workload can accumulate annotations. Requires the cuda.bindings package and cuda-compat >= 13.1 or CUDA driver >= 13.1.
check_input_liveness (bool, optional) – If True, tracks external tensor inputs during graph capture and raises an error if any are deallocated before replay. This helps debug “use after free” errors where input tensors are garbage collected between capture and replay. Default: False.
Note
Custom CUDA kernels added outside PyTorch (e.g., via cuLaunchKernel or DLPack) are not tracked by this mechanism.
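Example

The following is a minimal sketch of capturing work with this context manager and replaying it, assuming a CUDA device is available. The function name capture_and_replay and the tensor names are hypothetical, and the warm-up on a side stream follows the pattern recommended in the CUDA Graphs notes rather than anything this class requires.

```python
import torch


def capture_and_replay():
    # Static input: replay reads from this exact memory, so new data
    # must be copied into it in place rather than rebinding the name.
    static_input = torch.zeros(8, device="cuda")

    # Warm up on a side stream before capture (recommended pattern,
    # not something torch.cuda.graph enforces).
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            _ = static_input * 2 + 1
    torch.cuda.current_stream().wait_stream(side)

    # Capture: the kernels are recorded into the graph, not executed.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_output = static_input * 2 + 1

    # Fill the captured input in place, then replay the recorded work.
    static_input.copy_(torch.arange(8, dtype=torch.float32, device="cuda"))
    g.replay()
    torch.cuda.synchronize()
    return static_output.cpu()


# Only run when a GPU is present; otherwise leave the result empty.
result = capture_and_replay() if torch.cuda.is_available() else None
```

After replay, each element of the returned tensor should equal 2 * input + 1 for the values copied into static_input. Holding references to static_input and static_output keeps the captured memory alive between capture and replay.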
Note
For effective memory sharing, if you pass a pool used by a previous capture and the previous capture used an explicit stream argument, you should pass the same stream argument to this capture.
Warning
This API is in beta and may change in future releases.
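Example

As a sketch of the memory-sharing note above, the snippet below captures two graphs that share one pool and pass the same explicit stream to both captures. It assumes a CUDA device is available; the function name capture_two_graphs_sharing_pool and the tensor names are hypothetical, and warm-up is omitted for brevity.

```python
import torch


def capture_two_graphs_sharing_pool():
    # One explicit stream and one pool token, reused by both captures.
    stream = torch.cuda.Stream()
    pool = torch.cuda.graph_pool_handle()

    x = torch.ones(4, device="cuda")

    # First capture: allocates its output from the shared pool.
    g1 = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g1, pool=pool, stream=stream):
        y = x + 1

    # Second capture: same pool and, per the note, the same stream.
    g2 = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g2, pool=pool, stream=stream):
        z = y * 3

    # Replay in the order the graphs were captured, since g2 reads
    # the output written by g1.
    g1.replay()
    g2.replay()
    torch.cuda.synchronize()
    return z.cpu()


# Only run when a GPU is present; otherwise leave the result empty.
shared = capture_two_graphs_sharing_pool() if torch.cuda.is_available() else None
```

Passing g1.pool() instead of a token from graph_pool_handle() to the second capture would be an equivalent way to name the same pool.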