graph
- class torch.cuda.graph(cuda_graph, pool=None, stream=None, capture_error_mode='global', enable_annotations=False)
Context-manager that captures CUDA work into a torch.cuda.CUDAGraph object for later replay.
See CUDA Graphs for a general introduction, detailed use, and constraints.
- Parameters:
cuda_graph (torch.cuda.CUDAGraph) – Graph object used for capture.
pool (optional) – Opaque token (returned by a call to graph_pool_handle() or other_Graph_instance.pool()) hinting that this graph’s capture may share memory from the specified pool. See Graph memory management.
stream (torch.cuda.Stream, optional) – If supplied, will be set as the current stream in the context. If not supplied, graph sets its own internal side stream as the current stream in the context.
capture_error_mode (str, optional) – Specifies the cudaStreamCaptureMode for the graph capture stream. Can be “global”, “thread_local” or “relaxed”. During CUDA graph capture, some actions, such as cudaMalloc, may be unsafe. “global” will error on actions in other threads, “thread_local” will only error for actions in the current thread, and “relaxed” will not error on actions. Do NOT change this setting unless you’re familiar with cudaStreamCaptureMode.
enable_annotations (bool, optional) – If True, enables kernel annotation recording on entry and automatically calls resolve_pending_annotations() before the capture ends. Annotations are not cleared on exit so that multiple graphs in the same workload can accumulate annotations. Requires the cuda.bindings package and cuda-compat >= 13.1 or CUDA driver >= 13.1.
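As an illustration of the capture-then-replay workflow described above, here is a minimal sketch that uses only the cuda_graph argument and leaves every other parameter at its default. The helper name capture_and_replay, the tensor names, and the arithmetic are illustrative assumptions, not part of the API. The function returns None when no CUDA device is available.

```python
import torch


def capture_and_replay():
    """Capture a small computation into a CUDA graph, then replay it on new input."""
    if not torch.cuda.is_available():
        return None  # CUDA graphs require a CUDA device

    # The graph records tensor addresses, so inputs must live in fixed ("static") storage.
    static_input = torch.zeros(4, device="cuda")

    # Warm up on a side stream before capturing.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            _ = static_input * 2 + 1
    torch.cuda.current_stream().wait_stream(side)

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        # Work issued here is recorded into g rather than run eagerly.
        static_output = static_input * 2 + 1

    # Refill the input in place, then replay the recorded kernels.
    static_input.copy_(torch.arange(4, device="cuda", dtype=torch.float32))
    g.replay()
    torch.cuda.synchronize()
    return static_output.cpu().tolist()


result = capture_and_replay()
print(result)  # [1.0, 3.0, 5.0, 7.0] on a CUDA machine, None otherwise
```

The output tensor is read only after replay() and a synchronize, since the values produced during capture itself should not be relied on.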
Note
For effective memory sharing, if you pass a pool used by a previous capture and the previous capture used an explicit stream argument, you should pass the same stream argument to this capture.
Warning
This API is in beta and may change in future releases.
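The note on memory sharing can be sketched as follows: two captures share one pool, and both receive the same explicit stream. The helper name capture_two_graphs_sharing_pool and the tensor names are illustrative assumptions. This sketch also assumes the graphs are replayed in the order they were captured, which is the safe pattern for graphs that share a pool.

```python
import torch


def capture_two_graphs_sharing_pool():
    """Capture two dependent graphs that draw memory from the same pool."""
    if not torch.cuda.is_available():
        return None  # CUDA graphs require a CUDA device

    x = torch.ones(8, device="cuda")
    # One explicit stream, passed to both captures as the note recommends.
    capture_stream = torch.cuda.Stream()

    g1 = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g1, stream=capture_stream):
        y = x + 1

    g2 = torch.cuda.CUDAGraph()
    # g1.pool() returns the opaque token identifying g1's memory pool.
    with torch.cuda.graph(g2, pool=g1.pool(), stream=capture_stream):
        z = y * 3

    # Update the input in place and replay in capture order.
    x.fill_(2.0)
    g1.replay()
    g2.replay()
    torch.cuda.synchronize()
    return z.cpu().tolist()


shared = capture_two_graphs_sharing_pool()
```

With x filled with 2.0, the first graph produces 3.0 in y and the second produces 9.0 in every element of z.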