FunctionEvent

class torch.autograd.profiler_util.FunctionEvent(id, name, thread, start_us, end_us, overload_name=None, fwd_thread=None, input_shapes=None, stack=None, scope=0, use_device=None, cpu_memory_usage=0, device_memory_usage=0, is_async=False, is_remote=False, sequence_nr=-1, node_id=-1, device_type=<DeviceType.CPU: 0>, device_index=0, device_resource_id=None, is_legacy=False, flops=None, trace_name=None, concrete_inputs=None, kwinputs=None, is_user_annotation=False, is_python_function=False, activity_type=None, metadata_json=None, flow_id=None, flow_type=None, flow_start=None, external_id=0, linked_correlation_id=0, extra_meta=None, structured_input_shapes=None, structured_input_strides=None, input_dtypes=None, python_id=-1, python_parent_id=-1, python_module_id=-1)

Profiling information about a single function.

FunctionEvent records the execution of a single operation during profiling. These events are obtained from the profiler/kineto and contain detailed timing and memory usage information.

Note

FunctionEvent objects are typically created by the profiler/kineto and should not be instantiated directly by users. Access them through the profiler’s output.
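As a minimal sketch of reading events from the profiler's output instead of constructing them, the helper below ranks events by cpu_time_total using only attributes documented on this page. The names top_events_by_cpu_time and profile_matmul are hypothetical and chosen for illustration. profile_matmul assumes the torch.profiler.profile context manager and its events() method, and needs a working PyTorch install, so it is defined but not called here.

```python
from typing import Any, Iterable, List, Tuple


def top_events_by_cpu_time(events: Iterable[Any], limit: int = 5) -> List[Tuple[str, float]]:
    """Return (name, cpu_time_total) pairs, largest total first.

    Works on any objects exposing `name` and `cpu_time_total`,
    which FunctionEvent does according to this page.
    """
    ranked = sorted(events, key=lambda e: e.cpu_time_total, reverse=True)
    return [(e.name, e.cpu_time_total) for e in ranked[:limit]]


def profile_matmul() -> List[Tuple[str, float]]:
    """Hypothetical usage; requires PyTorch, so it is not called here."""
    import torch
    from torch.profiler import ProfilerActivity, profile

    x = torch.randn(64, 64)
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        torch.mm(x, x)
    # prof.events() yields the FunctionEvent objects recorded by the profiler.
    return top_events_by_cpu_time(prof.events())
```

Because the helper is duck-typed, it can be exercised with simple stand-in objects before running it against a real profile.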

Variables:

id (int) – Unique identifier for this event.
node_id (int) – Node identifier for distributed profiling (-1 if not applicable).
name (str) – Name of the profiled function/operator.
overload_name (str) – Overload name for the operator (requires _ExperimentalConfig(capture_overload_names=True) to be set).
trace_name (str) – Same as name, except that ProfilerStep* is rewritten as ProfilerStep#.
time_range (Interval) – Time interval containing start and end timestamps in microseconds.
thread (int) – Thread ID where the operation started.
fwd_thread (int) – Thread ID of the corresponding forward operation.
kernels (List[Kernel]) – List of device kernels launched by this operation.
count (int) – Number of times this event was called (usually 1).
cpu_children (List[FunctionEvent]) – Direct CPU child operations.
cpu_parent (FunctionEvent) – Direct CPU parent operation.
input_shapes (List[List[int]]) – Shapes of input tensors (requires record_shapes=True). For plain tensor inputs, each entry is a list of dimensions (e.g. [16, 16]). TensorList inputs are represented as an empty list []; use structured_input_shapes to get per-element shapes for TensorList inputs.
concrete_inputs (List[Any]) – Concrete input values (requires record_shapes=True).
kwinputs (Dict[str, Any]) – Keyword arguments (requires record_shapes=True).
stack (List[str]) – Python stack trace where the operation was called (requires with_stack=True).
scope (int) – at::RecordScope identifier (0=forward, 1=backward, etc.).
use_device (str) – Device type being profiled (“cuda”, “xpu”, etc.).
cpu_memory_usage (int) – CPU memory allocated in bytes.
device_memory_usage (int) – Device memory allocated in bytes.
is_async (bool) – Whether this is an asynchronous operation.
is_remote (bool) – Whether this operation occurred on a remote node.
sequence_nr (int) – Sequence number for autograd operations.
device_type (DeviceType) – Type of device (CPU, CUDA, XPU, PrivateUse1, etc.).
device_index (int) – Index of the device (e.g., GPU 0, 1, 2).
device_resource_id (int) – Resource ID on the device (i.e., stream ID).
is_legacy (bool) – Whether this is from the legacy profiler.
flops (int) – Estimated floating point operations.
is_user_annotation (bool) – Whether this is a user-annotated region.
metadata_json (str) – Deprecated. Use event_metadata instead.
event_metadata (EventMetadata) – Additional metadata in structured format.
structured_input_shapes (List[List[int] | List[List[int]]]) – Like input_shapes but distinguishes TensorList inputs. Plain tensor inputs are List[int]; TensorList inputs are List[List[int]] containing one shape per tensor in the list. Matches the "Input Dims" field in the Chrome trace JSON.
structured_input_strides (List[List[int] | List[List[int]]]) – Strides of input tensors in the same format as structured_input_shapes (requires record_shapes=True).
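The relationship between input_shapes and structured_input_shapes described in the list above can be illustrated in plain Python. This is only a sketch of the two representations, not PyTorch code: to_flat_input_shapes and is_tensor_list are hypothetical helpers, and the shape values are made up. An empty entry is ambiguous (a scalar tensor or an empty TensorList), so the heuristic here treats it as a plain tensor.

```python
from typing import List, Union

Shape = List[int]
StructuredEntry = Union[Shape, List[Shape]]


def is_tensor_list(entry: StructuredEntry) -> bool:
    """True when the entry holds one shape per tensor (a TensorList input)."""
    return len(entry) > 0 and all(isinstance(dim, list) for dim in entry)


def to_flat_input_shapes(structured: List[StructuredEntry]) -> List[Shape]:
    """Collapse TensorList entries to [] as described for input_shapes."""
    return [[] if is_tensor_list(entry) else list(entry) for entry in structured]


# Hypothetical values: an op taking a TensorList of two tensors and a plain tensor.
structured_input_shapes = [[[16, 16], [16, 32]], [4, 8]]
input_shapes = to_flat_input_shapes(structured_input_shapes)
print(input_shapes)  # [[], [4, 8]]
```

The per-tensor shapes of the TensorList survive only in the structured form, which is why the page points to structured_input_shapes for that case.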

Properties:

cpu_time_total (float) – Total CPU time in microseconds.
device_time_total (float) – Total device (CUDA/XPU/etc.) time in microseconds.
self_cpu_time_total (float) – CPU time excluding child operations.
self_device_time_total (float) – Device time excluding child operations.
self_cpu_memory_usage (int) – CPU memory usage excluding child operations.
self_device_memory_usage (int) – Device memory usage excluding child operations.
cpu_time (float) – Average CPU time per call.
device_time (float) – Average device time per call.
key (str) – Key used for grouping events (usually same as name).
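To make the total, self, and per-call CPU timings concrete, here is a small sketch with a stand-in class. EventSketch is hypothetical and is not the PyTorch class. It assumes a CPU-only, synchronous event whose total time is simply end minus start; the real FunctionEvent also accounts for device and async events, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventSketch:
    """Hypothetical stand-in for FunctionEvent; not the PyTorch class."""
    name: str
    start_us: float
    end_us: float
    count: int = 1
    cpu_children: List["EventSketch"] = field(default_factory=list)

    @property
    def cpu_time_total(self) -> float:
        # Assumption: total CPU time is the elapsed interval in microseconds.
        return self.end_us - self.start_us

    @property
    def self_cpu_time_total(self) -> float:
        # Total time minus the time spent in direct CPU children.
        return self.cpu_time_total - sum(c.cpu_time_total for c in self.cpu_children)

    @property
    def cpu_time(self) -> float:
        # Average CPU time per call.
        return 0.0 if self.count == 0 else self.cpu_time_total / self.count


mm = EventSketch("aten::mm", start_us=10.0, end_us=40.0)
add = EventSketch("aten::add", start_us=45.0, end_us=55.0)
linear = EventSketch("aten::linear", start_us=0.0, end_us=60.0, cpu_children=[mm, add])
print(linear.cpu_time_total, linear.self_cpu_time_total)  # 60.0 20.0
```

Only direct children are subtracted, since each child's total already includes its own descendants.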
