FunctionEvent
- class torch.autograd.profiler_util.FunctionEvent(id, name, thread, start_us, end_us, overload_name=None, fwd_thread=None, input_shapes=None, stack=None, scope=0, use_device=None, cpu_memory_usage=0, device_memory_usage=0, is_async=False, is_remote=False, sequence_nr=-1, node_id=-1, device_type=<DeviceType.CPU: 0>, device_index=0, device_resource_id=None, is_legacy=False, flops=None, trace_name=None, concrete_inputs=None, kwinputs=None, is_user_annotation=False, is_python_function=False, activity_type=None, metadata_json=None, flow_id=None, flow_type=None, flow_start=None, external_id=0, linked_correlation_id=0, extra_meta=None, structured_input_shapes=None, structured_input_strides=None, input_dtypes=None, python_id=-1, python_parent_id=-1, python_module_id=-1)
Profiling information about a single function.
FunctionEvent records the execution of a single operation during profiling. These events are obtained from the profiler/kineto and contain detailed timing and memory usage information.
Note
FunctionEvent objects are typically created by the profiler/kineto and should not be instantiated directly by users. Access them through the profiler’s output.
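As a minimal sketch of accessing FunctionEvent objects through the profiler's output, the snippet below profiles a small matrix multiply on CPU and reads a few of the attributes documented here. It assumes a CPU-only run and that the operator is recorded under the name aten::mm; the tensor sizes and variable names are illustrative only.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Profile a small CPU workload; record_shapes=True populates input_shapes.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    x = torch.randn(16, 16)
    y = torch.mm(x, x)

# prof.events() returns an EventList whose items are FunctionEvent objects.
events = prof.events()

# Assumption: the matrix multiply is recorded as "aten::mm".
mm_events = [evt for evt in events if evt.name == "aten::mm"]

for evt in mm_events:
    print(evt.name, evt.thread, evt.input_shapes)
    print("total CPU time (us):", evt.cpu_time_total)
    print("self CPU time (us):", evt.self_cpu_time_total)
    print("children:", [child.name for child in evt.cpu_children])
```

The events are read only after the profiling context has exited, and none of them is constructed by hand.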
- Variables:
id (int) – Unique identifier for this event.
node_id (int) – Node identifier for distributed profiling (-1 if not applicable).
name (str) – Name of the profiled function/operator.
overload_name (str) – Overload name for the operator (requires _ExperimentalConfig(capture_overload_names=True) to be set).
trace_name (str) – Same as name, except that ProfilerStep* is replaced with ProfilerStep#.
time_range (Interval) – Time interval containing start and end timestamps in microseconds.
thread (int) – Thread ID where the operation started.
fwd_thread (int) – Thread ID of the corresponding forward operation.
kernels (List[Kernel]) – List of device kernels launched by this operation.
count (int) – Number of times this event was called (usually 1).
cpu_children (List[FunctionEvent]) – Direct CPU child operations.
cpu_parent (FunctionEvent) – Direct CPU parent operation.
input_shapes (List[List[int]]) – Shapes of input tensors (requires record_shapes=True). For plain tensor inputs, each entry is a list of dimensions (e.g. [16, 16]). TensorList inputs are represented as an empty list []; use structured_input_shapes to get per-element shapes for TensorList inputs.
concrete_inputs (List[Any]) – Concrete input values (requires record_shapes=True).
kwinputs (Dict[str, Any]) – Keyword arguments (requires record_shapes=True).
stack (List[str]) – Python stack trace where the operation was called (requires with_stack=True).
scope (int) – at::RecordScope identifier (0=forward, 1=backward, etc.).
use_device (str) – Device type being profiled (“cuda”, “xpu”, etc.).
cpu_memory_usage (int) – CPU memory allocated in bytes.
device_memory_usage (int) – Device memory allocated in bytes.
is_async (bool) – Whether this is an asynchronous operation.
is_remote (bool) – Whether this operation occurred on a remote node.
sequence_nr (int) – Sequence number for autograd operations.
device_type (DeviceType) – Type of device (CPU, CUDA, XPU, PrivateUse1, etc.).
device_index (int) – Index of the device (e.g., GPU 0, 1, 2).
device_resource_id (int) – Resource ID on the device (i.e., stream ID).
is_legacy (bool) – Whether this is from the legacy profiler.
flops (int) – Estimated floating point operations.
is_user_annotation (bool) – Whether this is a user-annotated region.
metadata_json (str) – Deprecated. Use event_metadata instead.
event_metadata (EventMetadata) – Additional metadata in structured format.
structured_input_shapes (List[List[int] | List[List[int]]]) – Like input_shapes but distinguishes TensorList inputs. Plain tensor inputs are List[int]; TensorList inputs are List[List[int]] containing one shape per tensor in the list. Matches the "Input Dims" field in the Chrome trace JSON.
structured_input_strides (List[List[int] | List[List[int]]]) – Strides of input tensors in the same format as structured_input_shapes (requires record_shapes=True).
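To illustrate how the two shape formats differ, the sketch below works on hand-written lists rather than real profiler output. The helper name per_tensor_shapes and the sample values are hypothetical; it assumes a TensorList entry can be recognized by its first element being a list, so an empty entry is ambiguous and is treated here as a plain (scalar) tensor shape.

```python
from typing import List, Union

Shape = List[int]
StructuredEntry = Union[Shape, List[Shape]]


def per_tensor_shapes(structured: List[StructuredEntry]) -> List[Shape]:
    """Flatten structured shapes into one shape per tensor (hypothetical helper)."""
    flat: List[Shape] = []
    for entry in structured:
        if entry and isinstance(entry[0], list):
            # TensorList input: one shape per tensor in the list.
            flat.extend(entry)
        else:
            # Plain tensor input (an empty list is treated as a scalar shape).
            flat.append(entry)
    return flat


# Hand-written sample: one plain tensor followed by a two-tensor TensorList.
# In input_shapes the TensorList collapses to an empty list.
input_shapes = [[16, 16], []]
structured_input_shapes = [[16, 16], [[4, 8], [8, 2]]]

print(per_tensor_shapes(structured_input_shapes))
```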
- Properties:
cpu_time_total (float): Total CPU time in microseconds.
device_time_total (float): Total device (CUDA/XPU/etc) time in microseconds.
self_cpu_time_total (float): CPU time excluding child operations.
self_device_time_total (float): Device time excluding child operations.
self_cpu_memory_usage (int): CPU memory usage excluding child operations.
self_device_memory_usage (int): Device memory usage excluding child operations.
cpu_time (float): Average CPU time per call.
device_time (float): Average device time per call.
key (str): Key used for grouping events (usually same as name).
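The relationship between the total, self, and per-call timing properties can be sketched with a small stand-in class. ToyEvent below is hypothetical and is not part of PyTorch; it assumes that self time is the total time minus the total time of direct children, and that the average is the total divided by count, which is one reading of the descriptions above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyEvent:
    """Hypothetical stand-in for FunctionEvent; not part of PyTorch."""
    name: str
    start_us: float
    end_us: float
    count: int = 1
    cpu_children: List["ToyEvent"] = field(default_factory=list)

    @property
    def cpu_time_total(self) -> float:
        # Total time is the elapsed interval, children included.
        return self.end_us - self.start_us

    @property
    def self_cpu_time_total(self) -> float:
        # Assumption: subtract the total time of direct children only.
        return self.cpu_time_total - sum(c.cpu_time_total for c in self.cpu_children)

    @property
    def cpu_time(self) -> float:
        # Average per call; guard against a zero count.
        return 0.0 if self.count == 0 else self.cpu_time_total / self.count


child = ToyEvent("aten::mm", start_us=10.0, end_us=40.0)
parent = ToyEvent("aten::linear", start_us=0.0, end_us=50.0, cpu_children=[child])

print(parent.cpu_time_total)       # 50.0
print(parent.self_cpu_time_total)  # 20.0
print(parent.cpu_time)             # 50.0
```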
See also
torch.profiler.profile: Context manager for profiling
EventList: List container for FunctionEvent objects with helper methods
FunctionEventAvg: Averaged statistics over multiple FunctionEvent objects