
Developer Notes

Created On: Apr 16, 2025 | Last Updated On: Apr 16, 2025

  • Automatic Mixed Precision examples
  • Autograd mechanics
  • Broadcasting semantics
  • CPU threading and TorchScript inference
  • CUDA semantics
  • PyTorch Custom Operators Landing Page
  • Distributed Data Parallel
  • Extending PyTorch
  • Extending torch.func with autograd.Function
  • Frequently Asked Questions
  • Getting Started on Intel GPU
  • Gradcheck mechanics
  • HIP (ROCm) semantics
  • Features for large-scale deployments
  • LibTorch Stable ABI
  • LocalTensor Tutorial: Single-Process SPMD Debugging
  • MKLDNN backend
  • Bfloat16 (BF16) on MKLDNN backend
  • Modules
  • MPS backend
  • Multiprocessing best practices
  • Numerical accuracy
  • Out Notes
  • Reproducibility
  • Serialization semantics
  • Windows FAQ
© Copyright PyTorch Contributors.

Created using Sphinx 7.2.6.

Built with the PyData Sphinx Theme 0.15.4.