.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_stable_diffusion_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_stable_diffusion.py`
.. raw:: html
Compiling Stable Diffusion model using the torch.compile backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_debugger_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_debugger_example.py`
.. raw:: html
Debugging Torch-TensorRT Compilation
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_cross_runtime_compilation_for_windows_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_cross_runtime_compilation_for_windows.py`
.. raw:: html
Cross-runtime compilation for Windows
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_refit_engine_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_refit_engine_example.py`
.. raw:: html
Refitting Torch-TensorRT Programs with New Weights
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_transformers_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_transformers_example.py`
.. raw:: html
Compiling BERT using the torch.compile backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_gpt2_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_gpt2.py`
.. raw:: html
Compiling GPT2 using the Torch-TensorRT torch.compile frontend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_advanced_usage_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_advanced_usage.py`
.. raw:: html
Torch Compile Advanced Usage
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_engine_caching_bert_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_engine_caching_bert_example.py`
.. raw:: html
Engine Caching (BERT)
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_export_cudagraphs_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_export_cudagraphs.py`
.. raw:: html
Torch Export with Cudagraphs
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_pre_allocated_output_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_pre_allocated_output_example.py`
.. raw:: html
Pre-allocated output buffer
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_resnet_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_resnet_example.py`
.. raw:: html
Compiling ResNet with dynamic shapes using the torch.compile backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_autocast_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_autocast_example.py`
.. raw:: html
An example of using Torch-TensorRT Autocast
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_export_flux_dev_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_export_flux_dev.py`
.. raw:: html
Compiling FLUX.1-dev model using the Torch-TensorRT dynamo backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_auto_generate_converters_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_auto_generate_converters.py`
.. raw:: html
Automatically Generate a Converter for a Custom Kernel
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_auto_generate_plugins_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_auto_generate_plugins.py`
.. raw:: html
Automatically Generate a Plugin for a Custom Kernel
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_low_cpu_memory_compilation_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_low_cpu_memory_compilation.py`
.. raw:: html
Low CPU Memory Compilation Example
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_aot_plugin_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_aot_plugin.py`
.. raw:: html
Torch-TensorRT supports falling back to PyTorch implementations of operations in the case that Torch-TensorRT
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_converter_overloading_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_converter_overloading.py`
.. raw:: html
Overloading Torch-TensorRT Converters with Custom Converters
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_hierarchical_partitioner_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_hierarchical_partitioner_example.py`
.. raw:: html
Hierarchical Partitioner Example
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_mutable_torchtrt_module_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_mutable_torchtrt_module_example.py`
.. raw:: html
Mutable Torch TensorRT Module
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_weight_streaming_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_weight_streaming_example.py`
.. raw:: html
Weight Streaming
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_export_sam2_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_export_sam2.py`
.. raw:: html
Compiling SAM2 using the dynamo backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_vgg16_ptq_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_vgg16_ptq.py`
.. raw:: html
Deploy Quantized Models using Torch-TensorRT
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_engine_caching_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_engine_caching_example.py`
.. raw:: html
Engine Caching
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_nvrtc_aot_plugin_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_nvrtc_aot_plugin.py`
.. raw:: html
Using Custom Kernels with NVRTC in TensorRT AOT Plugins
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_llama2_flashinfer_rmsnorm_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_llama2_flashinfer_rmsnorm.py`
.. raw:: html
.._llama2_flashinfer_rmsnorm:
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_custom_kernel_plugins_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_custom_kernel_plugins.py`
.. raw:: html
Using Custom Kernels within TensorRT Engines with Torch-TensorRT
.. raw:: html
Serving a Torch-TensorRT model with Triton
==========================================
Optimization and deployment go hand in hand in any discussion of machine
learning infrastructure. Once network-level optimizations have been applied
to get maximum performance, the next step is to deploy the model.
However, serving an optimized model comes with its own set of considerations
and challenges, such as building infrastructure to support concurrent model
execution and serving clients over HTTP or gRPC.
The `Triton Inference Server