.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_stable_diffusion_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_stable_diffusion.py`
.. raw:: html
Compiling Stable Diffusion model using the torch.compile backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_cross_runtime_compilation_for_windows_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_cross_runtime_compilation_for_windows.py`
.. raw:: html
cross runtime compilation limitations:
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_refit_engine_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_refit_engine_example.py`
.. raw:: html
Refitting Torch-TensorRT Programs with New Weights
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_transformers_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_transformers_example.py`
.. raw:: html
Compiling BERT using the torch.compile backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_gpt2_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_gpt2.py`
.. raw:: html
Compiling GPT2 using the Torch-TensorRT torch.compile frontend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_advanced_usage_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_advanced_usage.py`
.. raw:: html
Torch Compile Advanced Usage
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_export_cudagraphs_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_export_cudagraphs.py`
.. raw:: html
Torch Export with Cudagraphs
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_engine_caching_bert_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_engine_caching_bert_example.py`
.. raw:: html
Engine Caching (BERT)
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_pre_allocated_output_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_pre_allocated_output_example.py`
.. raw:: html
Pre-allocated output buffer
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_compile_resnet_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_compile_resnet_example.py`
.. raw:: html
Compiling ResNet with dynamic shapes using the torch.compile backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_export_flux_dev_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_export_flux_dev.py`
.. raw:: html
Compiling FLUX.1-dev model using the Torch-TensorRT dynamo backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_export_gpt2_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_export_gpt2.py`
.. raw:: html
Compiling GPT2 using the dynamo backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_export_llama2_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_export_llama2.py`
.. raw:: html
Compiling Llama2 using the dynamo backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_auto_generate_converters_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_auto_generate_converters.py`
.. raw:: html
Automatically Generate a Converter for a Custom Kernel
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_auto_generate_plugins_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_auto_generate_plugins.py`
.. raw:: html
Automatically Generate a Plugin for a Custom Kernel
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_converter_overloading_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_converter_overloading.py`
.. raw:: html
Overloading Torch-TensorRT Converters with Custom Converters
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_weight_streaming_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_weight_streaming_example.py`
.. raw:: html
Weight Streaming
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_mutable_torchtrt_module_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_mutable_torchtrt_module_example.py`
.. raw:: html
Mutable Torch TensorRT Module
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_torch_export_sam2_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_torch_export_sam2.py`
.. raw:: html
Compiling SAM2 using the dynamo backend
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_vgg16_ptq_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_vgg16_ptq.py`
.. raw:: html
Deploy Quantized Models using Torch-TensorRT
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_engine_caching_example_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_engine_caching_example.py`
.. raw:: html
Engine Caching
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_llama2_flashinfer_rmsnorm_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_llama2_flashinfer_rmsnorm.py`
.. raw:: html
.._llama2_flashinfer_rmsnorm:
.. raw:: html
.. only:: html
.. image:: /tutorials/_rendered_examples/dynamo/images/thumb/sphx_glr_custom_kernel_plugins_thumb.png
:alt:
:ref:`sphx_glr_tutorials__rendered_examples_dynamo_custom_kernel_plugins.py`
.. raw:: html
Using Custom Kernels within TensorRT Engines with Torch-TensorRT
.. raw:: html
Serving a Torch-TensorRT model with Triton
==========================================
Optimization and deployment go hand in hand in any discussion of machine
learning infrastructure. Once network-level optimizations have been applied
to get the maximum performance, the next step is to deploy the model.
However, serving the optimized model comes with its own set of considerations
and challenges, such as building infrastructure to support concurrent model
executions and supporting clients over HTTP or gRPC.
The `Triton Inference Server