Distributed Inference

Examples of multi-GPU distributed inference with Torch-TensorRT, covering data parallelism (running copies of the same model on multiple GPUs) and tensor parallelism (splitting a single large model across multiple GPUs).
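The difference between the two strategies can be sketched without any GPU. The block below is a conceptual illustration only: plain Python lists stand in for tensors, list entries stand in for devices, and the helper names (`matmul`, `split`, `data_parallel`, `tensor_parallel`) are hypothetical and not part of Torch-TensorRT. It assumes a single linear layer and a column-wise weight split, which is one common way to do tensor parallelism.

```python
# Conceptual sketch: lists stand in for tensors and "devices".
# None of these helpers are Torch-TensorRT or PyTorch API.

def matmul(x, w):
    """Multiply a (batch x in) matrix by an (in x out) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split(items, n):
    """Split a list into at most n contiguous chunks, preserving order."""
    size = -(-len(items) // n)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def data_parallel(x, w, n_devices):
    """Each device holds the full weights and processes a slice of the batch."""
    outputs = [matmul(shard, w) for shard in split(x, n_devices)]
    # Gather: stack the per-device results back along the batch dimension.
    return [row for out in outputs for row in out]

def tensor_parallel(x, w, n_devices):
    """Each device holds a slice of the weight columns and sees the whole batch."""
    column_shards = split(list(zip(*w)), n_devices)
    # Turn each group of columns back into an (in x out_shard) matrix.
    weight_shards = [[list(r) for r in zip(*cols)] for cols in column_shards]
    partials = [matmul(x, ws) for ws in weight_shards]
    # Gather: concatenate the per-device results along the output dimension.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4], [5, 6]]      # batch of 3 inputs, 2 features each
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # 2 inputs -> 4 outputs

reference = matmul(x, w)
assert data_parallel(x, w, 2) == reference
assert tensor_parallel(x, w, 2) == reference
```

Both strategies reproduce the single-device result. What differs is what each device must hold: the full weights in the data-parallel case, only a slice of them in the tensor-parallel case. That is why tensor parallelism is the option for a model too large for one GPU.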

data_parallel_stable_diffusion.py

Torch-TensorRT Distributed Inference

data_parallel_gpt2.py

Torch-TensorRT Distributed Inference

test_multinode_nccl.py

Two-node native TensorRT NCCL test.

tensor_parallel_simple_example.py

Tensor Parallel Distributed Inference with Torch-TensorRT

tensor_parallel_simple_example_mn.py

Tensor Parallel Distributed Inference with Torch-TensorRT (torchrun)

test_multinode_export_save_load.py

Two-node test: torch.export → TRT AOT compile → save → load → inference.
