
Torch-TensorRT



Weight Refitting & LoRA

Update compiled TensorRT engine weights without recompilation — for LoRA adapters, fine-tuned checkpoints, and EMA weight updates.

  • Refitting TensorRT Engines with Updated Weights
  • Example: Refitting Programs with New Weights


© Copyright 2024, NVIDIA Corporation.