Runtime Optimization#
Optimize inference throughput and latency: CUDA Graphs for kernel-replay, pre-allocated output buffers, and choosing the Python vs C++ TRT execution path.
Optimize inference throughput and latency: CUDA Graphs for kernel-replay, pre-allocated output buffers, and choosing the Python vs C++ TRT execution path.