.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/_rendered_examples/dynamo/dynamic_memory_allocation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials__rendered_examples_dynamo_dynamic_memory_allocation.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials__rendered_examples_dynamo_dynamic_memory_allocation.py:

.. _dynamic_memory_allocation:

Dynamic Memory Allocation
==========================================================

This script demonstrates how to use dynamic memory allocation with Torch-TensorRT
to reduce the GPU memory footprint. When enabled, TensorRT engines allocate and
deallocate resources dynamically during inference, which can significantly reduce
peak memory usage.

This is particularly useful when:

- Running multiple models on the same GPU
- Working with limited GPU memory
- Minimizing memory usage between inference calls

.. GENERATED FROM PYTHON SOURCE LINES 19-21

Imports and Model Definition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 21-34

.. code-block:: python

    import gc
    import time

    import numpy as np
    import torch
    import torch_tensorrt as torch_trt
    import torchvision.models as models

    # Fix the random seeds so the example input is reproducible
    np.random.seed(5)
    torch.manual_seed(5)
    # A batch of 100 random 3x224x224 images, placed on the GPU
    inputs = [torch.rand((100, 3, 224, 224)).to("cuda")]

.. GENERATED FROM PYTHON SOURCE LINES 35-46

Compilation Settings with Dynamic Memory Allocation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Key settings for dynamic memory allocation:

- ``dynamically_allocate_resources=True``: Enables dynamic resource allocation
- ``lazy_engine_init=True``: Delays engine initialization until first inference
- ``immutable_weights=False``: Allows weight refitting if needed

With these settings, the engine will allocate GPU memory only when needed
and deallocate it after inference completes.

.. GENERATED FROM PYTHON SOURCE LINES 46-61

.. code-block:: python

    settings = {
        "ir": "dynamo",
        "use_python_runtime": False,
        "enabled_precisions": {torch.float32},
        "immutable_weights": False,
        "lazy_engine_init": True,
        "dynamically_allocate_resources": True,
    }

    model = models.resnet152(pretrained=True).eval().to("cuda")
    compiled_module = torch_trt.compile(model, inputs=inputs, **settings)
    # Device memory in use (total - free), in GiB, before the first inference
    print((torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3)
    compiled_module(*inputs)

.. GENERATED FROM PYTHON SOURCE LINES 62-75

Runtime Resource Allocation Control
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can control resource allocation behavior at runtime using the
``ResourceAllocationStrategy`` context manager. This allows you to:

- Switch between dynamic and static allocation modes
- Control when resources are allocated and deallocated
- Optimize memory usage for specific inference patterns

In this example, we temporarily disable dynamic allocation to keep resources
allocated between inference calls, which can improve performance when running
multiple consecutive inferences.

.. GENERATED FROM PYTHON SOURCE LINES 75-94

.. code-block:: python

    time.sleep(30)
    # Within this block the engine keeps its resources allocated between calls
    with torch_trt.dynamo.runtime.ResourceAllocationStrategy(
        compiled_module, dynamically_allocate_resources=False
    ):
        print(
            "Memory used (GB):",
            (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,
        )
        compiled_module(*inputs)
        # Release unreferenced Python objects and cached PyTorch blocks
        gc.collect()
        torch.cuda.empty_cache()
        time.sleep(30)
        print(
            "Memory used (GB):",
            (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,
        )
        compiled_module(*inputs)

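The memory readout repeated in the snippets above can be factored into a small
helper. The following is a minimal sketch and not part of the original example:
``used_gib`` and ``mean_latency_s`` are hypothetical names introduced here for
illustration. The sketch assumes only that the query function passed in behaves
like ``torch.cuda.mem_get_info`` and returns a ``(free, total)`` tuple in bytes.
It reads that tuple once, so both values come from the same snapshot.

.. code-block:: python

    import time
    from typing import Callable, Tuple


    def used_gib(mem_get_info: Callable[[], Tuple[int, int]]) -> float:
        """Return device memory in use, in GiB, from a (free, total) byte query."""
        free_bytes, total_bytes = mem_get_info()  # one call, one snapshot
        return (total_bytes - free_bytes) / 1024**3


    def mean_latency_s(run: Callable[[], object], repeats: int = 10) -> float:
        """Return the mean wall-clock seconds per call of ``run``."""
        if repeats < 1:
            raise ValueError("repeats must be at least 1")
        start = time.perf_counter()
        for _ in range(repeats):
            run()
        return (time.perf_counter() - start) / repeats

With the objects defined earlier, ``used_gib(torch.cuda.mem_get_info)`` would
replace the inline expression, and ``mean_latency_s(lambda: compiled_module(*inputs))``
could be called inside and outside the ``ResourceAllocationStrategy`` block to
compare the two modes. CUDA work can run asynchronously, so for a meaningful
timing ``run`` should call ``torch.cuda.synchronize()`` before it returns.
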
.. GENERATED FROM PYTHON SOURCE LINES 95-110

Memory Usage Comparison
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dynamic memory allocation trades some performance for a smaller memory footprint:

**Benefits:**

- Lower peak GPU memory usage
- Reduced memory pressure on shared GPUs

**Considerations:**

- Slight overhead from allocation/deallocation
- Best suited for scenarios where memory is constrained
- May not be necessary for single-model deployments with ample memory

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.000 seconds)

.. _sphx_glr_download_tutorials__rendered_examples_dynamo_dynamic_memory_allocation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: dynamic_memory_allocation.py <dynamic_memory_allocation.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: dynamic_memory_allocation.ipynb <dynamic_memory_allocation.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_