{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# Dynamic Memory Allocation\n\nThis script demonstrates how to use dynamic memory allocation with Torch-TensorRT\nto reduce GPU memory footprint. When enabled, TensorRT engines allocate and deallocate resources\ndynamically during inference, which can significantly reduce peak memory usage.\n\nThis is particularly useful when:\n\n- Running multiple models on the same GPU\n- Working with limited GPU memory\n- Memory usage needs to be minimized between inference calls\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports and Model Definition\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import gc\nimport time\n\nimport numpy as np\nimport torch\nimport torch_tensorrt as torch_trt\nimport torchvision.models as models\n\nnp.random.seed(5)\ntorch.manual_seed(5)\ninputs = [torch.rand((100, 3, 224, 224)).to(\"cuda\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compilation Settings with Dynamic Memory Allocation\n\nKey settings for dynamic memory allocation:\n\n- ``dynamically_allocate_resources=True``: Enables dynamic resource allocation\n- ``lazy_engine_init=True``: Delays engine initialization until first inference\n- ``immutable_weights=False``: Allows weight refitting if needed\n\nWith these settings, the engine will allocate GPU memory only when needed\nand deallocate it after inference completes.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "settings = {\n \"ir\": \"dynamo\",\n \"use_python_runtime\": False,\n \"enabled_precisions\": {torch.float32},\n \"immutable_weights\": False,\n \"lazy_engine_init\": True,\n \"dynamically_allocate_resources\": True,\n}\n\nmodel = models.resnet152(pretrained=True).eval().to(\"cuda\")\ncompiled_module = torch_trt.compile(model, inputs=inputs, 
**settings)\nprint(\n    \"Memory used (GB):\",\n    (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,\n)\ncompiled_module(*inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Runtime Resource Allocation Control\n\nYou can control resource allocation behavior at runtime using the\n``ResourceAllocationStrategy`` context manager. This allows you to:\n\n- Switch between dynamic and static allocation modes\n- Control when resources are allocated and deallocated\n- Optimize memory usage for specific inference patterns\n\nIn this example, we temporarily disable dynamic allocation to keep\nresources allocated between inference calls, which can improve performance\nwhen running multiple consecutive inferences.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Pause so dynamically allocated resources can be released before measuring memory\ntime.sleep(30)\nwith torch_trt.dynamo.runtime.ResourceAllocationStrategy(\n    compiled_module, dynamically_allocate_resources=False\n):\n    print(\n        \"Memory used (GB):\",\n        (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,\n    )\n    compiled_module(*inputs)\n    gc.collect()\n    torch.cuda.empty_cache()\n    # With dynamic allocation disabled, resources stay allocated between calls\n    time.sleep(30)\n    print(\n        \"Memory used (GB):\",\n        (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,\n    )\n    compiled_module(*inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Memory Usage Comparison\n\nDynamic memory allocation trades off some performance for reduced memory footprint:\n\n**Benefits:**\n\n- Lower peak GPU memory usage\n- Reduced memory pressure on shared GPUs\n\n**Considerations:**\n\n- Slight overhead from allocation/deallocation\n- Best suited for scenarios where memory is constrained\n- May not be necessary for single-model deployments with ample memory\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", 
"mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.15" } }, "nbformat": 4, "nbformat_minor": 0 }