{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# Dynamic Memory Allocation\n\nThis script demonstrates how to use dynamic memory allocation with Torch-TensorRT\nto reduce GPU memory footprint. When enabled, TensorRT engines allocate and deallocate resources\ndynamically during inference, which can significantly reduce peak memory usage.\n\nThis is particularly useful when:\n\n- Running multiple models on the same GPU\n- Working with limited GPU memory\n- Memory usage needs to be minimized between inference calls\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports and Model Definition\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import gc\nimport time\n\nimport numpy as np\nimport torch\nimport torch_tensorrt as torch_trt\nimport torchvision.models as models\n\nnp.random.seed(5)\ntorch.manual_seed(5)\ninputs = [torch.rand((100, 3, 224, 224)).to(\"cuda\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compilation Settings with Dynamic Memory Allocation\n\nKey settings for dynamic memory allocation:\n\n- ``dynamically_allocate_resources=True``: Enables dynamic resource allocation\n- ``lazy_engine_init=True``: Delays engine initialization until first inference\n- ``immutable_weights=False``: Allows weight refitting if needed\n\nWith these settings, the engine will allocate GPU memory only when needed\nand deallocate it after inference completes.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "settings = {\n \"ir\": \"dynamo\",\n \"use_python_runtime\": False,\n \"enabled_precisions\": {torch.float32},\n \"immutable_weights\": False,\n \"lazy_engine_init\": True,\n \"dynamically_allocate_resources\": True,\n}\n\nmodel = models.resnet152(pretrained=True).eval().to(\"cuda\")\ncompiled_module = torch_trt.compile(model, inputs=inputs, 
**settings)\nprint(\n    \"Memory used (GB):\",\n    (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,\n)\ncompiled_module(*inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Runtime Resource Allocation Control\n\nYou can control resource allocation behavior at runtime using the\n``ResourceAllocationStrategy`` context manager. This allows you to:\n\n- Switch between dynamic and static allocation modes\n- Control when resources are allocated and deallocated\n- Optimize memory usage for specific inference patterns\n\nIn this example, we temporarily disable dynamic allocation to keep\nresources allocated between inference calls, which can improve performance\nwhen running multiple consecutive inferences.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Pause so dynamically allocated resources can be released before measuring memory\ntime.sleep(30)\nwith torch_trt.dynamo.runtime.ResourceAllocationStrategy(\n    compiled_module, dynamically_allocate_resources=False\n):\n    print(\n        \"Memory used (GB):\",\n        (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,\n    )\n    compiled_module(*inputs)\n    gc.collect()\n    torch.cuda.empty_cache()\n    # With dynamic allocation disabled, resources stay allocated between calls\n    time.sleep(30)\n    print(\n        \"Memory used (GB):\",\n        (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,\n    )\n    compiled_module(*inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Memory Usage Comparison\n\nDynamic memory allocation trades off some performance for reduced memory footprint:\n\n**Benefits:**\n\n- Lower peak GPU memory usage\n- Reduced memory pressure on shared GPUs\n\n**Considerations:**\n\n- Slight overhead from allocation/deallocation\n- Best suited for scenarios where memory is constrained\n- May not be necessary for single-model deployments with ample memory\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", 
"mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.15" } }, "nbformat": 4, "nbformat_minor": 0 }