.. _cross_compile_windows: Cross-Compiling for Windows ============================ :func:`torch_tensorrt.dynamo.cross_compile_for_windows` compiles TRT engines on a **Linux x86-64** host and produces an ``ExportedProgram`` containing engines that can be loaded and executed on **Windows x86-64** — without requiring a Linux GPU at inference time. This is the standard path for teams that build models on Linux (where TRT tooling is more mature) and deploy on Windows (game engines, desktop applications, enterprise software). ---- Requirements ------------ * **Build machine**: Linux x86-64 with CUDA and TensorRT installed. * **Target machine**: Windows x86-64 with a compatible NVIDIA GPU (same or newer CUDA compute capability). * ``enable_cross_compile_for_windows=True`` is automatically set by this API; do not set it manually on ``compile()``. The following features are **disabled** during cross-compilation (they are not available in the Windows TRT runtime or require OS-specific binaries): * Python runtime (``use_python_runtime`` is forced to ``False``) * Lazy engine initialization (``lazy_engine_init`` is forced to ``False``) * Engine caching (``cache_built_engines`` / ``reuse_cached_engines`` disabled) ---- Workflow -------- **Step 1 — Export on the Linux build machine** .. code-block:: python import torch import torch_tensorrt model = MyModel().eval().cuda() inputs = [torch.randn(1, 3, 224, 224).cuda()] # Export to ExportedProgram exp_program = torch.export.export(model, tuple(inputs)) **Step 2 — Cross-compile for Windows** .. code-block:: python trt_gm = torch_tensorrt.dynamo.cross_compile_for_windows( exp_program, arg_inputs=inputs, use_explicit_typing=True, # enabled_precisions deprecated; cast model/inputs to target dtype ) **Step 3 — Save the compiled module** .. code-block:: python torch_tensorrt.save(trt_gm, "model_windows.ep", arg_inputs=inputs) **Step 4 — Load and run on Windows** Copy ``model_windows.ep`` to the Windows machine. Ensure ``libtorchtrt_runtime.so`` / ``torchtrt_runtime.dll`` is on the library path. .. code-block:: python # On Windows: import torch_tensorrt trt_gm = torch_tensorrt.load("model_windows.ep").module() output = trt_gm(*inputs) ---- Dynamic Shapes -------------- Dynamic shapes work the same as in normal ``compile()``: .. code-block:: python from torch_tensorrt import Input trt_gm = torch_tensorrt.dynamo.cross_compile_for_windows( exp_program, arg_inputs=[ Input( min_shape=(1, 3, 224, 224), opt_shape=(4, 3, 224, 224), max_shape=(16, 3, 224, 224), ) ], ) ---- Engine Compatibility --------------------- The produced engines are compatible with the **same or newer CUDA compute capability** as the GPU used during compilation. Use ``hardware_compatible=True`` if the Windows deployment GPU may have a different architecture within the Ampere+ generation: .. code-block:: python trt_gm = torch_tensorrt.dynamo.cross_compile_for_windows( exp_program, arg_inputs=inputs, hardware_compatible=True, # engine runs on Ampere and newer ) ---- Saving and Loading Cross-Compiled Programs -------------------------------------------- The output of ``cross_compile_for_windows`` is a standard ``torch.fx.GraphModule`` containing ``TorchTensorRTModule`` submodules with Windows-compatible engine bytes. Save and load via the standard Torch-TensorRT save/load API: .. code-block:: python # Save (Linux) torch_tensorrt.save(trt_gm, "model_windows.ep", arg_inputs=inputs) # Load (Windows) trt_gm = torch_tensorrt.load("model_windows.ep").module() trt_gm(*inputs) Alternatively, save as a raw ``.engine`` file for direct TRT deployment: .. code-block:: python engine_bytes = torch_tensorrt.dynamo.convert_exported_program_to_serialized_trt_engine( exp_program, arg_inputs=inputs, enable_cross_compile_for_windows=False, # use cross_compile_for_windows() instead ) # Note: use cross_compile_for_windows() for the full workflow; # convert_exported_program_to_serialized_trt_engine() does not support cross-compilation. ---- Troubleshooting --------------- ``AssertionError: cross_compile_for_windows is only supported on Linux x86-64`` You must run the compilation step on a Linux x86-64 machine. The ``@needs_cross_compile`` decorator gates this function. Engine fails to load on Windows Ensure the TRT version on Windows is ≥ the version used on Linux. Use ``version_compatible=True`` for forward compatibility within a TRT major version. Output mismatch between Linux and Windows Floating-point results may differ slightly due to different driver/hardware implementations. Use ``optimization_level=0`` on Linux to minimize kernel specialization and improve cross-platform reproducibility.