Draft Export

Warning

This feature is a tool for debugging torch.export tracing errors and is not meant to be used in production.

Draft-export is a new version of export, designed to consistently produce a graph even in the presence of potential soundness issues, and to generate a report listing all of the issues encountered during tracing along with additional debugging information. For custom operators that don’t have fake kernels, it will also generate an operator profile, which you can register to automatically generate a fake kernel.

Have you ever tried to export a model using torch.export.export(), only to encounter a data-dependent issue? You fix it, but then run into a missing fake kernel problem. And after resolving that, you get hit with another data-dependent issue. You wonder to yourself: I wish there were a way to just get a graph to play around with, and to view all the issues in one place so that I can fix them later…

draft_export to the rescue!

draft_export is a version of export that will always successfully produce a graph, even if there are potential soundness issues. These issues are compiled into a report for clearer visualization, so that they can be fixed later on.

What sort of errors does it catch?

Draft-export helps to catch and debug the following errors:

  • Guard on data-dependent errors

  • Constraint violation errors

  • Missing fake kernels

  • Incorrectly written fake kernels

How does it work?

In normal export, we convert the sample inputs into FakeTensors and use them to record operations and trace the program into a graph. Input tensor shapes that can change (marked through dynamic_shapes), and values within tensors (typically obtained from an .item() call), are represented as symbolic shapes (SymInt) instead of concrete integers. However, some issues may occur while tracing. We may run into guards that we cannot evaluate, for example if we want to check whether some item in a tensor is nonnegative (u0 >= 0): since the tracer doesn’t know anything about the value of u0, it will throw a data-dependent error. If the model uses a custom operator but no fake kernel has been defined for it, export will error with fake_tensor.UnsupportedOperatorException, because it doesn’t know how to apply the operator to FakeTensors. And if a custom operator has an incorrectly implemented fake kernel, export will silently produce an incorrect graph that doesn’t match the eager behavior.
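
For instance, here is a minimal hand-written module (illustrative, not from this page’s later example) that hits a data-dependent error under regular export, because the tracer cannot evaluate the guard on the unbacked value:

import torch

class DataDependent(torch.nn.Module):
    def forward(self, x):
        a = x.item()  # `a` becomes an unbacked SymInt (e.g. u0)
        if a >= 0:    # guard on u0 -- the tracer cannot evaluate this
            return x + 1
        return x - 1

# Raises a guard-on-data-dependent error:
# torch.export.export(DataDependent(), (torch.tensor(3),))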

To fix the above errors, draft-export uses real-tensor tracing to guide us on how to proceed. As we trace the model with fake tensors, for every operation on a fake tensor, draft-export also runs the operator on stored real tensors, which come from the example inputs passed to export. This lets us address each of the above errors:

  • When we reach a guard that we cannot evaluate, like u0 >= 0, we use the stored real tensor values to evaluate it. Runtime asserts are added to the graph so that the graph enforces the same guard that we assumed while tracing.

  • When we run into a custom operator without a fake kernel, we run the operator’s normal kernel on the stored real tensors, and return a fake tensor with the same rank but unbacked shapes.

  • Since we have the real tensor output for every operation, we compare it with the fake tensor output from the fake kernel. If the fake kernel is implemented incorrectly, we catch the mismatch and generate a more correct fake kernel.
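
As a rough illustration of the first point, the guard taken during tracing ends up as a runtime assert in the produced graph. Here is a hand-written sketch (not actual draft-export output) of the equivalent eager behavior, assuming the example input made u0 >= 0 true during tracing:

import torch

def sliced_ones(x: torch.Tensor) -> torch.Tensor:
    u0 = x.item()
    # draft-export evaluated u0 >= 0 on the real example value; the produced
    # graph carries an equivalent runtime assert, so later inputs that violate
    # the assumption fail loudly instead of silently diverging.
    torch._check(u0 >= 0)
    return torch.ones(u0)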

How can I use draft-export?

Let’s say you’re trying to export this piece of code:

class M(torch.nn.Module):
    def forward(self, x, y, z):
        # mylib.foo2 is a custom operator with no fake kernel registered
        res = torch.ops.mylib.foo2(x, y)

        # res.item() produces an unbacked SymInt
        a = res.item()
        a = -a
        a = a // 3
        a = a + 5

        z = torch.cat([z, z])

        torch._check_is_size(a)
        torch._check(a < z.shape[0])

        # slicing with an unbacked SymInt guards on its value
        return z[:a]

inp = (torch.tensor(3), torch.tensor(4), torch.ones(3, 3))

ep = torch.export.export(M(), inp)

This runs into a “missing fake kernel” error for mylib.foo2, and then a GuardOnDataDependentExpression error caused by slicing z with a, an unbacked SymInt.

To call draft-export, we can replace the torch.export line with the following:

from torch.export._draft_export import draft_export
ep = draft_export(M(), inp)

ep is a valid ExportedProgram which can now be run or passed to downstream tooling!
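
For instance, the result can be executed eagerly right away (a small sketch, reusing inp from above):

# Run the draft-exported program on the example inputs
res = ep.module()(*inp)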

Debugging with draft-export

In the terminal output from draft-export, you should see the following message:

#########################################################################################
WARNING: 2 issue(s) found during export, and it was not able to soundly produce a graph.
To view the report of failures in an html page, please run the command:
    `tlparse /tmp/export_angelayi/dedicated_log_torch_trace_axpofwe2.log --export`
Or, you can view the errors in python by inspecting `print(ep._report)`.
########################################################################################

Draft-export automatically dumps logs for tlparse. You can view the tracing errors with print(ep._report), or you can pass the logs to tlparse to generate an HTML report.

Running the tlparse command in the terminal will generate an HTML report. Here is an example:

[Image: example tlparse report generated by draft-export]

Clicking into the data-dependent error, we see the following page, which contains information to help debug this error. Specifically, it contains:

  • The stacktrace at which this error occurs

  • A list of local variables and their shapes

  • Information on how this guard was created

[Image: tlparse page for a data-dependent error, showing the stacktrace, local variables, and guard provenance]

The returned Exported Program

Because draft-export specializes on code paths based on the example inputs, the exported program resulting from draft-export is guaranteed to be runnable and to return correct results for at least the given example inputs. Other inputs will also work, as long as they satisfy the same guards that were taken while draft-exporting.

For example, if we have a graph that branches on whether a value is greater than 5, and our example inputs during draft-export were greater than 5, the returned ExportedProgram will specialize on that branch and assert that the value is greater than 5. This means that the program will succeed if you pass in another value greater than 5, but will fail if you pass in a value of 5 or less. This is more sound than torch.jit.trace, which silently specializes on the branch. The proper way for torch.export to support both branches is to rewrite the code using torch.cond, which will capture both branches, as sketched below.
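
A minimal sketch of such a rewrite (the module name and threshold here are illustrative):

import torch

class Branchy(torch.nn.Module):
    def forward(self, x):
        # torch.cond captures both branches, so export does not specialize
        # on whichever side of the predicate the example input takes.
        return torch.cond(
            x.sum() > 5,
            lambda x: x + 1,  # true branch
            lambda x: x - 1,  # false branch
            (x,),
        )

ep = torch.export.export(Branchy(), (torch.ones(3),))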

Because of the runtime assertions in the graph, the returned exported program is also retraceable with torch.export or torch.compile, with a minor addition in the case where a custom operator is missing a fake kernel (see the next section).
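
For instance, retracing might look like this (a sketch; it assumes every custom operator involved already has a fake kernel registered):

# Retrace the draft-exported program with regular export
ep2 = torch.export.export(ep.module(), inp)

# Or compile the unlifted module directly
compiled = torch.compile(ep.module())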

Generating Fake Kernels

If a custom operator does not have a fake implementation, draft-export currently uses real-tensor propagation to get an output for the operator and continue tracing. However, if we run the exported program with fake tensors or retrace the exported model, we will fail again, because there is still no fake kernel implementation.

To address this, after draft-export we generate an operator profile for each custom operator call that we encounter, and store it on the report attached to the exported program: ep._report.op_profiles. Users can then use the context manager torch._library.fake_profile.unsafe_generate_fake_kernels to generate and register a fake implementation based on these operator profiles. This way, future fake-tensor retracing will work.

The workflow would look something like:

from torch._library.fake_profile import save_op_profiles, load_op_profiles

class M(torch.nn.Module):
    def forward(self, a, b):
        res = torch.ops.mylib.foo(a, b)  # no fake impl
        return res

ep = draft_export(M(), (torch.ones(3, 4), torch.ones(3, 4)))

# Register fake kernels generated from the recorded operator profiles so
# that fake-tensor retracing (here, run_decompositions) succeeds:
with torch._library.fake_profile.unsafe_generate_fake_kernels(ep._report.op_profiles):
    decomp = ep.run_decompositions()

# Inputs of a new rank (3 here, vs. the recorded rank-2 profile) are not
# covered by the recorded profiles -- see the note on extending them below:
new_inp = (
    torch.ones(2, 3, 4),
    torch.ones(2, 3, 4),
)

# Save the profile to a yaml and check it into a codebase
save_op_profiles(ep._report.op_profiles, "op_profile.yaml")
# Load the yaml
loaded_op_profile = load_op_profiles("op_profile.yaml")
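
The loaded profiles can then be used with the same context manager in a later process (a sketch):

# Reuse the saved profiles to retrace with generated fake kernels
with torch._library.fake_profile.unsafe_generate_fake_kernels(loaded_op_profile):
    ep.run_decompositions()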

The operator profile is a dictionary mapping operator name to a set of profiles which describe the inputs and outputs of the operator. Profiles can be manually written, saved into a yaml file, and checked into a codebase. Here’s an example of a profile for mylib.foo.default:

"mylib.foo.default": {
    OpProfile(
        args_profile=(
            TensorMetadata(
                rank=2,
                dtype=torch.float32,
                device=torch.device("cpu"),
                layout=torch.strided,
            ),
            TensorMetadata(
                rank=2,
                dtype=torch.float32,
                device=torch.device("cpu"),
                layout=torch.strided,
            ),
        ),
        out_profile=TensorMetadata(
            rank=2,
            dtype=torch.float32,
            device=torch.device("cpu"),
            layout=torch.strided,
        ),
    )
}

mylib.foo.default’s profile contains only one entry, which says that for 2 input tensors of rank 2, dtype torch.float32, device cpu, and strided layout, the operator returns one tensor of rank 2 with the same dtype, device, and layout. Using the context manager will then generate and register a fake kernel that, given 2 input tensors matching this metadata, outputs one tensor of rank 2 with matching metadata.

If the operator also supports other input ranks, we can add profiles to this set, either by manually writing them or by rerunning draft-export with new inputs to record new profiles, so that the generated fake kernel supports more input types. Otherwise, the generated fake kernel will error on inputs that don’t match any recorded profile. A hypothetical extension is sketched below.
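
For instance, a rank-3 profile could be added by hand (a sketch; whether your operator actually supports rank-3 inputs is an assumption here):

from torch._library.fake_profile import OpProfile, TensorMetadata

# Hypothetical: extend the recorded profiles so the generated fake kernel
# also accepts rank-3 float32 CPU inputs for mylib.foo.default
rank3 = TensorMetadata(
    rank=3,
    dtype=torch.float32,
    device=torch.device("cpu"),
    layout=torch.strided,
)
ep._report.op_profiles["mylib.foo.default"].add(
    OpProfile(args_profile=(rank3, rank3), out_profile=rank3)
)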

Where to go from here?

Now that we have successfully created an ExportedProgram using draft-export, we can use further compilers, such as AOTInductor, to optimize its performance and produce a runnable artifact. This optimized version can then be used for deployment. In parallel, we can use the report generated by draft-export to identify and fix the torch.export errors that were encountered, so that the original model can eventually be traced directly with torch.export.
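
For example, compiling the draft-exported program with AOTInductor might look like this (a sketch; the packaging API and its defaults can vary across PyTorch versions):

import torch

# Compile the draft-exported program with AOTInductor, package it into a
# .pt2 artifact, then load it back for inference
pt2_path = torch._inductor.aoti_compile_and_package(ep)
compiled = torch._inductor.aoti_load_package(pt2_path)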