mergedog: shepherding approved PRs into pytorch/pytorch

Edward Yang (@ezyang) · May 3, 2026 · 5 min read
ci · tooling · llm · mergedog · pytorchbot

Disclosure. This post was drafted by Claude (Anthropic’s coding assistant) with editing from ezyang.

mergedog is a small, entirely vibe-coded Python harness that takes one approved pytorch/pytorch PR and shepherds it through CI to the point where a human can comment @pytorchbot merge. The idea is to use LLMs to deal with some aspects of the drudgery of landing PRs from external contributors:

  • Pressing the “Approve CI workflows” button (in a secure way!),
  • Waiting for the CI results to come back,
  • Checking if the CI failures are spurious or real, and
  • Fixing simple CI failures that are just due to brain-os.

While each of these tasks is individually not onerous, they take up time in aggregate; and it is especially annoying to have to remember to come back in a few hours to actually look at CI results.

The way it works is you spin up a persistent process for every PR you want to shepherd. It continuously polls for results from GitHub, launching a claude instance if some intelligent intervention is needed.

When designing mergedog, I had three aims:

  • Empowerment. Once we acknowledge a PR as good, we take full control, pushing update commits as necessary. The original author does not need to be involved in the rest of the process.
  • Security. In fact, for security reasons, it is better for the author not to be involved. We design a clear trust boundary, where LLMs only touch code that a human has read.
  • Autonomy. Ask for a PR to be merged, then don’t touch it until it’s done.

Here’s the main loop of mergedog, when you ask it to run on a PR:

  1. Verify that the top commit is trusted (by explicit approval, or because mergedog authored it).
  2. Approve any action_required workflow runs (external-author PRs start gated until a maintainer presses “Approve and run”).
  3. Read CI status by enumerating all workflow runs for the PR head.
  4. Failed → invoke Claude in “fix-CI” mode. Claude either makes one [MERGEDOG]-prefixed commit fixing the failures, or makes no commit because the failures look spurious.
  5. Passed → if the merge-base with main is older than a week, merge main in (handing conflicts to Claude). Then apply ciflow/trunk and keep polling.
  6. Trunk green → post a “mergedog handoff” comment summarizing every Claude session and wait for @pytorchbot merge.
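
The loop above can be read as a pure decision function over a snapshot of the PR. The following is a minimal sketch of that reading, not mergedog's actual code: the names Snapshot, Action, and next_action are hypothetical, and the choice to wait for pending runs before looking at failures is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    HALT_UNTRUSTED = auto()     # step 1 failed: head commit is not trusted
    APPROVE_WORKFLOWS = auto()  # step 2
    WAIT = auto()               # CI still running, poll again later
    FIX_CI = auto()             # step 4: hand the failures to Claude
    MERGE_MAIN = auto()         # step 5: merge-base is too old
    APPLY_TRUNK_LABEL = auto()  # step 5: apply ciflow/trunk
    HANDOFF = auto()            # step 6: post the handoff comment


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical summary of one poll of the PR head."""
    head_trusted: bool
    gated_runs: int        # workflow runs in the action_required state
    pending_runs: int
    failed_runs: int
    base_age_days: float   # age of the merge-base with main
    has_trunk_label: bool  # ciflow/trunk already applied


def next_action(s: Snapshot, max_base_age_days: float = 7.0) -> Action:
    if not s.head_trusted:
        return Action.HALT_UNTRUSTED
    if s.gated_runs:
        return Action.APPROVE_WORKFLOWS
    if s.pending_runs:
        # Assumption: wait for every run to finish before judging failures.
        return Action.WAIT
    if s.failed_runs:
        return Action.FIX_CI
    if not s.has_trunk_label:
        if s.base_age_days > max_base_age_days:
            return Action.MERGE_MAIN
        return Action.APPLY_TRUNK_LABEL
    return Action.HANDOFF
```

Because the function depends only on the snapshot, re-running it after a restart gives the same answer, which is one way to get the kill-and-relaunch behavior described next.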

mergedog is idempotent: you can kill it and relaunch it, and it will work out where a PR is from context and continue the main loop from there.

Examples

I landed a bunch of PRs with mergedog while developing it, using them to refine the workflow (this is one of the really good use cases for vibecoding: you can get straight to refining the UX until something feels good). Here are two representative PRs that were merged:

#173321 · chani0343, C++ scientific-notation API

This one was failing with seven CI failures. That’s a lot of failures. Claude took a look:

  1. Three CUDA13.0 build failures: All failed during docker image build with Connection timed out errors connecting to ppa.launchpadcontent.net:443. Network/infra issue, never reached compilation.
  2. linux-jammy-py3.10-gcc11 / test (distributed, ...): test_hf_bert_ddp_aot_eager killed with Command took >30min.
  3. Three dynamo_wrapped failures: All hit the same test_invalid_types failure. Dr. CI explicitly marks these as FLAKY with similar failures on trunk.

None of these touch the modified files. Choosing option 2: no commit.

I looked over the judgment, agreed that it seemed reasonable, and force-merged the PR.

#173183 · JonSnow1807, expose rearrange in torch.func

After I approved the CI run, it eventually failed with a Python 3.14 doctest crash and a PYFMT lint complaint. Without my having to be at my laptop, Claude spent ~4 minutes reading CI annotations and source files, then made a two-file fix:

Committed two-part fix: (1) functorch/einops/rearrange.py uses an explicit namespace dict for exec() instead of relying on locals() (which broke under Python 3.14 / PEP 667 with KeyError: 'do_rearrange' in the new doctest), and (2) torch/func/__init__.py collapses the rearrange import into the existing torch._functorch.* block to satisfy PYFMT/usort.

We went ahead and pushed it directly, avoiding the need to do another days-long round trip with the original contributor. This was enough to then merge it.

Security model

The security model for mergedog is something I want to take some time describing, because running LLMs on external user code is a dangerous process, and the output products of the LLM will get directly executed on CI infrastructure (as mergedog will autonomously push changes to trigger CI on them).

The main security premise revolves around approvals on a per-commit basis. Specifically, when a maintainer approves a PR, they are putting the following things in the trusted set: (1) the exact tree state of the PR (nothing about the history, just what its current state is), and (2) all visible text on the pull request page, e.g., the description and comments. Anything that comes in afterwards that doesn’t come from a trusted source is not trusted and will halt the harness. The harness then ensures only trusted content makes its way to the agent: we squash the history so only a single commit can be seen in the working tree, and we explicitly construct a sidecar file that contains the pull request description and comments (with some extra processing to remove things that are easy for a reviewer to miss, like the contents of a details tag).
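
To make the sidecar idea concrete, here is a rough sketch of how collapsed details blocks could be dropped before text reaches the agent. It is an illustration under assumptions, not mergedog's real processing: strip_details and build_sidecar are hypothetical names, the sidecar layout is invented, and the scanner only looks for literal details tags.

```python
import re

# Matches opening and closing <details> tags, with or without attributes.
_DETAILS_TAG = re.compile(r"<(/?)details\b[^>]*>", re.IGNORECASE)


def strip_details(text: str) -> str:
    """Drop everything inside <details> blocks, including nested ones.

    An unclosed <details> discards the rest of the text, so the function
    fails closed rather than leaking hidden content.
    """
    out = []
    depth = 0
    pos = 0
    for m in _DETAILS_TAG.finditer(text):
        if depth == 0:
            out.append(text[pos:m.start()])  # keep text outside any block
        if m.group(1):                       # closing tag
            depth = max(0, depth - 1)
        else:                                # opening tag
            depth += 1
        pos = m.end()
    if depth == 0:
        out.append(text[pos:])
    return "".join(out)


def build_sidecar(description: str, comments: list[str]) -> str:
    """Assemble a hypothetical sidecar file from already-trusted text.

    Callers should pass only the description and comments that were
    visible when the maintainer approved the PR.
    """
    parts = ["# PR description", strip_details(description).strip()]
    for i, comment in enumerate(comments, 1):
        parts += [f"# Comment {i}", strip_details(comment).strip()]
    return "\n\n".join(parts) + "\n"
```

Tracking nesting depth instead of using a single non-greedy regex means a nested details block cannot end the skipped region early.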

This harness design also means that claude code can run in a sandboxed environment, which is nice because my employer forces this policy on the corp laptop.

How to use it

$ python -m mergedog 173321

PR URL also works. Useful flags:

  • --max-base-age DAYS (default 7) — when to merge main in first.
  • --accept-divergence — proceed even if the head moved past the approval (only after re-reviewing the new commits).
  • --ignore-sev — don’t park on open ci: sev issues.
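
For readers who want to see the command-line surface in one place, the flags above map naturally onto argparse. This is a sketch only: the flag names and the default of 7 days come from the list above, while parse_pr, build_parser, the help strings, and the exact URL pattern accepted are assumptions.

```python
import argparse
import re

# Assumption: only pytorch/pytorch pull request URLs are accepted.
_PR_URL = re.compile(r"^https://github\.com/pytorch/pytorch/pull/([0-9]+)")


def parse_pr(value: str) -> int:
    """Accept either a bare PR number or a PR URL."""
    if re.fullmatch(r"[0-9]+", value):
        return int(value)
    m = _PR_URL.match(value)
    if m:
        return int(m.group(1))
    raise argparse.ArgumentTypeError(
        f"not a PR number or pytorch/pytorch PR URL: {value!r}"
    )


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="mergedog")
    p.add_argument("pr", type=parse_pr, help="PR number or URL")
    p.add_argument("--max-base-age", type=float, default=7, metavar="DAYS",
                   help="merge main in first if the merge-base is older")
    p.add_argument("--accept-divergence", action="store_true",
                   help="proceed even if the head moved past the approval")
    p.add_argument("--ignore-sev", action="store_true",
                   help="do not park on open ci: sev issues")
    return p
```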

State lives at ~/.mergedog/: per-PR logs in logs/<pr>.log, trust state in state/<pr>.json, worktree in worktrees/<pr>/. Ctrl-C anywhere; restart with the same command and it resumes.
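
As an illustration of how that layout supports resuming, the sketch below derives the three per-PR paths and round-trips a trust record through JSON. The directory names come from the paragraph above. Everything else is hypothetical, including the TrustState fields, the JSON schema, and the helper names, and the real state file may look quite different.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class TrustState:
    """Hypothetical trust record for one PR."""
    approved_sha: str                                        # commit the maintainer approved
    mergedog_shas: list[str] = field(default_factory=list)   # commits mergedog pushed

    def is_trusted(self, head_sha: str) -> bool:
        return head_sha == self.approved_sha or head_sha in self.mergedog_shas


def pr_paths(pr: int, root: Path | None = None) -> dict[str, Path]:
    """Per-PR paths under the state root (defaults to ~/.mergedog)."""
    root = root if root is not None else Path.home() / ".mergedog"
    return {
        "log": root / "logs" / f"{pr}.log",
        "state": root / "state" / f"{pr}.json",
        "worktree": root / "worktrees" / str(pr),
    }


def save_state(path: Path, state: TrustState) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(asdict(state), indent=2))
    # Rename over the old file; on POSIX this is atomic, so an interrupted
    # run leaves either the old state or the new one, never a partial file.
    tmp.replace(path)


def load_state(path: Path) -> TrustState | None:
    """Return the saved state, or None on a first run."""
    if not path.exists():
        return None
    return TrustState(**json.loads(path.read_text()))
```

Writing to a temporary file and renaming it is one simple way to keep Ctrl-C safe at any point, under the assumption that the state file is the only thing that must survive a restart.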

A mergedog.mux TUI runs many shepherds in parallel:

$ python -m mergedog.mux

This is what I use, since I typically end up with a lot of mergedogs going in parallel.