[unitaryhack] Topology-aware initial placement for the qubit-mapping pass#4678
[unitaryhack] Topology-aware initial placement for the qubit-mapping pass#4678border-b wants to merge 7 commits into
Conversation
|
Nice repro. Two isomorphic stars routing to different SWAP counts is a clean way to surface the physical-numbering artifact, and scoring the greedy layout against identity and keeping whichever wins is a sensible guard against regressions. The thing worth raising is that the placement and the selection are optimising slightly different models. Two smaller things. The selection runs on a static distance proxy, which you flag honestly, and the stronger if costlier option is to route both candidate layouts and compare real SWAP counts. And the linear recency weighting reads as a little arbitrary without a sentence tying it to the router's own decay behaviour. Solid work overall, and the regression-guard instinct is the right one. |
There was a problem hiding this comment.
Hi @border-b, thank you for your contribution. It is a great start but will need some degree of rethinking to be ready for merge. As noted by @Zaneham in his comment the current placement, layout and router are optimizing different models. I also think we are missing larger demos/benchmarks on some quantum volume like circuits to understand that the PR is having the intended improvement.
The interaction placement builds the candidate from the total two-qubit interaction counts. This treats all interactions equally indepedent of when they happen in the circuit. Later on you use the layoutScore to choose between greedy and identity with a distance score. I don't think you should be silently falling back to identity if a different placement strategy was selected.
However, the bigger issue is that chooseSwap then does it's selection based on the dynamic front layer. This will invalidate most of the static assumptions and might lead to a situation like:
- Circuit has many late interactions between
q0and `q1 - Greedy placement puts
q0andq1close because of this. - The first front-layer gates use
q2-q3andq3-q4. - The router adds immediate swaps
- Swaps have not changed the placement so that original static score is invalidated
The issue is that these models are not working together. The first layout could optimize for pairs that matter often but not until late in the circuit. The second model could choose a layout based on early distances without understand the routing. Then the third model could invalidate all of these static assumptions by inserting early swaps.
I think a good direction would be to not make greedy the whole placement system and instead structure this as a small (internal) layout/routing system to which we could add the greedy candidate now and maybe SABRE/LightSABRE style candidates in the future (although one of these as an alternative to greedy is acceptable and preferred).
An ideal approach might look something like:
- Build a
RoutingProblemfrom the IR once. Capture the device, routeable operations, source wires, measurement constraints, and virtual-qubit mapping. - Generate starting
LayoutCandidates. Keep this as a small helper layer. Identity, greedy, dense, and random layouts should only propose starting layouts. Nowgreedyis just a potential placement seed to kickoff the search. - Add a
RoutingSearchStrategywhich should own the search. It should route candidates in analyze mode and then run the SABRE forward/backward refinement. It's output should be the final routing selected for the router to apply. - Select the best
RoutingResult. Choose by the routed SWAP count. Longer term we might want to make this selectable (eg.,swap-count,depth, etc.) or at the choice of the strategy. - Emit the selected result once through a
RoutingEmitter. Rewrite the IR only after the best routed result has been selected.
I would prefer if the the reverse traversal strategy was used from the original paper as opposed to the greedy solution. In this way something like a LightSABRE extension would just add a new strategy (evaluating many candidate layouts) in the future and not a new pass rewrite.
Please feel free to ask me any clarifying questions you might have 😄
|
Thanks @taalexander and @Zaneham for reviewing and the thoughtful comments. The point about the placement, selection, and router optimizing different models is well taken. I'm going back through the SABRE paper and the proposed restructuring now, with reverse traversal as the likely direction (I'd considered it early on but went with the smaller change at the time). I'll follow up with some questions in the next couple of days. |
|
Thank you @border-b, looking forward to seeing the new and improved version! |
|
@taalexander The plan below is essentially the points from your review with the details filled in. I'm writing it out mostly to verify we're on the same page before I start, plus a few questions where I'd rather ask than guess.
From the current PR, Questions
|
|
Hi @border-b. Yes, your understanding of my proposal is correct. For your questions.
Yes deterministic seeds is great for now. A follow-up PR with lightSABRE like layout exploration would be great but doesn't need to go into this.
Yes, the initial placement maybe would have an option to be fixed., eg., the identity or something else but the search by default.
These may or may not exist. If the current pass doesn't use these then do not include them.
Yes, this seems good. A plot with a script attached sweeping circuit optimization metrics (circuit depth/gates) and wall-clock compilation (ideally for the passes before/after). To ensure circuit mapping is improved with limited wall-clock cost. |
Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>
|
@taalexander reworked along the structure you proposed, now selecting the layout by routed swap count. The QV benchmark you asked for is in the description, swap reduction next to the qubit-mapping pass wall-clock, with the plot attached. Happy to iterate. |
|
Thanks for your changes @border-b. I am reviewing now! Just FYI the hackathon closes June 17th so let's work hard to finish this up and get it to a state where it can be merged 🚀 |
|
Hi @taalexander, I've updated the description with some new benchmarks. I added a Qiskit SABRE comparison and two-qubit depth alongside the swap counts. |
|
I ran this on a large VQE benchmark on a square lattice and received some very nice results thanks to the improved layout/routing implementing the proper SABRE algorithm:
|
taalexander
left a comment
There was a problem hiding this comment.
Phase 2 review. A few more minor comments. Can you please make sure you that all new file copyrights are for 2026 only.
Would appreciate a quick turnaround on this so we can get merged 🚀
|
Hi @taalexander, thank you for the thorough and helpful reviews! I've been working on them and you can expect the changes pushed in the next hour or so. Sharing a couple of questions under the specific comments. |
Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>
Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>
|
Hi @taalexander, I pushed the follow-up refactor addressing the current review threads. Sorry this took longer than I initially expected. 😅 I verified the updated mapping tests pass, and the benchmarks match. I’ll be on standby for the next few hours to iterate if needed. Otherwise, I’ll follow up early tomorrow. Thanks again for the detailed reviews, they’ve been really helpful! |
|
Thanks @border-b. Would you mind commenting how you addressed them and resolving the comments as necessary if you believe that you have :) |
taalexander
left a comment
There was a problem hiding this comment.
Thank you very much for your changes. We're almost there! A few more comments. If you address these I will get it merged in the next day or two 🚀 and you will be eligible for the bounty still (I've sourced a bit more time). Can you please give me push access to your branch just in case?
|
Thanks @taalexander. Maintainer edits are already enabled on this branch. I’ll revisit this tomorrow morning and address the remaining comments! |
Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>
Command Bot: Processing... |
|
Thank you for addressing my comments @border-b, the PR is in very good shape now with one minor issue and some CI work to get it passing that I can guide along. I have marked this as It has been a pleasure working with you on this. This came together as a very nice improvement for the project |
Signed-off-by: Thomas Alexander <talexander@nvidia.com>
Signed-off-by: Thomas Alexander <talexander@nvidia.com>
Command Bot: Processing... |
taalexander
left a comment
There was a problem hiding this comment.
LGTM, thank you very much for your unitaryHACK contribution @border-b. You went above and beyond.
Signed-off-by: Thomas Alexander <talexander@nvidia.com>
Command Bot: Processing... |
CI Summary (
|
| Job | Result | Link |
|---|---|---|
build_and_test |
❌ failure | view |
Top-level jobs (13)
| Job | Result |
|---|---|
binaries |
⏩ skipped |
build_and_test |
❌ failure |
config_devdeps |
✅ success |
config_source_build |
⏩ skipped |
config_wheeldeps |
✅ success |
devdeps |
✅ success |
docker_image |
⏩ skipped |
gen_code_coverage |
⏩ skipped |
metadata |
✅ success |
python_metapackages |
⏩ skipped |
python_wheels |
⏩ skipped |
source_build |
⏩ skipped |
wheeldeps |
✅ success |
⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch
| Job |
|---|
binaries |
config_source_build |
docker_image |
gen_code_coverage |
python_metapackages |
python_wheels |
source_build |
All sub-jobs (42) — every matrix leg, with links
| Job | Status | Link |
|---|---|---|
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ❌ failure | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ❌ failure | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ❌ failure | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ❌ failure | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ❌ failure | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ❌ failure | view |
| CI Summary | ❔ in_progress | view |
| Configure build (devdeps) | ✅ success | view |
| Configure build (source_build) | ⏩ skipped | view |
| Configure build (wheeldeps) | ✅ success | view |
| Create CUDA Quantum installer | ⏩ skipped | view |
| Create Docker images | ⏩ skipped | view |
| Create Python metapackages | ⏩ skipped | view |
| Create Python wheels | ⏩ skipped | view |
| Gen code coverage | ⏩ skipped | view |
| Load dependencies (amd64, gcc12) / Caching | ✅ success | view |
| Load dependencies (amd64, gcc12) / Finalize | ✅ success | view |
| Load dependencies (amd64, gcc12) / Metadata | ✅ success | view |
| Load dependencies (amd64, llvm) / Caching | ✅ success | view |
| Load dependencies (amd64, llvm) / Finalize | ✅ success | view |
| Load dependencies (amd64, llvm) / Metadata | ✅ success | view |
| Load dependencies (arm64, gcc12) / Caching | ✅ success | view |
| Load dependencies (arm64, gcc12) / Finalize | ✅ success | view |
| Load dependencies (arm64, gcc12) / Metadata | ✅ success | view |
| Load dependencies (arm64, llvm) / Caching | ✅ success | view |
| Load dependencies (arm64, llvm) / Finalize | ✅ success | view |
| Load dependencies (arm64, llvm) / Metadata | ✅ success | view |
| Load source build cache | ⏩ skipped | view |
| Load wheel dependencies (amd64, 12.6) / Caching | ✅ success | view |
| Load wheel dependencies (amd64, 12.6) / Finalize | ✅ success | view |
| Load wheel dependencies (amd64, 12.6) / Metadata | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Caching | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Finalize | ✅ success | view |
| Load wheel dependencies (amd64, 13.0) / Metadata | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Caching | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Finalize | ✅ success | view |
| Load wheel dependencies (arm64, 12.6) / Metadata | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Caching | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Finalize | ✅ success | view |
| Load wheel dependencies (arm64, 13.0) / Metadata | ✅ success | view |
| Prepare cache clean-up | ❔ in_progress | view |
| Retrieve PR info | ✅ success | view |
⚠️ Required checks (0/6) — 6 missing — declared in .github/required-checks.yml for push
| Required check | Status | Link |
|---|---|---|
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ❌ failure | view |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ❌ failure | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ❌ failure | view |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ❌ failure | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ❌ failure | view |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ❌ failure | view |
|
Thank you @taalexander for your patience and for guiding me throughout this PR. I've had an excellent learning experience and really enjoyed working on this over the last couple of weeks! Your feedback was incredibly helpful and made the change much better, and it's been a pleasure working with you on it. |
Fixes #4289.
The qubit-mapping pass starts its SABRE router from identity placement (
virtual i -> physical i). On irregular topologies that ties the routed SWAP count to how the physical qubits happen to be numbered: the issue repro shows two isomorphic stars routing differently,star(5,2)at 2 swaps versusstar(5,0)at 1.My first attempt added a greedy interaction-aware placement, then used a static distance score to pick between that layout and identity. As @taalexander and @Zaneham pointed out, that left the placement, the selection, and the router each optimizing a different model, and the router's dynamic decisions can invalidate the static ones. This revision follows the structure they suggested and chooses the layout by the thing the router actually rewards: the routed SWAP count.
What changed
The router is now a read-only search with a single writer at the end:
RoutingProblem(the gate dependency DAG, source users, and virtual-qubit operands) is built once from the IR.identityand the greedy placement become seeds that only propose a starting layout. The staticlayoutScoreselection and the silent fallback to identity are gone.RoutingSearchStrategyroutes each seed in analyze mode and refines it with the paper's forward-backward-forward reverse traversal (Sec. IV-C2). The reverse pass is just a forward route over a transposedRoutingProblem, so it reuses the router unchanged.RoutingEmitterrewrites the IR once, from the winner.Two independent options replace the old single one:
placement = auto | identity | greedy(defaultauto: try both seeds, keep whichever routes better)search = sabre | none(defaultsabre: the reverse-traversal refinement, withnonea single forward route)placement=identity search=nonepreserves the old single-shot behavior and keeps the existing exact-output tests as the legacy regression net. A LightSABRE-style strategy (random restarts) would later be a newsearchvalue rather than another rewrite. For now the seeds are deterministic, so the exact-output tests stay reproducible, as agreed in the thread.Routing robustness
The SABRE cost function is a heuristic. With reverse-traversal refinement it can occasionally emit a long run of swaps without making any front-layer gate executable.
route()now has a release valve (inspired by Qiskit/LightSABRE): after a generous no-progress budget, it discards that local episode and forces the closest front-layer gate along a shortest path. On a connected device, any such direct route takes at most the device diameter, so this guarantees progress while leaving normally converging searches alone. The budget is intentionally loose by default (max(64, 4 * numPhysicalQubits)) and is covered by a smallpath(6)regression that contrasts a one-swap stall budget against a high budget: the valve route emits 5 swaps, the unrestricted heuristic emits 3, andCircuitCheckverifies both mapped outputs.Results
Issue repro (
@hub_pattern), defaults:star(5,2)now matchesstar(5,0), so the center-index artifact is gone and the mean over the sweep drops from 1.25 to 1.00.For a larger check I swept quantum-volume-style circuits (n qubits, n layers, random two-qubit pairings, 10 seeds per size) over
path,ring,grid, andstar, and compared against Qiskit's SABRE (Qiskit 2.4.2, sabre layout and routing) on the same circuits and coupling maps. Mean added swaps at 24 qubits:Through initial placement alone, the default closes 40 to 57 percent of the identity-to-Qiskit swap gap on
path,ring, andgrid, landing within roughly 10 to 19 percent of Qiskit. The remaining gap is the router-side work, the LightSABRE cost function and random restarts, that this issue lists as out-of-scope follow-ups. Onstaridentity is already near-optimal, so there is little to gain.On two-qubit depth the default is competitive with Qiskit and lower on
pathandgrid:The cost is up to two seeds times three routing walks instead of one routing walk, but the absolute pass time stays small. Mean
qubit-mappingpass time from--mlir-timing:The full sweep is in the plot, with script attached:
benchmark.py
Tests
The existing exact-output tests are pinned to
placement=identity search=noneand become the legacy-behavior regression net, with their CHECK bodies untouched. New tests cover the issue repro, the review's late-interaction scenario, a case where the reverse pass beats a single forward pass, the option matrix, invalid values for both options (fatal and warn), measurement/control-flow failure modes, and a small release-valve contrast test withFileCheckandCircuitCheck.AI disclosure: I used Opus via Claude Code to set up the remote build/test environment, help carry out the restructuring, and write the tests and benchmark harness. The design decisions, final edits, and verification are mine.