[unitaryhack] Topology-aware initial placement for the qubit-mapping pass by border-b · Pull Request #4678 · NVIDIA/cuda-quantum

border-b · 2026-06-05T16:47:19Z

The qubit-mapping pass starts its SABRE router from identity placement (virtual i -> physical i). On irregular topologies that ties the routed SWAP count to how the physical qubits happen to be numbered: the issue repro shows two isomorphic stars routing differently, star(5,2) at 2 swaps versus star(5,0) at 1.

My first attempt added a greedy interaction-aware placement, then used a static distance score to pick between that layout and identity. As @taalexander and @Zaneham pointed out, that left the placement, the selection, and the router each optimizing a different model, and the router's dynamic decisions can invalidate the static ones. This revision follows the structure they suggested and chooses the layout by the thing the router actually rewards: the routed SWAP count.

What changed

The router is now a read-only search with a single writer at the end:

A RoutingProblem (the gate dependency DAG, source users, and virtual-qubit operands) is built once from the IR.
identity and the greedy placement become seeds that only propose a starting layout. The static layoutScore selection and the silent fallback to identity are gone.
A RoutingSearchStrategy routes each seed in analyze mode and refines it with the paper's forward-backward-forward reverse traversal (Sec. IV-C2). The reverse pass is just a forward route over a transposed RoutingProblem, so it reuses the router unchanged.
The candidate with the fewest routed swaps wins, behind a small comparator so other metrics (depth, ...) can slot in later.
A RoutingEmitter rewrites the IR once, from the winner.
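
The flow in the list above can be sketched in a few lines of Python. This is a toy model, not the pass itself: the router below walks gates in program order and moves one operand along a BFS shortest path instead of using the SABRE heuristic, and every name in it (`star`, `route`, `refine`, `pick_layout`, `hub_gates`) is made up for illustration. The circuit is also invented, so the swap counts do not reproduce the issue repro.

```python
from collections import deque


def star(n, center):
    """Adjacency sets for a toy n-qubit star device with the given center."""
    adj = {q: set() for q in range(n)}
    for q in range(n):
        if q != center:
            adj[q].add(center)
            adj[center].add(q)
    return adj


def shortest_path(adj, src, dst):
    """BFS path of physical qubits from src to dst, inclusive (connected device)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def route(adj, gates, layout):
    """Read-only walk: returns (swap_count, final_layout), inputs untouched.

    Assumes layout is a permutation (one virtual qubit per physical qubit).
    """
    v2p = list(layout)
    p2v = {p: v for v, p in enumerate(v2p)}
    swaps = 0
    for a, b in gates:
        path = shortest_path(adj, v2p[a], v2p[b])
        # Swap `a` along the path until it sits next to `b`.
        for here, nxt in zip(path, path[1:-1]):
            va, vb = p2v[here], p2v[nxt]
            v2p[va], v2p[vb] = nxt, here
            p2v[here], p2v[nxt] = vb, va
            swaps += 1
    return swaps, v2p


def refine(adj, gates, seed):
    """Forward-backward-forward: the reverse walk's landing layout is the new start."""
    _, after_forward = route(adj, gates, seed)
    _, refined = route(adj, gates[::-1], after_forward)
    swaps, _ = route(adj, gates, refined)
    return swaps, refined


def pick_layout(adj, gates, seeds):
    """Route every named seed and keep the one with the fewest routed swaps."""
    results = [(name,) + refine(adj, gates, layout) for name, layout in seeds.items()]
    return min(results, key=lambda r: r[1])


hub_gates = [(0, 1), (0, 2), (0, 3), (0, 4)]  # invented hub-style circuit
identity = [0, 1, 2, 3, 4]
print(route(star(5, 0), hub_gates, identity)[0])  # 0: virtual 0 already on the center
print(route(star(5, 2), hub_gates, identity)[0])  # 1: virtual 0 starts on a leaf
print(refine(star(5, 2), hub_gates, identity))    # (0, [2, 1, 0, 3, 4])
```

Because `route` never mutates its inputs, any number of candidates can be evaluated before a single emit step rewrites anything, which is the property the restructuring relies on.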

Two independent options replace the old single one:

placement = auto | identity | greedy (default auto: try both seeds, keep whichever routes better)
search = sabre | none (default sabre: the reverse-traversal refinement, with none a single forward route)

placement=identity search=none preserves the old single-shot behavior and keeps the existing exact-output tests as the legacy regression net. A LightSABRE-style strategy (random restarts) would later be a new search value rather than another rewrite. For now the seeds are deterministic, so the exact-output tests stay reproducible, as agreed in the thread.
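
As a usage sketch, the two options can be assembled into the pass flag the same way the attached benchmark script does. The helper name `qubit_mapping_flag` is hypothetical, and the client-side validation is only a convenience; the pass performs its own checks.

```python
# Option values as listed in this description.
PLACEMENTS = ("auto", "identity", "greedy")
SEARCHES = ("sabre", "none")


def qubit_mapping_flag(device, placement="auto", search="sabre"):
    """Build the --qubit-mapping flag as one argv element (hypothetical helper)."""
    if placement not in PLACEMENTS:
        raise ValueError(f"unknown placement: {placement!r}")
    if search not in SEARCHES:
        raise ValueError(f"unknown search: {search!r}")
    return f"--qubit-mapping=device={device} placement={placement} search={search}"


# New default: try both seeds and refine with reverse traversal.
print(qubit_mapping_flag("star(5,2)"))
# Legacy single-shot behavior used by the exact-output tests.
print(qubit_mapping_flag("star(5,2)", placement="identity", search="none"))
```

The whole string, spaces included, has to reach cudaq-opt as a single argument, so pass it as one list element to `subprocess.run` or quote it in a shell.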

Routing robustness

The SABRE cost function is a heuristic. With reverse-traversal refinement it can occasionally emit a long run of swaps without making any front-layer gate executable. route() now has a release valve (inspired by Qiskit/LightSABRE): after a generous no-progress budget, it discards that local episode and forces the closest front-layer gate along a shortest path. On a connected device, any such direct route takes at most the device diameter, so this guarantees progress while leaving normally converging searches alone. The budget is intentionally loose by default (max(64, 4 * numPhysicalQubits)) and is covered by a small path(6) regression that contrasts a one-swap stall budget against a high budget: the valve route emits 5 swaps, the unrestricted heuristic emits 3, and CircuitCheck verifies both mapped outputs.
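
A minimal sketch of the release-valve bookkeeping, under stated assumptions: the class and function names are hypothetical, and the exact trip condition (here, strictly more swaps than the budget since the last executed gate) is a guess at the semantics rather than a description of route(). Only the default budget formula is taken from the text above.

```python
def default_stall_budget(num_physical_qubits):
    """Default no-progress budget quoted in the description."""
    return max(64, 4 * num_physical_qubits)


def forced_swaps(path):
    """Physical swaps that walk the qubit at path[0] until it is adjacent to path[-1]."""
    return list(zip(path, path[1:-1]))


class StallTracker:
    """Counts swaps emitted since the last executed gate (hypothetical helper)."""

    def __init__(self, budget):
        self.budget = budget
        self.episode = []

    def on_swap(self, swap):
        """Record a heuristic swap; True means the valve should open."""
        self.episode.append(swap)
        return len(self.episode) > self.budget

    def on_gate_executed(self):
        """Any executed front-layer gate counts as progress and resets the episode."""
        self.episode.clear()

    def discard(self):
        """Return the episode's swaps in undo order and start a fresh episode."""
        undo = self.episode[::-1]
        self.episode = []
        return undo


tracker = StallTracker(budget=1)        # one-swap stall budget, as in the regression test
print(tracker.on_swap((0, 1)))          # False: still within budget
print(tracker.on_swap((1, 2)))          # True: no progress, open the valve
print(tracker.discard())                # [(1, 2), (0, 1)]
print(forced_swaps([0, 1, 2, 3, 4, 5])) # [(0, 1), (1, 2), (2, 3), (3, 4)]
```

On a path with six qubits the forced route between the two ends needs four swaps, one fewer than the distance, which is why a connected device always bounds the escape by its diameter.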

Results

Issue repro (@hub_pattern), defaults:

device	legacy (identity)	default (search)
path(5)	1	1
star(5,0)	1	1
star(5,2)	2	1
grid(3,3)	1	1

star(5,2) now matches star(5,0), so the center-index artifact is gone and the mean over the sweep drops from 1.25 to 1.00.

For a larger check I swept quantum-volume-style circuits (n qubits, n layers, random two-qubit pairings, 10 seeds per size) over path, ring, grid, and star, and compared against Qiskit's SABRE (Qiskit 2.4.2, sabre layout and routing) on the same circuits and coupling maps. Mean added swaps at 24 qubits:

topology	identity	default	Qiskit SABRE
path	1410.3	1250.9	1091.6
ring	911.5	809.0	732.1
grid	396.2	357.3	300.0
star	139.8	138.5	127.5

Through initial placement alone, the default closes 40 to 57 percent of the identity-to-Qiskit swap gap on path, ring, and grid, landing within roughly 10 to 19 percent of Qiskit. The remaining gap is the router-side work, the LightSABRE cost function and random restarts, that this issue lists as out-of-scope follow-ups. On star identity is already near-optimal, so there is little to gain.
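
The percentages in the paragraph above follow from the swap table by simple arithmetic. A short script along these lines recomputes them; the function names are illustrative, and the numbers are copied from the 24-qubit table.

```python
# Mean added swaps at 24 qubits: (identity, default, Qiskit SABRE).
swaps = {
    "path": (1410.3, 1250.9, 1091.6),
    "ring": (911.5, 809.0, 732.1),
    "grid": (396.2, 357.3, 300.0),
    "star": (139.8, 138.5, 127.5),
}


def gap_closed(identity, default, qiskit):
    """Share of the identity-to-Qiskit swap gap removed by the default, in percent."""
    return 100.0 * (identity - default) / (identity - qiskit)


def excess_over_qiskit(default, qiskit):
    """How far the default still sits above Qiskit, in percent."""
    return 100.0 * (default - qiskit) / qiskit


for topo, (ident, default, qiskit) in swaps.items():
    print(f"{topo:<5} gap closed {gap_closed(ident, default, qiskit):5.1f}%  "
          f"above Qiskit {excess_over_qiskit(default, qiskit):5.1f}%")
```

For path, ring, and grid the first column should fall in the 40 to 57 percent range and the second in the 10 to 19 percent range quoted above.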

On two-qubit depth the default is competitive with Qiskit and lower on path and grid:

topology	identity	default	Qiskit SABRE
path	301.8	273.6	285.3
ring	183.8	166.7	164.2
grid	173.0	166.6	170.0
star	427.8	426.5	415.5

The cost is up to six routing walks per circuit (two seeds times three walks each) instead of one, but the absolute pass time stays small. Mean qubit-mapping pass time from --mlir-timing:

n	legacy	default	factor
4	0.28ms	0.31ms	1.1x
8	0.37ms	0.57ms	1.5x
12	0.86ms	2.01ms	2.3x
16	1.74ms	5.20ms	3.0x
20	3.38ms	11.28ms	3.3x
24	6.19ms	22.33ms	3.6x

The full sweep is in the plot, with script attached:

benchmark.py

#!/usr/bin/env python3
"""Routing benchmark for the qubit-mapping SABRE redesign (PR #4678).

Compares the legacy single-shot identity path against the new default search on
quantum-volume-style circuits, sweeping device topologies and sizes. Reports
routing quality (added swaps, two-qubit gate count, two-qubit depth) and the
qubit-mapping pass wall-clock, so the quality gain can be weighed against the
compile-time cost.

Reproduce:

    python benchmark.py --cudaq-opt /path/to/cudaq-opt --out results

Quantum-volume circuits are generated directly as value-semantics Quake: n
qubits, n layers, each layer pairing qubits by a seeded random permutation with
a two-qubit gate per pair. Only the two-qubit structure matters for routing, so
a controlled-x stands in for the Haar-random SU(4) block. Fixed seeds make every
number reproducible.
"""

import argparse
import csv
import os
import re
import statistics
import subprocess
import tempfile
import time
from collections import defaultdict


def gen_qv_qke(n, seed):
    """Return a Quake module for an n-qubit, n-layer QV-style circuit."""
    import random

    rng = random.Random(seed)
    lines = ["quake.wire_set @wires[2147483647]", "func.func @qv() {"]
    cur = {}
    for q in range(n):
        lines.append(f"  %v{q} = quake.borrow_wire @wires[{q}] : !quake.wire")
        cur[q] = f"%v{q}"
    k = 0
    for _ in range(n):
        perm = list(range(n))
        rng.shuffle(perm)
        for i in range(0, n - 1, 2):
            a, b = perm[i], perm[i + 1]
            g = f"%g{k}"
            lines.append(
                f"  {g}:2 = quake.x [{cur[a]}] {cur[b]} : "
                "(!quake.wire, !quake.wire) -> (!quake.wire, !quake.wire)")
            cur[a], cur[b] = f"{g}#0", f"{g}#1"
            k += 1
    for q in range(n):
        lines.append(f"  quake.return_wire {cur[q]} : !quake.wire")
    lines.append("  return")
    lines.append("}")
    return "\n".join(lines) + "\n"


# A 2q gate consumes two wires and produces two; everything else carries depth
# through unchanged.
RESULT_RE = re.compile(r"^\s*([%\w:,\s#]+?)\s*=\s*quake\.(\w+)\b(.*?):", re.S)
SSA_RE = re.compile(r"%[\w#]+")


def two_qubit_metrics(ir):
    """Count swaps and two-qubit gates and compute two-qubit depth from IR."""
    depth = defaultdict(int)
    swaps = twoq = 0
    maxdepth = 0
    for line in ir.splitlines():
        m = RESULT_RE.match(line)
        if not m:
            continue
        results_part, op, operands_part = m.group(1), m.group(2), m.group(3)
        operands = SSA_RE.findall(operands_part)
        results = []
        for tok in results_part.split(","):
            tok = tok.strip()
            base = SSA_RE.findall(tok)
            if not base:
                continue
            base = base[0]
            arity = 2 if tok.endswith(":2") else 1
            results += [f"{base}#{i}" for i in range(arity)] if arity == 2 \
                else [base]
        is_2q = op == "swap" or (op == "x" and "[" in operands_part)
        if is_2q:
            twoq += 1
            if op == "swap":
                swaps += 1
            d = 1 + max((depth[o] for o in operands), default=0)
        else:
            d = max((depth[o] for o in operands), default=0)
        for r in results:
            depth[r] = d
        maxdepth = max(maxdepth, d)
    return swaps, twoq, maxdepth


def pass_ms(timing_text):
    """Sum the mapping pass wall-clock (ms) from -mlir-timing output. The first
    float on each pass line is its wall time in seconds."""
    total = 0.0
    for line in timing_text.splitlines():
        if "apping" in line:  # MappingPrep, MappingFunc, ...
            nums = re.findall(r"(\d+\.\d+)", line)
            if nums:
                total += float(nums[0]) * 1000.0
    return total


def run_once(cudaq_opt, qke, device, placement, search):
    args = (f"--qubit-mapping=device={device} placement={placement} "
            f"search={search}")
    with tempfile.NamedTemporaryFile("w", suffix=".qke", delete=False) as fh:
        fh.write(qke)
        path = fh.name
    try:
        cmd = [cudaq_opt, args, path, "-mlir-timing"]
        t0 = time.perf_counter()
        try:
            p = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
        except subprocess.TimeoutExpired:
            return None
        wall = (time.perf_counter() - t0) * 1000.0
    finally:
        os.unlink(path)
    if p.returncode != 0:
        return None
    swaps, twoq, depth = two_qubit_metrics(p.stdout)
    ms = pass_ms(p.stderr)
    return dict(swaps=swaps, twoq=twoq, depth=depth,
                pass_ms=round(ms, 4), wall=round(wall, 4),
                ms=round(ms or wall, 4))


CONFIGS = {
    "legacy": dict(placement="identity", search="none"),
    "search": dict(placement="auto", search="sabre"),
}


def devices_for(n):
    w = int(round(n ** 0.5))
    h = (n + w - 1) // w
    return {
        "path": f"path({n})",
        "ring": f"ring({n})",
        "grid": f"grid({w},{h})",
        "star": f"star({n},0)",
    }


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--cudaq-opt", required=True)
    ap.add_argument("--out", default="results")
    ap.add_argument("--sizes", default="4,6,8,10,12,14,16")
    ap.add_argument("--seeds", type=int, default=10)
    args = ap.parse_args()

    sizes = [int(s) for s in args.sizes.split(",")]
    os.makedirs(args.out, exist_ok=True)
    rows = []
    for n in sizes:
        for topo, device in devices_for(n).items():
            for seed in range(args.seeds):
                qke = gen_qv_qke(n, seed)
                for cfg, opts in CONFIGS.items():
                    r = run_once(args.cudaq_opt, qke, device, **opts)
                    if r is None:
                        continue
                    rows.append(dict(n=n, topo=topo, seed=seed, config=cfg, **r))
            done = {c: [x for x in rows
                        if x["n"] == n and x["topo"] == topo and x["config"] == c]
                    for c in CONFIGS}
            msg = []
            for c in CONFIGS:
                if done[c]:
                    msg.append(f"{c} swaps~{statistics.mean(x['swaps'] for x in done[c]):.1f}")
            print(f"n={n:>3} {topo:<5} " + "  ".join(msg), flush=True)

    csv_path = os.path.join(args.out, "bench.csv")
    with open(csv_path, "w", newline="") as fh:
        w = csv.DictWriter(fh, fieldnames=["n", "topo", "seed", "config",
                                           "swaps", "twoq", "depth",
                                           "pass_ms", "wall", "ms"])
        w.writeheader()
        w.writerows(rows)
    print(f"wrote {csv_path} ({len(rows)} rows)")

    try:
        make_plot(rows, sizes, os.path.join(args.out, "bench.png"))
    except Exception as e:  # plotting is optional; the CSV is the source of truth
        print(f"[plot skipped: {e}]")


def make_plot(rows, sizes, path):
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    topos = ["path", "ring", "grid", "star"]
    fig, axes = plt.subplots(3, len(topos), figsize=(4 * len(topos), 9),
                             sharex=True)

    def mean(n, topo, cfg, key):
        vals = [r[key] for r in rows
                if r["n"] == n and r["topo"] == topo and r["config"] == cfg]
        return statistics.mean(vals) if vals else float("nan")

    # Dashed/square for legacy, solid/circle for search, so the two stay
    # readable where they nearly coincide (e.g. star).
    styles = {
        "legacy": dict(color="C0", linestyle="--", marker="s", label="legacy"),
        "search": dict(color="C1", linestyle="-", marker="o", label="search"),
    }

    for j, topo in enumerate(topos):
        for cfg, st in styles.items():
            axes[0][j].plot(sizes, [mean(n, topo, cfg, "swaps") for n in sizes],
                            **st)
            axes[2][j].plot(sizes,
                            [mean(n, topo, cfg, "pass_ms") for n in sizes], **st)
        reduction = []
        for n in sizes:
            leg = mean(n, topo, "legacy", "swaps")
            se = mean(n, topo, "search", "swaps")
            reduction.append((leg - se) / leg * 100.0 if leg else 0.0)
        axes[1][j].plot(sizes, reduction, color="C2", marker="o")
        axes[1][j].axhline(0, color="0.7", lw=0.8)
        axes[0][j].set_title(topo)
        axes[2][j].set_xlabel("qubits")
    axes[0][0].set_ylabel("added swaps (mean)")
    axes[1][0].set_ylabel("swap reduction (%)")
    axes[2][0].set_ylabel("mapping pass ms (mean)")
    axes[0][-1].legend()
    fig.suptitle("QV-style routing: legacy identity vs default search")
    fig.tight_layout()
    fig.savefig(path, dpi=120)
    print(f"wrote {path}")


if __name__ == "__main__":
    main()
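
For readers who only want the headline numbers, the CSV written by the script can be summarized without matplotlib. The sketch below assumes the column names the script writes (`n`, `topo`, `config`, `swaps`); the helper names and the sample rows are invented for the demo and are not benchmark results.

```python
import csv
import io
import statistics
from collections import defaultdict


def mean_swaps(csv_file):
    """Mean added swaps per (n, topo, config) from a bench.csv-style file object."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (int(row["n"]), row["topo"], row["config"])
        groups[key].append(int(row["swaps"]))
    return {key: statistics.mean(vals) for key, vals in groups.items()}


def swap_reduction(means, n, topo):
    """Percent reduction of the search config relative to legacy."""
    legacy = means[(n, topo, "legacy")]
    search = means[(n, topo, "search")]
    return (legacy - search) / legacy * 100.0 if legacy else 0.0


# Invented rows, only to show the shape of the data.
sample = io.StringIO(
    "n,topo,seed,config,swaps\n"
    "4,path,0,legacy,4\n"
    "4,path,1,legacy,6\n"
    "4,path,0,search,3\n"
    "4,path,1,search,5\n"
)
means = mean_swaps(sample)
print(means[(4, "path", "legacy")], means[(4, "path", "search")])
print(f"{swap_reduction(means, 4, 'path'):.1f}%")
```

With a real run, replace `sample` with `open("results/bench.csv", newline="")`.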

Tests

The existing exact-output tests are pinned to placement=identity search=none and become the legacy-behavior regression net, with their CHECK bodies untouched. New tests cover the issue repro, the review's late-interaction scenario, a case where the reverse pass beats a single forward pass, the option matrix, invalid values for both options (fatal and warn), measurement/control-flow failure modes, and a small release-valve contrast test with FileCheck and CircuitCheck.

AI disclosure: I used Opus via Claude Code to set up the remote build/test environment, help carry out the restructuring, and write the tests and benchmark harness. The design decisions, final edits, and verification are mine.

copy-pr-bot · 2026-06-05T16:47:23Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Zaneham · 2026-06-08T10:56:31Z

Nice repro. Two isomorphic stars routing to different SWAP counts is a clean way to surface the physical-numbering artifact, and scoring the greedy layout against identity and keeping whichever wins is a sensible guard against regressions.

The thing worth raising is that the placement and the selection are optimising slightly different models. interactionPlacement builds the layout from an untimed interaction matrix where every two-qubit gate counts the same, while layoutScore weights interactions by gate order through the numI - i term. Because the router runs off the front layer with a lookahead window, a placement that only knows total interaction weight can seat a pair well by overall count and still badly for the gates the router reaches first. Making the placement itself front-layer aware, rather than only the final selection, is the more principled version, and it matches what the router actually rewards.
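
A toy example makes the mismatch concrete. The two cost functions below are assumptions for illustration: the untimed one sums interaction count times distance, and the order-weighted one applies a linear `numI - i` weight to each gate's distance, which is only a guess at the shape of layoutScore. The device is a three-qubit path and the circuit is invented.

```python
from collections import Counter


def path_distance(p, q):
    """Distance between physical qubits on a path device 0-1-2-..."""
    return abs(p - q)


def untimed_cost(gates, layout):
    """Every two-qubit gate counts the same, regardless of when it runs."""
    counts = Counter(tuple(sorted(g)) for g in gates)
    return sum(c * path_distance(layout[a], layout[b])
               for (a, b), c in counts.items())


def order_weighted_cost(gates, layout):
    """Earlier gates weigh more, via a linear (num_gates - i) factor (assumed form)."""
    num = len(gates)
    return sum((num - i) * path_distance(layout[a], layout[b])
               for i, (a, b) in enumerate(gates))


# Two early (0,2) gates, then three later (0,1) gates.
gates = [(0, 2), (0, 2), (0, 1), (0, 1), (0, 1)]
layout_a = [0, 1, 2]  # seats the frequent pair (0,1) adjacently
layout_c = [0, 2, 1]  # seats the early pair (0,2) adjacently

print(untimed_cost(gates, layout_a), untimed_cost(gates, layout_c))                # 7 8
print(order_weighted_cost(gates, layout_a), order_weighted_cost(gates, layout_c))  # 24 21
```

With lower treated as better, the untimed model prefers `layout_a` while the order-weighted model prefers `layout_c`, so a placement built from one and judged by the other can disagree with itself before the router even runs.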

Two smaller things. The selection runs on a static distance proxy, which you flag honestly, and the stronger if costlier option is to route both candidate layouts and compare real SWAP counts. And the linear recency weighting reads as a little arbitrary without a sentence tying it to the router's own decay behaviour.

Solid work overall, and the regression-guard instinct is the right one.

taalexander

Hi @border-b, thank you for your contribution. It is a great start but will need some degree of rethinking to be ready for merge. As noted by @Zaneham in his comment, the current placement, layout, and router are optimizing different models. I also think we are missing larger demos/benchmarks on some quantum-volume-like circuits to confirm that the PR delivers the intended improvement.

The interaction placement builds the candidate from the total two-qubit interaction counts. This treats all interactions equally, independent of when they happen in the circuit. Later on you use the layoutScore to choose between greedy and identity with a distance score. I don't think you should be silently falling back to identity if a different placement strategy was selected.

However, the bigger issue is that chooseSwap then does its selection based on the dynamic front layer. This will invalidate most of the static assumptions and might lead to a situation like:

Circuit has many late interactions between q0 and q1
Greedy placement puts q0 and q1 close because of this.
The first front-layer gates use q2-q3 and q3-q4.
The router adds immediate swaps
Swaps have now changed the placement, so the original static score is invalidated

The issue is that these models are not working together. The first layout could optimize for pairs that matter often but not until late in the circuit. The second model could choose a layout based on early distances without understanding the routing. Then the third model could invalidate all of these static assumptions by inserting early swaps.

I think a good direction would be to not make greedy the whole placement system and instead structure this as a small (internal) layout/routing system to which we could add the greedy candidate now and maybe SABRE/LightSABRE style candidates in the future (although one of these as an alternative to greedy is acceptable and preferred).

An ideal approach might look something like:

Build a RoutingProblem from the IR once. Capture the device, routeable operations, source wires, measurement constraints, and virtual-qubit mapping.
Generate starting LayoutCandidates. Keep this as a small helper layer. Identity, greedy, dense, and random layouts should only propose starting layouts. Greedy is then just one possible placement seed to kick off the search.
Add a RoutingSearchStrategy, which should own the search. It should route candidates in analyze mode and then run the SABRE forward/backward refinement. Its output should be the final routing selected for the router to apply.
Select the best RoutingResult. Choose by the routed SWAP count. Longer term we might want to make this selectable (e.g., swap count, depth, etc.) or leave it to the choice of the strategy.
Emit the selected result once through a RoutingEmitter. Rewrite the IR only after the best routed result has been selected.
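
The selection step in the list above can stay small if the metric is just a key function. The sketch below is one possible shape, not the PR's interface: `RoutingResult` here is a stand-in dataclass with invented fields, and `by_swaps`, `by_swaps_then_depth`, and `select_best` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class RoutingResult:
    """Stand-in for a routed candidate; fields are illustrative only."""
    seed_name: str
    swaps: int
    depth: int


def by_swaps(result: RoutingResult) -> Tuple[int, ...]:
    """Default metric: routed SWAP count only."""
    return (result.swaps,)


def by_swaps_then_depth(result: RoutingResult) -> Tuple[int, ...]:
    """A possible later metric: SWAP count first, depth as tie-breaker."""
    return (result.swaps, result.depth)


def select_best(results: Sequence[RoutingResult],
                key: Callable[[RoutingResult], Tuple[int, ...]] = by_swaps) -> RoutingResult:
    """min() keeps the first minimal element, so ties resolve in seed order."""
    return min(results, key=key)


candidates = [
    RoutingResult("identity", swaps=1, depth=9),
    RoutingResult("greedy", swaps=1, depth=7),
]
print(select_best(candidates).seed_name)                           # identity
print(select_best(candidates, key=by_swaps_then_depth).seed_name)  # greedy
```

Resolving ties by seed order keeps the choice deterministic, which matters for tests that pin exact output.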

I would prefer that the reverse traversal strategy from the original paper be used, as opposed to the greedy solution. That way, something like a LightSABRE extension would just add a new strategy (evaluating many candidate layouts) in the future rather than require a new pass rewrite.

Please feel free to ask me any clarifying questions you might have 😄

border-b · 2026-06-09T22:15:52Z

Thanks @taalexander and @Zaneham for reviewing and the thoughtful comments. The point about the placement, selection, and router optimizing different models is well taken. I'm going back through the SABRE paper and the proposed restructuring now, with reverse traversal as the likely direction (I'd considered it early on but went with the smaller change at the time).

I'll follow up with some questions in the next couple of days.

taalexander · 2026-06-10T11:39:02Z

Thank you @border-b, looking forward to seeing the new and improved version!

border-b · 2026-06-11T23:10:44Z

@taalexander The plan below is essentially the points from your review with the details filled in. I'm writing it out mostly to verify we're on the same page before I start, plus a few questions where I'd rather ask than guess.

SabreRouter::route() currently rewires each gate the moment it's mappable and inserts every chosen swap immediately, so routing a candidate layout means changing the circuit. We'll split it: the router produces a RoutingResult (swap insertions, final mapping, swap count) from a read-only walk, and a RoutingEmitter applies the selected result to the IR once at the end. Everything else builds on this split.
A RoutingProblem gets built once from the IR: device and distance matrix, the gate dependency DAG, qubit sources, measurement constraints, and the wire-to-virtual mapping.
Identity and the greedy interactionPlacement from this PR become seed generators that only propose starting layouts. The static layoutScore selection goes away, and so does the silent fallback to identity.
For each seed: route forward in analyze mode, route the two-qubit gate DAG in reverse starting from the resulting final mapping, and take where the qubits land as the refined starting layout. One more forward pass produces that candidate's RoutingResult. This is the paper's forward-backward-forward setup; I'd keep the traversal count an internal constant rather than a new pass option.
The result with the fewest routed swaps wins and gets emitted once. The comparison stays behind a small interface so depth or other metrics can slot in later, as you suggested.
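
Routing the gate DAG in reverse amounts to swapping the roles of predecessors and successors, after which the same forward walk applies. The sketch below shows that on a plain Python representation; `build_dag`, `transpose`, and `front_layer` are illustrative names, and the dependency rule (a gate depends on the previous gate on each of its qubits) is the usual one, assumed here rather than taken from the pass.

```python
def build_dag(gates):
    """Predecessor sets: gate i depends on the previous gate on each of its qubits."""
    last_on = {}
    preds = [set() for _ in gates]
    for i, qubits in enumerate(gates):
        for q in qubits:
            if q in last_on:
                preds[i].add(last_on[q])
            last_on[q] = i
    return preds


def transpose(preds):
    """Successors of the original DAG, which are the predecessors of the reversed one."""
    succs = [set() for _ in preds]
    for i, ps in enumerate(preds):
        for p in ps:
            succs[p].add(i)
    return succs


def front_layer(preds):
    """Gates with no unresolved dependencies."""
    return [i for i, ps in enumerate(preds) if not ps]


gates = [(0, 1), (1, 2), (0, 1), (2, 3)]
forward = build_dag(gates)
backward = transpose(forward)
print(front_layer(forward))   # [0]
print(front_layer(backward))  # [2, 3]
```

The reverse walk starts from the gates that finish the circuit, so the layout it ends on is one that suits the circuit's first gates.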

From the current PR, interactionPlacement survives as a seed generator and I think the tests get reworked around the new structure.

Questions

The paper starts reverse traversal from a random layout, and trying many random seeds is where a LightSABRE-style strategy would get most of its value. But randomness makes compilation nondeterministic and the exact-output tests can't pin results. For this PR should I use deterministic seeds only (identity + greedy) and leave random restarts to the future strategy? Or do you want randomness with a fixed default seed?
I plan to make the search the default, since the issue shows identity placement ties output quality to physical qubit numbering. The single-shot identity path stays reachable as an explicit option. Currently the placement= option covers everything, but seeds and search strategy are now independent choices, so I'd lean toward two options (seeds, strategy) so a future LightSABRE would just be a new strategy value. Let me know if the default flip and the config option split sound good.
You listed measurement constraints as part of RoutingProblem, which I'm interpreting as: preserve measurement order, and defer measurement mapping to the end as the router does currently. Both forward passes route the real circuit under those constraints, only the reverse pass skips measurements. Is there anything else that RoutingProblem should capture?
For the benchmark request, what would you consider sufficient for this PR: a reproducible table in the PR conversation comparing identity vs the new search on QV-style circuits across a few topologies and sizes? Or should I add a checked-in benchmark script?

taalexander · 2026-06-12T18:25:42Z

Hi @border-b. Yes, your understanding of my proposal is correct. On your questions:

> The paper starts reverse traversal from a random layout, and trying many random seeds is where a LightSABRE-style strategy would get most of its value. But randomness makes compilation nondeterministic and the exact-output tests can't pin results. For this PR should I use deterministic seeds only (identity + greedy) and leave random restarts to the future strategy? Or do you want randomness with a fixed default seed?

Yes, deterministic seeds are great for now. A follow-up PR with LightSABRE-like layout exploration would be great, but it doesn't need to go into this one.

> I plan to make the search the default, since the issue shows identity placement ties output quality to physical qubit numbering. The single-shot identity path stays reachable as an explicit option. Currently the placement= option covers everything, but seeds and search strategy are now independent choices, so I'd lean toward two options (seeds, strategy) so a future LightSABRE would just be a new strategy value. Let me know if the default flip and the config option split sound good.

Yes. The initial placement could have an option to be fixed (e.g., to identity or something else), but the search should be the default.

> You listed measurement constraints as part of RoutingProblem, which I'm interpreting as: preserve measurement order, and defer measurement mapping to the end as the router does currently. Both forward passes route the real circuit under those constraints, only the reverse pass skips measurements. Is there anything else that RoutingProblem should capture?

These may or may not exist. If the current pass doesn't use these then do not include them.

> For the benchmark request, what would you consider sufficient for this PR: a reproducible table in the PR conversation comparing identity vs the new search on QV-style circuits across a few topologies and sizes? Or should I add a checked-in benchmark script?

Yes, this seems good: a plot with a script attached, sweeping circuit optimization metrics (circuit depth/gates) and wall-clock compilation time (ideally for the passes before/after), to ensure that circuit mapping is improved at limited wall-clock cost.

Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>

border-b · 2026-06-15T22:49:30Z

@taalexander reworked along the structure you proposed, now selecting the layout by routed swap count.

The QV benchmark you asked for is in the description, swap reduction next to the qubit-mapping pass wall-clock, with the plot attached. Happy to iterate.

taalexander · 2026-06-16T00:45:49Z

Thanks for your changes @border-b. I am reviewing now! Just FYI the hackathon closes June 17th so let's work hard to finish this up and get it to a state where it can be merged 🚀

border-b · 2026-06-16T18:23:35Z

Hi @taalexander, I've updated the description with some new benchmarks. I added a Qiskit SABRE comparison and two-qubit depth alongside the swap counts.

taalexander · 2026-06-16T18:24:16Z

I ran this on a large VQE benchmark on a square lattice and received some very nice results thanks to the improved layout/routing implementing the proper SABRE algorithm:

Metric	Current branch (`b558e7b369`)	PR #4678 (`057fc186db`)	Delta
Wall-clock compile time	476.27 s	310.32 s	-165.95 s (-34.8%)
Routing/mapping stage	92.65 s	26.15 s	-66.50 s (-71.8%)
Routing share of wall-clock	19.5%	8.4%	-11.0 pp
`MappingFunc` pass time	91064.83 ms	24945.18 ms	-66119.65 ms (-72.6%)
`MappingFunc` share of pass-classified time	43.2%	22.5%	-20.7 pp
Input total gate count	347680	347680	0
Input 2Q gate count	61344	61344	0
Input depth	640	640	0
Output total gate count	804764	471008	-333756 (-41.5%)
Output 2Q gate count	172596	61344	-111252 (-64.5%)
Output depth	126265	962	-125303 (-99.2%)

taalexander

Phase 1 review. Will continue from here. This is looking great, mostly refactoring questions but a few potential bugs.

taalexander

Phase 2 review. A few more minor comments. Can you please make sure that all new file copyrights are for 2026 only.

Would appreciate a quick turnaround on this so we can get merged 🚀

border-b · 2026-06-17T14:50:18Z

Hi @taalexander, thank you for the thorough and helpful reviews! I've been working on them and you can expect the changes pushed in the next hour or so. Sharing a couple of questions under the specific comments.

Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>

border-b · 2026-06-17T19:34:55Z

Hi @taalexander, I pushed the follow-up refactor addressing the current review threads. Sorry this took longer than I initially expected. 😅

I verified the updated mapping tests pass, and the benchmarks match. I’ll be on standby for the next few hours to iterate if needed. Otherwise, I’ll follow up early tomorrow. Thanks again for the detailed reviews, they’ve been really helpful!

taalexander · 2026-06-17T19:45:00Z

Thanks @border-b. Would you mind commenting on how you addressed them, and resolving the comments where you believe you have? :)

taalexander

Thank you very much for your changes. We're almost there! A few more comments. If you address these I will get it merged in the next day or two 🚀 and you will be eligible for the bounty still (I've sourced a bit more time). Can you please give me push access to your branch just in case?

border-b · 2026-06-17T21:21:22Z

Thanks @taalexander. Maintainer edits are already enabled on this branch. I’ll revisit this tomorrow morning and address the remaining comments!

Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>

taalexander · 2026-06-19T00:25:35Z

/ok to test dec2106

taalexander · 2026-06-19T00:33:15Z

Thank you for addressing my comments @border-b, the PR is in very good shape now with one minor issue and some CI work to get it passing that I can guide along. I have marked this as unitaryhack-accepted and will approve/merge it as soon as I have CI pipelines passing (which requires my approvals to run) and I set up some benchmarks to run internally.

It has been a pleasure working with you on this. This came together as a very nice improvement for the project.

Signed-off-by: Thomas Alexander <talexander@nvidia.com>

taalexander · 2026-06-19T00:51:21Z

/ok to test b167553

taalexander

LGTM, thank you very much for your unitaryHACK contribution @border-b. You went above and beyond.

Signed-off-by: Thomas Alexander <talexander@nvidia.com>

taalexander · 2026-06-19T01:50:33Z

/ok to test 822d0d1

github-actions · 2026-06-19T05:16:30Z

CI Summary (`push`) — ❌ failed

Run #27800485506 · ✅ 5 · ⏩ 7 · ❌ 1 · ⛔ 0

❌ Failed or cancelled

Job	Result	Link
`build_and_test`	❌ failure	view

Top-level jobs (13)

Job	Result
`binaries`	⏩ skipped
`build_and_test`	❌ failure
`config_devdeps`	✅ success
`config_source_build`	⏩ skipped
`config_wheeldeps`	✅ success
`devdeps`	✅ success
`docker_image`	⏩ skipped
`gen_code_coverage`	⏩ skipped
`metadata`	✅ success
`python_metapackages`	⏩ skipped
`python_wheels`	⏩ skipped
`source_build`	⏩ skipped
`wheeldeps`	✅ success

⏩ Skipped jobs (7) — intentionally skipped on PR builds; run on merge_group / workflow_dispatch

Job
`binaries`
`config_source_build`
`docker_image`
`gen_code_coverage`
`python_metapackages`
`python_wheels`
`source_build`

All sub-jobs (42) — every matrix leg, with links

Job	Status	Link
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug)	❌ failure	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python)	❌ failure	view
Build and test (amd64, llvm, openmpi) / Dev environment (Debug)	❌ failure	view
Build and test (amd64, llvm, openmpi) / Dev environment (Python)	❌ failure	view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug)	❌ failure	view
Build and test (arm64, llvm, openmpi) / Dev environment (Python)	❌ failure	view
CI Summary	❔ in_progress	view
Configure build (devdeps)	✅ success	view
Configure build (source_build)	⏩ skipped	view
Configure build (wheeldeps)	✅ success	view
Create CUDA Quantum installer	⏩ skipped	view
Create Docker images	⏩ skipped	view
Create Python metapackages	⏩ skipped	view
Create Python wheels	⏩ skipped	view
Gen code coverage	⏩ skipped	view
Load dependencies (amd64, gcc12) / Caching	✅ success	view
Load dependencies (amd64, gcc12) / Finalize	✅ success	view
Load dependencies (amd64, gcc12) / Metadata	✅ success	view
Load dependencies (amd64, llvm) / Caching	✅ success	view
Load dependencies (amd64, llvm) / Finalize	✅ success	view
Load dependencies (amd64, llvm) / Metadata	✅ success	view
Load dependencies (arm64, gcc12) / Caching	✅ success	view
Load dependencies (arm64, gcc12) / Finalize	✅ success	view
Load dependencies (arm64, gcc12) / Metadata	✅ success	view
Load dependencies (arm64, llvm) / Caching	✅ success	view
Load dependencies (arm64, llvm) / Finalize	✅ success	view
Load dependencies (arm64, llvm) / Metadata	✅ success	view
Load source build cache	⏩ skipped	view
Load wheel dependencies (amd64, 12.6) / Caching	✅ success	view
Load wheel dependencies (amd64, 12.6) / Finalize	✅ success	view
Load wheel dependencies (amd64, 12.6) / Metadata	✅ success	view
Load wheel dependencies (amd64, 13.0) / Caching	✅ success	view
Load wheel dependencies (amd64, 13.0) / Finalize	✅ success	view
Load wheel dependencies (amd64, 13.0) / Metadata	✅ success	view
Load wheel dependencies (arm64, 12.6) / Caching	✅ success	view
Load wheel dependencies (arm64, 12.6) / Finalize	✅ success	view
Load wheel dependencies (arm64, 12.6) / Metadata	✅ success	view
Load wheel dependencies (arm64, 13.0) / Caching	✅ success	view
Load wheel dependencies (arm64, 13.0) / Finalize	✅ success	view
Load wheel dependencies (arm64, 13.0) / Metadata	✅ success	view
Prepare cache clean-up	❔ in_progress	view
Retrieve PR info	✅ success	view

⚠️ Required checks (0/6) — 6 missing — declared in .github/required-checks.yml for push

Required check	Status	Link
Build and test (amd64, llvm, openmpi) / Dev environment (Debug)	❌ failure	view
Build and test (amd64, llvm, openmpi) / Dev environment (Python)	❌ failure	view
Build and test (arm64, llvm, openmpi) / Dev environment (Debug)	❌ failure	view
Build and test (arm64, llvm, openmpi) / Dev environment (Python)	❌ failure	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Debug)	❌ failure	view
Build and test (amd64, gcc12, openmpi) / Dev environment (Python)	❌ failure	view

border-b · 2026-06-19T05:51:51Z

Thank you @taalexander for your patience and for guiding me throughout this PR. I've had an excellent learning experience and really enjoyed working on this over the last couple of weeks! Your feedback was incredibly helpful and made the change much better, and it's been a pleasure working with you on it.

taalexander requested changes Jun 8, 2026

taalexander changed the title ~~Topology-aware initial placement for the qubit-mapping pass~~ [unitaryhack] Topology-aware initial placement for the qubit-mapping pass Jun 10, 2026

border-b force-pushed the issue/4289 branch from 80b7601 to 313d0b5 Compare June 15, 2026 22:27

add reverse-traversal initial-mapping search to qubit mapping plus tests

057fc18

Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>

border-b force-pushed the issue/4289 branch from 313d0b5 to 057fc18 Compare June 15, 2026 22:41

taalexander requested changes Jun 16, 2026

taalexander requested changes Jun 17, 2026

taalexander mentioned this pull request Jun 17, 2026

Support trivial control flow in Mapping #4733

Open

border-b added 2 commits June 18, 2026 01:11

refactor placement/routing, guard mid-circuit measurements

1e68cbd

Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>

copyrights, measurement-guard tests, valve contrast test

4642738

Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>

border-b force-pushed the issue/4289 branch from b762d92 to 4642738 Compare June 17, 2026 19:13

taalexander requested changes Jun 17, 2026

address mapping review feedback

dec2106

Signed-off-by: Seemanta Bhattacharjee <babune99@gmail.com>

taalexander added the unitaryhack-accepted label Jun 19, 2026

taalexander added 2 commits June 18, 2026 21:34

Format mapping pass changes

ae79a1d

Signed-off-by: Thomas Alexander <talexander@nvidia.com>

Reject adaptive measurement mapping before CFG bypass

b167553

Signed-off-by: Thomas Alexander <talexander@nvidia.com>

taalexander approved these changes Jun 19, 2026

Fix greedy placer initializer order

822d0d1

Signed-off-by: Thomas Alexander <talexander@nvidia.com>

CI Summary (`push`) — ❌ failed