Skip to content

Rewrite Sabre interfaces to be Rust native#14317

Merged
mtreinish merged 10 commits into
Qiskit:mainfrom
jakelishman:sabre/no-py
Jul 8, 2025
Merged

Rewrite Sabre interfaces to be Rust native#14317
mtreinish merged 10 commits into
Qiskit:mainfrom
jakelishman:sabre/no-py

Conversation

@jakelishman

@jakelishman jakelishman commented May 7, 2025

Copy link
Copy Markdown
Member

Summary

From a pure-Rust interface perspective, there are several things going on here, ticks indicate whether they're done or not:

  • Move all descriptions of the routing target information and construction of them into Rust (the new RoutingTarget)
  • Move all construction of the Sabre interaction DAG to Rust (the new SabreDAG)
  • Move SabreSwap DAG rebuild to Rust (the new RoutingResult and RoutingResult::rebuild)
  • Rework Python and Rust entry points sabre_routing/swap_map to use the new structures and returns.
  • Rewrite Python SabreSwap to use the new Rust components.
  • Move SabreLayout DAG rebuild to Rust (RoutingResult::rebuild_onto is most of the rebuild, but it's not hooked up).
  • Rewrite Python-space SabreLayout to use the new Rust components.
  • Properly honour SabreLayout's skip_routing option (rather than routing and then throwing it away again)
  • Handle disjoint coupling maps in SabreLayout using Port disjoint layout utils in rust #14177.
  • Fix StarPrerouting (obsoleted by Revert "Port star_preroute to rust" #14381).
  • Rebase into something coherent.
  • Sort out last todo!() stubs that apparently have no test coverage.

Details and comments

Close #12279
Close #12280

The main bulk of this PR is preceded by several commits to parts of crates/circuit that I've separated out. Any/all of these could become separate PRs if desired; all the commits should be standalone (though I didn't test individual ones). See the commit messages for details.

The rest of this section is about commit 0549a6d, which is all actual Sabre stuff (it's just the commit message).

This is a major rewrite of all the Sabre interfaces, though the core routing logic is unchanged. Note that the test suite is unaltered.

At least as far as the test suite covers, this change is 100% RNG compatible with its parent, and in several places, explicit code is added to achieve this. In utility-scale testing (in particular, one ASV test), several tens of thousands of iterations into one particular routing problem, there is a divergence, which I suspect to be floating-point differences given that everything else is the same.

Changes to data structures

Changed SabreDAG

SabreDAG is now completely Rust-internal and not even exposed to Python. It's constructed directly from a DAGCircuit. When control-flow blocks are encountered, the temporary DAGCircuit backing them is stored within the SabreDAG node itself, so there's no need for all the block_results side-car handling.

The nodes in the graph explicitly store how the node affects routing: they can be TwoQ (the actual routing target), Synchronize (a non-routing object that has only data dependencies) or ControlFlow. This simplifies the internal handling within the router, since now it doesn't need to re-infer it. It also lets us decay control-flow operations to simple Synchronize points when we're in layout mode, similar to how clearing the block_nodes field could implicitly do before.

A follow-up commit (which necessarily changes the RNG compatibility) makes stronger use of this new SabreDAG form to compress the interaction graph during construction, reducing the amount of work done by routing trials when advancing the state after a gate is routed.

New RoutingProblem

A minor record-like struct that bundles up several related arguments.

New RoutingTarget

The old Python-space code had a 3-tuple

(NeighborTable, Graph<(), ()>, ArrayView2<f64>)

to represent the hardware chip the routing was taking place on. The first two elements stem from difficulties transferring the Rustworkx Python-wrapped graph backing CouplingMap down to Rust space, and from trying to faithfully port the existing implementation to Rust.

The NeighborTable and the Graph<(), ()> functionally serve the same role; they both relate a node to its neighbors, with one presenting a list-like interface for efficient iteration, and one presenting a graph-like interface so graph algorithms like dijsktra, token_swapper and adjacency_matrix could be called on it.

Now in pure Rust space, both are replaced by a single Neighbors structure, which supplies both interfaces. It implements all the relevant petgraph::visit traits, so it can be directly passed to petgraph and rustworkx_core graph algorithms. Its data format is a flat CSR-like sorted list of neighbors. It's logically the same as NeighborTable, except all the SmallVecs are flattened into a single contiguous Vec with partition points.

The distance matrix remains the same; we need super fast lookups of scores because it's in the innermost loop of the swap-mapper.

New RoutingResult

The new result from swap_map_trial is no longer several distinct but interlinking objects, but an encapsulated RoutingResult that borrows the same data as RoutingProblem borrows (the SabreDAG, the DAGCircuit, etc), and includes the chosen swaps inline. It contains all the information necessary to rebuild a fully routed circuit, but does not do this until requested.

Implementation changes

Remaining Python-space passes

SabreLayout and SabreSwap in Python space mostly just organise the Python-space inputs into the correct data types for Rust space, pass everything off, then reconstruct the Python-space objects that those passes are expected to store in the PropertySet. They contain next to no business logic themselves. The exception is that SabreLayout still has its mode for running with a custom routing pass, which is still all in Python space.

Target-native

Both layout and routing are now Target-native. The Rust methods only accept a Target. If Python space is supplied with a CouplingMap, we lift it into a dummy Target that implies the same connectivity constraints. The public attributes of the passes that allowed access to coupling-map properties are retained, lazily computed on demand, for those that might still be accessing them.

Disjoint handling

The disjoint handling is now done entirely in Rust; sabre_layout_and_routing handles it internally. This includes full routing of the DAG, even if it is split across multiple components; the previous SabreLayout implementation bailed out at this stage and left it to a later SabreSwap pass, but it's only a couple more lines of code to handle in Rust space now.

A follow-up commit may hugely simplify the disjoint handling such that the DAG never even needs to be split up.

Old details and comments

Old PR message, now out of date As of commit 815bc3d (the original HEAD of this PR), `SabreSwap` works (I believe) completely correctly end-to-end. `SabreLayout` is currently totally broken in Python space because I haven't finished rewriting the pass yet. `StarPrerouting` is straight-up deleted because it reaches deep into Sabre internals, and I didn't want to have to deal with the pass just to have a working build - it'll be restored at some point.

New SabreDAG

The new SabreDAG is far more detached from the Python-space one, and needed (or perhaps just wanted) its control-flow handling rewritten so the ownership model both of SabreDAG and the output RoutingResult was far clearer when it came to control-flow blocks. Previously, the ownership around the temporary DAGCircuits of the inner blocks was quite awkward in the Rust/Python split, with the blocks being flattened in the result, but Python space holding onto the temporary DAGCircuits backing the blocks, in order to do the rebuild later. The new SabreDAG stores the temporary DAGCircuits within itself, next to the SabreDAG that refers to that block. This part of the change made the move to pure-Rust result and rebuild operations much more natural and easier. The types of node in the graph are now called Synchronize, TwoQ and ControlFlow, where Synchronize is an operation that has no routing requirements, it only imposes data-synchronisation constraints (like a barrier, say).

An additional change to SabreDAG that probably shouldn't be in this PR (and I might separate out later) is the coalescence of runs of operations onto the same arguments, or subsets, into the same node in the Sabre interaction graph, wherever possible. For example, any 1q gate is compressed into its preceding 2q gate. Similarly, any time we see a 2q gate followed by zero or more 1q gates on those two qubits, then another 2q gate on the same two qubits, the second 2q gate imposes no possible further routing constraints, so is always compressed into the prior node. I mostly made this change as a performance optimisation for update_route and populate_extended_set (there are fewer nodes to explicitly visit), and because I wanted to avoid the HashMap-based SwapMap strategy of the Python-space result object, and instead have the swaps inline. The cost, though, is that I didn't want the result object to have loads of empty Vecs in it - coalescing the nodes in the interaction graph has the knock-on effect of making it so most of the individual nodes in the result will have initial swaps.

New RoutingResult

This replaces the tuple of a few pyclasses, in favour of single unified Rust handling. The RoutingResult object stores a reference to the SabreDAG (and the DAGCircuit) it relates to, so it can rebuild itself, and so that it needn't store copies of the full routing order; just as the prior Python result object stored node indices in the Python DAG to facilitate the rebuild, this one stores node indices in the SabreDAG, so we get the full order of coalesced nodes without cloning it into the result object.

The new RoutingResult object can completely reconstruct the physical output when acting in sabre_routing mode, or (in theory - untested as yet) sabre_layout mode when there is no disjoint handling required. RoutingResult::rebuild is effectively a full implementation of circuit embedding, layout application and routing rebuild. Going further, RoutingResult::rebuild_onto should (again, untested) work to manage rebuilding onto a larger DAGCircuit after routing on only a subset of the Target (such as happens in disjoint-target handling).

New RoutingTarget

The previous (NeighborTable, Graph<(), ()>, ArrayView2<f64>) form of the target specification was strongly influenced by what we could create and pass around from Python, but also had a fair amount of duplication in it. Previously we had the problem that the Rustworkx Python-exposed CouplingMap couldn't be consumed by our Rust passes, so we were creating a NeighborTable from a list of edges, rebuilding a Graph<(), ()> from that, and calculating a distance matrix entirely separately.

Now we're Rust-native, we only want to work with a Target as input.

In the new form: the roles fulfilled by NeighborTable and Graph<(), ()> are combined into a single new Neighbors structure, which then implements all the petgraph::visit traits using PhysicalQubit as the index type, so can be passed natively to dijkstra (for the release valve) and token_swapper (for control-flow layout reversion), and in turn, the result swap paths come out already in terms of PhysicalQubit.

Internally, Neighbors is a flat list [n0_0, n0_1, n1_0, n1_1, n1_2, n2_0, ...], where nX_Y means the Yth neighbor of qubit X, and a separate list of partition points into that flat list. Fundamentally, it is the same as the list-of-lists approach of the old NeighborTable, just flattened so we don't even need SmallVec and its "did this spill to the heap?" runtime check on each access for data locality; the data is simply always localised because it is flat and contiguous. It is similar to the CSR-like structure used by SparseObservable, for example. Neighbors then implements all the suitable petgraph visitor traits as if it were an undirected graph. Actually, it's more like a guaranteed-symmetric directed graph (each edge is duplicated) for performance reasons in swap-candidate enumeration with how Sabre works, but that largely doesn't matter.

The distance matrix is now calculated directly from Neighbors. The relevant method isn't in rustworkx_core yet (see Qiskit/rustworkx#1439), so this patch contains a (re-optimised) port of that method.

@jakelishman jakelishman added this to the 2.1.0 milestone May 7, 2025
@jakelishman jakelishman added performance Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler labels May 7, 2025
@coveralls

coveralls commented May 13, 2025

Copy link
Copy Markdown

Pull Request Test Coverage Report for Build 16126424147

Details

  • 944 of 1088 (86.76%) changed or added relevant lines in 10 files are covered.
  • 42 unchanged lines in 10 files lost coverage.
  • Overall coverage decreased (-0.04%) to 87.738%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/routing/sabre_swap.py 27 29 93.1%
crates/transpiler/src/target/mod.rs 18 21 85.71%
qiskit/transpiler/passes/layout/sabre_layout.py 49 54 90.74%
crates/transpiler/src/passes/sabre/dag.rs 97 103 94.17%
crates/transpiler/src/passes/sabre/route.rs 310 333 93.09%
crates/transpiler/src/passes/sabre/layout.rs 263 312 84.29%
crates/transpiler/src/passes/sabre/neighbors.rs 120 176 68.18%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/expr.rs 1 93.63%
crates/transpiler/src/passes/unitary_synthesis.rs 1 93.37%
qiskit/transpiler/passes/routing/sabre_swap.py 1 92.31%
crates/circuit/src/circuit_instruction.rs 3 88.22%
crates/circuit/src/dag_node.rs 3 68.79%
crates/transpiler/src/passes/disjoint_layout.rs 3 89.04%
crates/circuit/src/nlayout.rs 6 87.88%
crates/qasm2/src/parse.rs 6 97.09%
crates/circuit/src/dag_circuit.rs 9 85.38%
crates/qasm2/src/lex.rs 9 90.98%
Totals Coverage Status
Change from base Build 16123284064: -0.04%
Covered Lines: 81252
Relevant Lines: 92608

💛 - Coveralls

@jakelishman

This comment was marked as outdated.

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request May 15, 2025
This commit reverts the move of star prerouting to run in Rust. At the
time of this commit it was before the majority of the transpiler
infrastructure was ported to rust. At that time the pass was ported we
had a few one off passes that were ported to rust and we had to manage
the Python rust boundary carefully. The pattern was typically to define
custom rust data types that provided an alternate view of the
DAGCircuit and the target for compilation that was built out of the
Python space dag and target. The star prerouting pass was ported to
Rust and leveraged the existing infrastructure used by SabreSwap and
SabreLayout as the function of the pass on a DAG was very similar to
Sabre (finding where to insert swaps and rebuilding the dag where
swaps are needed). At the time this was a pragmatic choice to
accelerate the pass and not need to wait for the needed infrastructure
to be written in Rust. However in practice the performance gains the
Rust porting of the algorithm provided were not as large as was
originally hoped and now the code of the pass is tied directly to
the current implementation of sabre. As the sabre interface layer is
being re-written in Qiskit#14317 this is proving to be a blocker for that
effort. This commit reverts the current rust implementation to unblock
the sabre refactoring effort.

It's still definitely worth porting this pass to Rust but we now have
sufficient infrastructure in Rust to write full transpiler passes
without Python. We should follow up this PR and rewrite the star
prerouting pass so that full pass operates completely in rust and
directly leverages the native data structures that exist in Rust now
instead of the hybrid approach that ties the is tightly coupled to the
internals of the sabre pass.

This reverts commit b23c545.
github-merge-queue Bot pushed a commit that referenced this pull request May 16, 2025
* Revert "Port star_preroute to rust"

This commit reverts the move of star prerouting to run in Rust. At the
time of this commit it was before the majority of the transpiler
infrastructure was ported to rust. At that time the pass was ported we
had a few one off passes that were ported to rust and we had to manage
the Python rust boundary carefully. The pattern was typically to define
custom rust data types that provided an alternate view of the
DAGCircuit and the target for compilation that was built out of the
Python space dag and target. The star prerouting pass was ported to
Rust and leveraged the existing infrastructure used by SabreSwap and
SabreLayout as the function of the pass on a DAG was very similar to
Sabre (finding where to insert swaps and rebuilding the dag where
swaps are needed). At the time this was a pragmatic choice to
accelerate the pass and not need to wait for the needed infrastructure
to be written in Rust. However in practice the performance gains the
Rust porting of the algorithm provided were not as large as was
originally hoped and now the code of the pass is tied directly to
the current implementation of sabre. As the sabre interface layer is
being re-written in #14317 this is proving to be a blocker for that
effort. This commit reverts the current rust implementation to unblock
the sabre refactoring effort.

It's still definitely worth porting this pass to Rust but we now have
sufficient infrastructure in Rust to write full transpiler passes
without Python. We should follow up this PR and rewrite the star
prerouting pass so that full pass operates completely in rust and
directly leverages the native data structures that exist in Rust now
instead of the hybrid approach that ties the is tightly coupled to the
internals of the sabre pass.

This reverts commit b23c545.

* Add back reverted alt text for plot in docs
@jakelishman jakelishman force-pushed the sabre/no-py branch 3 times, most recently from e6e50d3 to 0549a6d Compare May 20, 2025 17:31
@jakelishman jakelishman changed the title WIP: Make Sabre Rust-native Make Sabre Rust-native May 20, 2025
@jakelishman jakelishman changed the title Make Sabre Rust-native Rewrite Sabre interfaces to be Rust native May 20, 2025
@jakelishman jakelishman marked this pull request as ready for review May 20, 2025 17:39
@jakelishman jakelishman requested a review from a team as a code owner May 20, 2025 17:39
@qiskit-bot

Copy link
Copy Markdown
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@jakelishman

Copy link
Copy Markdown
Member Author

As noted in the now-updated PR comment: there's still some minor work to fix up the last couple of todo! stubs, though I think that mostly they're still todo! because I wasn't immediately certain how they should be handled, and we don't have test-suite coverage of them.

I also haven't run benchmarks on this any time recently. From a now-old run, I know that one of the ASV utility-scale benchmarks shows a regression, but I tracked it down to a miniscule deviation in (I think) floating-point stuff 10k+ steps into the Sabre routing algorithm. Functionally, the only reason it appears at all is because #14244 is not fixed by this PR, and the dodgy extended-set tracking causes the heuristic to be completely borked for huge circuits.

I have a couple of PRs to follow this one that break RNG compatibility. They build on stuff that might seem a bit superfluous in the current PR - the reason is because I wrote way too much at once, and then modified the patch series back into an RNG-compatible patch followed by improvements, so some of the extra structures appear to get added before they're properly taken advantage of, but it was too hard to write another completely new implementation to get rid of them.

@jakelishman jakelishman added the on hold Can not fix yet label May 22, 2025
@jakelishman

Copy link
Copy Markdown
Member Author

Local benchmarking suggests that while this has expected performance improvements in sabre_routing, there's some sort of regression in sabre_layout that needs more investigation.

@jakelishman

Copy link
Copy Markdown
Member Author

No longer "on hold" for performance: the trouble was that this commit forcibly disabled double parallelisation in the routing component of a single layout trial, and on machines where the single layer of parallelism didn't saturate the thread count, that was a regression. The intent had been to reduce parallelism synchronisation overhead, but in practice, each routing run takes long enough, and Rayon is efficient enough under these circumstances, that it's irrelevant.

@jakelishman

Copy link
Copy Markdown
Member Author

Changed benchmarks, running on the 192-thread benchmarking server.

| Change   | Before [22367fca]    | After [a102ff63]    |   Ratio | Benchmark (Parameter)                                                                                   |
|----------|----------------------|---------------------|---------|---------------------------------------------------------------------------------------------------------|
| -        | 2.78±0.04ms          | 2.49±0.06ms         |    0.9  | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 131072)                             |
| -        | 2.80±0.03ms          | 2.52±0.01ms         |    0.9  | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 131072)                             |
| -        | 2.80±0.03ms          | 2.49±0.06ms         |    0.89 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 131072)                              |
| -        | 2.80±0.02ms          | 2.38±0.05ms         |    0.85 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 131072)                              |
| -        | 76.1±1ms             | 29.4±0.9ms          |    0.39 | mapping_passes.PassBenchmarks.time_sabre_swap(14, 1024)                                                 |
| -        | 123±3ms              | 47.8±1ms            |    0.39 | mapping_passes.PassBenchmarks.time_sabre_swap(20, 1024)                                                 |
| -        | 37.1±9ms             | 8.33±1ms            |    0.22 | mapping_passes.PassBenchmarks.time_sabre_swap(5, 1024)                                                  |
| -        | 37.8±0.5μs           | 34.3±0.08μs         |    0.91 | passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(20, 1024)                           |
| -        | 17.4±0.09μs          | 15.8±0.1μs          |    0.91 | passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(5, 1024)                            |
| -        | 10.1±0.02s           | 5.61±0.01s          |    0.55 | qft.LargeQFTMappingTimeBench.time_sabre_swap(1081, 'decay')                                             |
| -        | 9.76±0.01s           | 5.22±0s             |    0.53 | qft.LargeQFTMappingTimeBench.time_sabre_swap(1081, 'lookahead')                                         |
| -        | 81.7±10ms            | 33.4±0.1ms          |    0.41 | qft.LargeQFTMappingTimeBench.time_sabre_swap(115, 'decay')                                              |
| -        | 102±0.9ms            | 32.0±0.06ms         |    0.31 | qft.LargeQFTMappingTimeBench.time_sabre_swap(115, 'lookahead')                                          |
| -        | 1.23±0.02s           | 570±3ms             |    0.46 | qft.LargeQFTMappingTimeBench.time_sabre_swap(409, 'decay')                                              |
| -        | 1.23±0.01s           | 548±7ms             |    0.44 | qft.LargeQFTMappingTimeBench.time_sabre_swap(409, 'lookahead')                                          |
| -        | 5.85±0.06ms          | 5.22±0.02ms         |    0.89 | qft.QftTranspileBench.time_ibmq_backend_transpile(1)                                                    |
| -        | 6.85±0.2ms           | 5.99±0.02ms         |    0.88 | qft.QftTranspileBench.time_ibmq_backend_transpile(2)                                                    |
| -        | 2.88±0.01s           | 2.56±0.01s          |    0.89 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'lookahead')                |
| -        | 20.7±8ms             | 9.54±0.08ms         |    0.46 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'decay')                     |
| -        | 20.3±0.2ms           | 9.30±0.06ms         |    0.46 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'lookahead')                 |
| -        | 182±10ms             | 78.8±1ms            |    0.43 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'decay')                    |
| -        | 178±8ms              | 74.7±0.7ms          |    0.42 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'lookahead')                |
| -        | 276±10ms             | 192±0.2ms           |    0.69 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'decay')                     |
| -        | 404±10ms             | 303±0.6ms           |    0.75 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'lookahead')                 |
| -        | 14.7±4ms             | 6.70±0.02ms         |    0.46 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'synthesis')                       |
| -        | 16.3±2ms             | 5.66±0.02ms         |    0.35 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(1, 'translator')                      |
| -        | 16.3±8ms             | 7.45±0.03ms         |    0.46 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'synthesis')                       |
| -        | 8.21±1ms             | 6.44±0.02ms         |    0.78 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'translator')                      |
| -        | 149±5ms              | 124±2ms             |    0.83 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'synthesis')                      |
| -        | 7.42±0.06ms          | 6.52±0.02ms         |    0.88 | queko.QUEKOTranspilerBench.time_transpile_bigd(0, None)                                                 |
| -        | 6.38±3ms             | 5.12±0.01ms         |    0.8  | queko.QUEKOTranspilerBench.time_transpile_bigd(1, None)                                                 |
| -        | 8.25±0.2ms           | 7.11±0.06ms         |    0.86 | queko.QUEKOTranspilerBench.time_transpile_bigd(2, None)                                                 |
| -        | 22.3±8ms             | 10.8±0.04ms         |    0.48 | queko.QUEKOTranspilerBench.time_transpile_bntf(1, None)                                                 |
| -        | 84.8±5ms             | 70.9±0.7ms          |    0.84 | queko.QUEKOTranspilerBench.time_transpile_bntf(2, 'sabre')                                              |
| -        | 19.8±0.5ms           | 17.0±0.05ms         |    0.86 | queko.QUEKOTranspilerBench.time_transpile_bntf(2, None)                                                 |
| -        | 32.7±8ms             | 29.6±0.08ms         |    0.91 | queko.QUEKOTranspilerBench.time_transpile_bntf(3, None)                                                 |
| -        | 109±10ms             | 75.6±8ms            |    0.69 | queko.QUEKOTranspilerBench.time_transpile_bss(0, 'sabre')                                               |
| -        | 74.6±10ms            | 44.9±0.5ms          |    0.6  | queko.QUEKOTranspilerBench.time_transpile_bss(0, None)                                                  |
| -        | 20.2±10ms            | 15.9±0.04ms         |    0.79 | queko.QUEKOTranspilerBench.time_transpile_bss(1, None)                                                  |
| -        | 110±4ms              | 80.3±5ms            |    0.73 | queko.QUEKOTranspilerBench.time_transpile_bss(2, 'sabre')                                               |
| -        | 55.2±1ms             | 28.4±0.09ms         |    0.52 | queko.QUEKOTranspilerBench.time_transpile_bss(2, None)                                                  |
| -        | 211±10ms             | 130±9ms             |    0.62 | queko.QUEKOTranspilerBench.time_transpile_bss(3, 'sabre')                                               |
| -        | 413                  | 323                 |    0.78 | queko.QUEKOTranspilerBench.track_depth_bss_optimal_depth_100(1, 'sabre')                                |
| -        | 293                  | 239                 |    0.82 | queko.QUEKOTranspilerBench.track_depth_bss_optimal_depth_100(3, 'sabre')                                |
| -        | 7.87±0.05ms          | 7.14±0.02ms         |    0.91 | random_circuit_hex.BenchRandomCircuitHex.time_ibmq_backend_transpile(4)                                 |
| -        | 9.49±0.07ms          | 6.66±0.05ms         |    0.7  | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(10, 0)                        |
| -        | 17.0±0.2ms           | 10.9±0.05ms         |    0.64 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(20, 0)                        |
| -        | 105±10ms             | 44.4±5ms            |    0.42 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(20, 1)                        |
| -        | 96.4±2ms             | 60.8±5ms            |    0.63 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(20, 2)                        |
| -        | 86                   | 78                  |    0.91 | statepreparation.StatePreparationTranspileBench.track_cnot_counts_after_mapping_to_ibmq_16_melbourne(6) |
| -        | 187                  | 161                 |    0.86 | statepreparation.StatePreparationTranspileBench.track_cnot_counts_after_mapping_to_ibmq_16_melbourne(7) |
| -        | 394                  | 342                 |    0.87 | statepreparation.StatePreparationTranspileBench.track_cnot_counts_after_mapping_to_ibmq_16_melbourne(8) |
| -        | 19.7±9ms             | 10.00±0.1ms         |    0.51 | transpiler_benchmarks.TranspilerBenchSuite.time_compile_from_large_qasm                                 |
| -        | 7.19±0.1ms           | 5.73±0.02ms         |    0.8  | transpiler_benchmarks.TranspilerBenchSuite.time_cx_compile                                              |
| -        | 14.4±4ms             | 6.02±0.02ms         |    0.42 | transpiler_benchmarks.TranspilerBenchSuite.time_single_gate_compile                                     |
| -        | 62.9±0.2ms           | 54.2±0.2ms          |    0.86 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)                    |
| -        | 159±2ms              | 137±2ms             |    0.86 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)                    |
| -        | 209±4ms              | 175±2ms             |    0.84 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)                    |
| -        | 42.0±8ms             | 18.3±0.07ms         |    0.44 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                           |
| -        | 30.1±7ms             | 10.9±0.04ms         |    0.36 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                           |
| -        | 24.8±10ms            | 13.3±0.02ms         |    0.54 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                           |
| -        | 160±4ms              | 119±2ms             |    0.75 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                                |
| -        | 728                  | 598                 |    0.82 | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)                         |
| -        | 5.41±0.04ms          | 3.84±0.04ms         |    0.71 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(0)                     |
| -        | 8.26±0.05ms          | 5.53±0.05ms         |    0.67 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(0)                     |
| -        | 71.0±5ms             | 54.0±0.9ms          |    0.76 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(2)                     |
| -        | 8.58±0.09ms          | 5.66±0.03ms         |    0.66 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(0)                         |
| -        | 25.3±0.7ms           | 7.73±0.03ms         |    0.31 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(2)                         |
| -        | 9.94±0.05ms          | 8.58±0.02ms         |    0.86 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(3)                         |
| -        | 423±5ms              | 371±3ms             |    0.88 | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                                                     |
| -        | 533±10ms             | 481±1ms             |    0.9  | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                                     |
| -        | 536±10ms             | 484±2ms             |    0.9  | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                                    |
| Change   |   Before [22367fca]  |   After [a102ff63]  |   Ratio | Benchmark (Parameter)                                                                   |
|----------|----------------------|---------------------|---------|-----------------------------------------------------------------------------------------|
| +        |                   54 |                  68 |    1.26 | queko.QUEKOTranspilerBench.track_depth_bigd_optimal_depth_45(1, 'sabre')                |
| +        |                 1209 |                1625 |    1.34 | queko.QUEKOTranspilerBench.track_depth_bntf_optimal_depth_25(0, None)                   |
| +        |                  270 |                 352 |    1.3  | queko.QUEKOTranspilerBench.track_depth_bntf_optimal_depth_25(1, 'sabre')                |
| +        |                  682 |                 760 |    1.11 | queko.QUEKOTranspilerBench.track_depth_bss_optimal_depth_100(0, 'sabre')                |
| +        |                  250 |                 359 |    1.44 | queko.QUEKOTranspilerBench.track_depth_bss_optimal_depth_100(2, 'sabre')                |
| +        |                  660 |                 997 |    1.51 | ripple_adder.RippleAdderTranspile.track_depth_transpile_square_grid_ripple_adder(20, 3) |
| +        |                  819 |                 925 |    1.13 | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)         |
| +        |                 2649 |                2982 |    1.13 | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                               |
| +        |                 2649 |                2982 |    1.13 | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                               |
| +        |                 2649 |                2982 |    1.13 | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                              |

@mtreinish mtreinish added this pull request to the merge queue Jul 7, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 7, 2025
@jakelishman

Copy link
Copy Markdown
Member Author

The logical merge conflict with #14589 should now be resolved.

@mtreinish mtreinish added this pull request to the merge queue Jul 8, 2025
Merged via the queue into Qiskit:main with commit c9ad20c Jul 8, 2025
26 checks passed
@jakelishman jakelishman deleted the sabre/no-py branch July 8, 2025 14:09
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 9, 2025
The recently merged Qiskit#14317 the sabre passes were updated to build a
coupling graph from a target in Rust as part of moving all the passes'
logic into rust. There was a small oversight in the construction of the
coupling graph when handling targets that contain globally defined
operations. Previously, the constructed connectivity graph would treat
the presence of any globally defined operation in the target as meaning
the target had all to all connectivity which isn't always the case. If
the globally defined gate is a 1q operation there could still be a
connectivity graph. Similarly in the presence of > 2q gates in the
target the python version didn't invalidate the construction of a
connectivity graph unless there were no 2q operations. Instead the
connectivity graph was ignoring the multiqubit operations. This commit
updates the logic to match what is done in the Python space
`Target.build_coupling_map()`.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Jul 22, 2025
The recently merged Qiskit#14317 the sabre passes were updated to build a
coupling graph from a target in Rust as part of moving all the passes'
logic into rust. There was a small oversight in the construction of the
coupling graph when handling targets that contain globally defined
operations. Previously, the constructed connectivity graph would treat
the presence of any globally defined operation in the target as meaning
the target had all to all connectivity which isn't always the case. If
the globally defined gate is a 1q operation there could still be a
connectivity graph. Similarly in the presence of > 2q gates in the
target the python version didn't invalidate the construction of a
connectivity graph unless there were no 2q operations. Instead the
connectivity graph was ignoring the multiqubit operations. This commit
updates the logic to match what is done in the Python space
`Target.build_coupling_map()`.
@raynelfss raynelfss added the Changelog: None Do not include in the GitHub Release changelog. label Jul 28, 2025
github-merge-queue Bot pushed a commit that referenced this pull request Aug 14, 2025
* Correctly handle target with global 1q or 3q gates in Sabre

The recently merged #14317 the sabre passes were updated to build a
coupling graph from a target in Rust as part of moving all the passes'
logic into rust. There was a small oversight in the construction of the
coupling graph when handling targets that contain globally defined
operations. Previously, the constructed connectivity graph would treat
the presence of any globally defined operation in the target as meaning
the target had all to all connectivity which isn't always the case. If
the globally defined gate is a 1q operation there could still be a
connectivity graph. Similarly in the presence of > 2q gates in the
target the python version didn't invalidate the construction of a
connectivity graph unless there were no 2q operations. Instead the
connectivity graph was ignoring the multiqubit operations. This commit
updates the logic to match what is done in the Python space
`Target.build_coupling_map()`.

* Store graph of 2q target component in MultiQ error
littlebullGit pushed a commit to littlebullGit/qiskit that referenced this pull request Sep 5, 2025
…4715)

* Correctly handle target with global 1q or 3q gates in Sabre

The recently merged Qiskit#14317 the sabre passes were updated to build a
coupling graph from a target in Rust as part of moving all the passes'
logic into rust. There was a small oversight in the construction of the
coupling graph when handling targets that contain globally defined
operations. Previously, the constructed connectivity graph would treat
the presence of any globally defined operation in the target as meaning
the target had all to all connectivity which isn't always the case. If
the globally defined gate is a 1q operation there could still be a
connectivity graph. Similarly in the presence of > 2q gates in the
target the python version didn't invalidate the construction of a
connectivity graph unless there were no 2q operations. Instead the
connectivity graph was ignoring the multiqubit operations. This commit
updates the logic to match what is done in the Python space
`Target.build_coupling_map()`.

* Store graph of 2q target component in MultiQ error
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Sep 25, 2025
This commit fixes the support for pickling SabreSwap. In Qiskit#14317 a new
RoutingTarget rust struct was added to encapsulate the target details,
and this was exposed to Python so that the Python class for the
transpiler pass was able to reuse the object between multiple runs.
However, this new type didn't implement pickle support and it would
cause a failure when trying to pickle a SabreSwap instance that had a
routing target populated. This commit fixes this oversight and
implements pickle support for the RoutingTarget so that SabreSwap can
always be pickled.

Fixes Qiskit#15071
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Sep 25, 2025
This commit fixes the support for pickling SabreSwap. In Qiskit#14317 a new
RoutingTarget rust struct was added to encapsulate the target details,
and this was exposed to Python so that the Python class for the
transpiler pass was able to reuse the object between multiple runs.
However, this new type didn't implement pickle support and it would
cause a failure when trying to pickle a SabreSwap instance that had a
routing target populated. This commit fixes this oversight and
implements pickle support for the RoutingTarget so that SabreSwap can
always be pickled.

Fixes Qiskit#15071
github-merge-queue Bot pushed a commit that referenced this pull request Sep 25, 2025
This commit fixes the support for pickling SabreSwap. In #14317 a new
RoutingTarget rust struct was added to encapsulate the target details,
and this was exposed to Python so that the Python class for the
transpiler pass was able to reuse the object between multiple runs.
However, this new type didn't implement pickle support and it would
cause a failure when trying to pickle a SabreSwap instance that had a
routing target populated. This commit fixes this oversight and
implements pickle support for the RoutingTarget so that SabreSwap can
always be pickled.

Fixes #15071
mergify Bot pushed a commit that referenced this pull request Sep 25, 2025
This commit fixes the support for pickling SabreSwap. In #14317 a new
RoutingTarget rust struct was added to encapsulate the target details,
and this was exposed to Python so that the Python class for the
transpiler pass was able to reuse the object between multiple runs.
However, this new type didn't implement pickle support and it would
cause a failure when trying to pickle a SabreSwap instance that had a
routing target populated. This commit fixes this oversight and
implements pickle support for the RoutingTarget so that SabreSwap can
always be pickled.

Fixes #15071

(cherry picked from commit a816e64)
github-merge-queue Bot pushed a commit that referenced this pull request Sep 25, 2025
This commit fixes the support for pickling SabreSwap. In #14317 a new
RoutingTarget rust struct was added to encapsulate the target details,
and this was exposed to Python so that the Python class for the
transpiler pass was able to reuse the object between multiple runs.
However, this new type didn't implement pickle support and it would
cause a failure when trying to pickle a SabreSwap instance that had a
routing target populated. This commit fixes this oversight and
implements pickle support for the RoutingTarget so that SabreSwap can
always be pickled.

Fixes #15071

(cherry picked from commit a816e64)

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
aaryav-3 pushed a commit to aaryav-3/qiskit that referenced this pull request Oct 21, 2025
* Add further creation and destructuring methods to `NLayout`

These are to allow more efficient Rust-space use of the object, allowing
reallocations to be avoided when doing things like ancilla expansion and
layout transformations at a scale larger than token swapping.

* Rewrite Sabre interfaces to be Rust native

This is a major rewrite of all the Sabre interfaces, though the core
routing logic is unchanged.  Note that the test suite is unaltered.

At least as far as the test suite covers, this change is RNG compatible
with its parent, and in several places, explicit code is added to
achieve this.  Given the neighbour-tracking structures are updated,
there can be some divergences, however, depending on how the order that
a coupling map was defined.

Changes to data structures
==========================

Changed `SabreDAG`
------------------

`SabreDAG` is now completely Rust-internal and not even exposed to
Python.  It's constructed directly from a `DAGCircuit`.  When
control-flow blocks are encountered, the temporary `DAGCircuit` backing
them is stored within the `SabreDAG` node itself, so there's no need for
all the `block_results` side-car handling.

The nodes in the graph explicitly store how the node affects routing:
they can be `TwoQ` (the actual routing target), `Synchronize` (a
non-routing object that has only data dependencies) or `ControlFlow`.
This simplifies the internal handling within the router, since now it
doesn't need to re-infer it.  It also lets us decay control-flow
operations to simple `Synchronize` points when we're in layout mode,
similar to how clearing the `block_nodes` field could implicitly do
before.

A follow-up commit (which necessarily changes the RNG compatibility)
makes stronger use of this new `SabreDAG` form to compress the
interaction graph during construction, reducing the amount of work done
by routing trials when advancing the state after a gate is routed.

New `RoutingProblem`
--------------------

A minor record-like struct that bundles up several related arguments.

New `RoutingTarget`
-------------------

The old Python-space code had a 3-tuple

    (NeighborTable, Graph<(), ()>, ArrayView2<f64>)

to represent the hardware chip the routing was taking place on.  The
first two elements stem from difficulties transferring the Rustworkx
Python-wrapped graph backing `CouplingMap` down to Rust space, and from
trying to faithfully port the existing implementation to Rust.

The `NeighborTable` and the `Graph<(), ()>` functionally serve the same
role; they both relate a node to its neighbors, with one presenting a
list-like interface for efficient iteration, and one presenting a
graph-like interface so graph algorithms like `dijsktra`,
`token_swapper` and `adjacency_matrix` could be called on it.

Now in pure Rust space, both are replaced by a single `Neighbors`
structure, which supplies both interfaces.  It implements all the
relevant `petgraph::visit` traits, so it can be directly passed to
`petgraph` and `rustworkx_core` graph algorithms.  Its data format is a
flat CSR-like sorted list of neighbors. It's logically the same as
`NeighborTable`, except all the `SmallVec`s are flattened into a single
contiguous `Vec` with partition points.

The distance matrix remains the same; we need super fast lookups of
scores because it's in the innermost loop of the swap-mapper.

New `RoutingResult`
-------------------

The new result from `swap_map_trial` is no longer several distinct but
interlinking objects, but an encapsulated `RoutingResult` that borrows
the same data as `RoutingProblem` borrows (the `SabreDAG`, the
`DAGCircuit`, etc), and includes the chosen swaps inline.  It contains
all the information necessary to rebuild a fully routed circuit, but
does not do this until requested.

Implementation changes
======================

Remaining Python-space passes
-----------------------------

`SabreLayout` and `SabreSwap` in Python space mostly just organise the
Python-space inputs into the correct data types for Rust space,
pass everything off, then reconstruct the Python-space objects that
those passes are expected to store in the `PropertySet`.  They contain
next to no business logic themselves.  The exception is that
`SabreLayout` still has its mode for running with a custom routing pass,
which is still all in Python space.

`Target`-native
---------------

Both layout and routing are now `Target`-native.  The Rust methods only
accept a `Target`.  If Python space is supplied with a `CouplingMap`, we
lift it into a dummy `Target` that implies the same connectivity
constraints.  The public attributes of the passes that allowed access to
coupling-map properties are retained, lazily computed on demand, for
those that might still be accessing them.

Disjoint handling
-----------------

The disjoint handling is now done entirely in Rust;
`sabre_layout_and_routing` handles it internally.  This includes full
routing of the DAG, even if it is split across multiple components; the
previous `SabreLayout` implementation bailed out at this stage and left
it to a later `SabreSwap` pass, but it's only a couple more lines of
code to handle in Rust space now.

A follow-up commit may hugely simplify the disjoint handling such that
the DAG never even needs to be split up.

* Safely handle zero-block control-flow ops

Python-space Qiskit doesn't actually allow specifying any control-flow
op with zero blocks right now, but this removes a `todo!` and makes us
safe against that assumption changing in the future.

* Handle case of uninitialized `Target`

* Handle out-of-range checks for partial layouts

* Handle implicit all-to-all targets

This generally does not come up within the preset pass-managers, because
we can skip layout selection if there's all-to-all connectivity.  If
`SabreLayout` is called directly, however, it is still good to have a
valid output.

* Update documentation

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>

* Move layout initialisation out of timing block

* Make bail-out error message more descriptive

---------

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
aaryav-3 pushed a commit to aaryav-3/qiskit that referenced this pull request Oct 21, 2025
…4715)

* Correctly handle target with global 1q or 3q gates in Sabre

The recently merged Qiskit#14317 the sabre passes were updated to build a
coupling graph from a target in Rust as part of moving all the passes'
logic into rust. There was a small oversight in the construction of the
coupling graph when handling targets that contain globally defined
operations. Previously, the constructed connectivity graph would treat
the presence of any globally defined operation in the target as meaning
the target had all to all connectivity which isn't always the case. If
the globally defined gate is a 1q operation there could still be a
connectivity graph. Similarly in the presence of > 2q gates in the
target the python version didn't invalidate the construction of a
connectivity graph unless there were no 2q operations. Instead the
connectivity graph was ignoring the multiqubit operations. This commit
updates the logic to match what is done in the Python space
`Target.build_coupling_map()`.

* Store graph of 2q target component in MultiQ error
aaryav-3 pushed a commit to aaryav-3/qiskit that referenced this pull request Oct 21, 2025
This commit fixes the support for pickling SabreSwap. In Qiskit#14317 a new
RoutingTarget rust struct was added to encapsulate the target details,
and this was exposed to Python so that the Python class for the
transpiler pass was able to reuse the object between multiple runs.
However, this new type didn't implement pickle support and it would
cause a failure when trying to pickle a SabreSwap instance that had a
routing target populated. This commit fixes this oversight and
implements pickle support for the RoutingTarget so that SabreSwap can
always be pickled.

Fixes Qiskit#15071
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Changelog: None Do not include in the GitHub Release changelog. mod: transpiler Issues and PRs related to Transpiler performance Rust This PR or issue is related to Rust code in the repository

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Port the rest of SabreSwap to Rust Port the rest of SabreLayout to Rust

6 participants