Simplify discriminant codegen for niche-encoded variants which don't wrap across an integer boundary #143784

scottmcm · 2025-07-11T12:21:20Z

Inspired by #139729, this attempts to be a much-simpler and more-localized change while still making a difference. (Specifically, this does not try to solve the problem with select-sinking, leaving that to be fixed by llvm/llvm-project#134024 -- once it gets released -- instead of in rustc's codegen.)

What this does improve is checking for the variant in a 3+ variant enum when that variant is the type providing the niche. Something like `if let Foo::WithBool(_) = ...` previously compiled to `ugt(add(x, -2), 2)`, which is non-trivial to think about because it depends on unsigned wrapping to shift the 0/1 up above 2. With this PR it compiles to just `ult(x, 2)`, which is probably what you'd have written yourself if you were doing it by hand to look for "is this byte a bool?".
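
To make the equivalence concrete, here is a minimal sketch that models the two predicates on plain `u8` values rather than on a real enum. The function names are hypothetical, and the set of valid tags (0 and 1 for the `bool` payload, 2 through 4 for three dataless variants) is an assumption about how such an enum would be laid out, not something the compiler guarantees.

```rust
/// Old form, `ugt(add(x, -2), 2)`: subtract the niche start with wrapping,
/// so the bool values 0/1 wrap around to 254/255 and land above 2.
fn is_with_bool_old(tag: u8) -> bool {
    tag.wrapping_sub(2) > 2
}

/// New form, `ult(x, 2)`: test the original tag directly.
fn is_with_bool_new(tag: u8) -> bool {
    tag < 2
}

fn main() {
    // Assumed valid tags: 0..=1 is the bool payload, 2..=4 are the niche variants.
    for tag in 0u8..=4 {
        assert_eq!(is_with_bool_old(tag), is_with_bool_new(tag));
    }
    // Outside the valid range the two forms can disagree, which is why
    // the simpler check is only sound when the tag's valid range is known.
    assert_ne!(is_with_bool_old(5), is_with_bool_new(5));
}
```

The two forms agree only on tags the enum can actually hold, so the simpler comparison leans on the tag's known valid range.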

That's done by leaving most of the codegen alone, but adding a couple of new special cases to the `is_niche` check. The default looks at the relative discriminant, but in the common cases where there's no wraparound involved, we can just check the original value, rather than the offset one.
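
As a rough illustration of that idea (not the actual rustc implementation), the sketch below compares a relative-discriminant check with a direct check on the original tag. All names are hypothetical, and the condition used for the direct check (the valid tag range does not wrap, and the niche range touches one end of it) is an assumption made for this example.

```rust
/// Default check: shift the tag so the niche range starts at zero,
/// then do a single unsigned comparison on the relative discriminant.
fn is_niche_relative(tag: u8, niche_start: u8, niche_end: u8) -> bool {
    let relative_discr = tag.wrapping_sub(niche_start);
    relative_discr <= niche_end.wrapping_sub(niche_start)
}

/// Direct check on the original tag. Returns `None` when the assumed
/// special case does not apply and the relative check is still needed.
fn is_niche_direct(tag: u8, valid: (u8, u8), niche: (u8, u8)) -> Option<bool> {
    let (start, end) = valid;
    let (niche_start, niche_end) = niche;
    if start > end {
        // The valid range wraps around the unsigned boundary.
        return None;
    }
    if niche_start == start {
        Some(tag <= niche_end)
    } else if niche_end == end {
        Some(tag >= niche_start)
    } else {
        None
    }
}

fn main() {
    // Assumed layout: valid tags 0..=4, niche values 2..=4.
    for tag in 0u8..=4 {
        let direct = is_niche_direct(tag, (0, 4), (2, 4));
        assert_eq!(direct, Some(is_niche_relative(tag, 2, 4)));
    }
}
```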

The first commit just adds some tests, so the best way to see the effect of this change is to look at the second commit and how it updates the test expectations.

These are from 139729, updated to pass on master.

rustbot · 2025-07-11T12:21:26Z

r? @dianqk

rustbot has assigned @dianqk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot · 2025-07-11T12:21:28Z

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

scottmcm · 2025-07-11T12:26:08Z

tests/codegen/enum/enum-discriminant-eq.rs

    // CHECK: %[[A_REL_DISCR:.+]] = xor i8 %a, -128
-    // CHECK: %[[A_IS_NICHE:.+]] = icmp ult i8 %[[A_REL_DISCR]], 3
+    // CHECK: %[[A_IS_NICHE:.+]] = icmp slt i8 %a, 0


ascii::Char also makes a nice demo here. "Is it one of the variants that's not an ascii::Char" with this PR is phrased as the obvious SLT(a, 0), rather than the previous ULT(XOR(a, -128), 3).
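
The same comparison can be modeled on plain bytes. This is only a sketch with hypothetical function names, and it assumes the valid tags are 0 through 127 for the `ascii::Char` payload plus 128 through 130 for three niche variants, which is inferred from the constants in the test expectations above.

```rust
/// Old form, `ULT(XOR(a, -128), 3)`: flip the top bit so the niche
/// values 128..=130 become 0..=2, then compare unsigned.
fn is_niche_old(a: u8) -> bool {
    (a ^ 0x80) < 3
}

/// New form, `SLT(a, 0)`: reinterpret the byte as signed and test the sign.
fn is_niche_new(a: u8) -> bool {
    (a as i8) < 0
}

fn main() {
    // Assumed valid tags: 0..=127 (ASCII payload) and 128..=130 (niche variants).
    for a in 0u8..=130 {
        assert_eq!(is_niche_old(a), is_niche_new(a));
    }
}
```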

scottmcm · 2025-07-12T23:52:43Z

@bors try @rust-timer queue

bors · 2025-07-12T23:53:54Z

⌛ Trying commit d5bcfb3 with merge 4f0f4c7...

Simplify codegen for niche-encoded variant tests

bors · 2025-07-13T02:15:02Z

☀️ Try build successful - checks-actions
Build commit: 4f0f4c7 (4f0f4c735637415d37906e84b84beb976c5c6f53)

rust-timer · 2025-07-13T04:23:36Z

Finished benchmarking commit (4f0f4c7): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.3%, 0.9%]	4
Regressions ❌ (secondary)	0.5%	[0.2%, 1.8%]	5
Improvements ✅ (primary)	-0.2%	[-0.3%, -0.1%]	17
Improvements ✅ (secondary)	-0.4%	[-1.0%, -0.2%]	16
All ❌✅ (primary)	-0.0%	[-0.3%, 0.9%]	21

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary -0.0%, secondary -0.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.1%, 0.1%]	8
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.3%, -0.1%]	8
Improvements ✅ (secondary)	-0.2%	[-0.5%, -0.0%]	46
All ❌✅ (primary)	-0.0%	[-0.3%, 0.1%]	16

Bootstrap: 464.174s -> 465.167s (0.21%)
Artifact size: 374.70 MiB -> 374.73 MiB (0.01%)

scottmcm · 2025-07-13T05:19:30Z

At first glance those perf numbers are mixed, but I think they're actually pretty good.

Instructions for check, debug, and doc are all clearly improved, suggesting that the resulting rustc & rustdoc binaries are faster -- well, run fewer instructions -- than before. (The PR doesn't change any logic for check or doc, and actually adds some more ifs during debug codegen.)
The binary size changes are generally positive and actually all in opt builds, so what this is doing isn't something that the optimizer could previously do itself. Percentage-wise they're unsurprisingly small in the primary benches, but hey, I'll take 100K off cargo.
The one crate with primary instruction regressions is hyper in opt, which fits with giving the optimizer more opportunity to improve the code -- especially since it appears to always be a 1-CGU build, magnifying any difference there compared to, say, ripgrep's 16 CGUs.

dianqk · 2025-07-13T05:57:46Z

If you can wait, I can take a look at it this week.

scottmcm · 2025-07-13T19:05:39Z

No rush, @dianqk -- this week would be great.

dianqk

The first commit just adds some tests, so the best way to see the effect of this change is to look at the second commit and how it updates the test expectations.

Nits: This link is no longer valid, so maybe just remove it.

How about we mention non-wrapping ranges in the title?

Then r=me. Nice improvements!

dianqk · 2025-07-15T11:46:36Z

compiler/rustc_codegen_ssa/src/mir/operand.rs

@@ -511,35 +512,44 @@ impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, V> {
                    // } else {
                    //     untagged_variant
                    // }
-                    let niche_start = bx.cx().const_uint_big(tag_llty, niche_start);
-                    let is_niche = bx.icmp(IntPredicate::IntEQ, tag, niche_start);
+                    let is_niche = bx.icmp(IntPredicate::IntEQ, tag, niche_start_const);
                    let tagged_discr =
                        bx.cx().const_uint(cast_to, niche_variants.start().as_u32() as u64);
                    (is_niche, tagged_discr, 0)
                } else {
                    // The special cases don't apply, so we'll have to go with
                    // the general algorithm.


Should the comment be repositioned?

dianqk · 2025-07-15T11:50:45Z

compiler/rustc_codegen_ssa/src/mir/operand.rs

-                        let ne = bx.icmp(IntPredicate::IntNE, relative_discr, impossible);
-                        bx.assume(ne);
-                    }
+                    let is_niche = if tag_range.no_unsigned_wraparound(tag_size) == Ok(true) {


Can you add some description of the special case here? Does a diagram like this make sense?

    //         niche_start                       niche_end
    //              |                                |
    //              v                                v
    // 0u8----------+--------------------------------+----------255u8
    //       ^      |            is niche            |
    //       |      +--------------------------------+
    //       |                                       |
    // tag_range.start                          tag_range.end

scottmcm · 2025-07-16T05:37:26Z

@bors r=dianqk

bors · 2025-07-16T05:37:29Z

📌 Commit 4fa23d9 has been approved by dianqk

It is now in the queue for this repository.

bors · 2025-07-16T05:37:30Z

🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened.

More discriminant codegen tests

13b1e40

These are from 139729, updated to pass on master.

rustbot assigned dianqk Jul 11, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 11, 2025

scottmcm mentioned this pull request Jul 11, 2025

Allow matching on 3+ variant niche-encoded enums to optimize better #139729

Closed