
Simplify discriminant codegen for niche-encoded variants which don't wrap across an integer boundary #143784


Open
scottmcm wants to merge 3 commits into master from enums-again-new-ex2

Conversation

scottmcm
Member

@scottmcm scottmcm commented Jul 11, 2025

Inspired by #139729, this attempts to be a much-simpler and more-localized change while still making a difference. (Specifically, this does not try to solve the problem with select-sinking, leaving that to be fixed by llvm/llvm-project#134024 -- once it gets released -- instead of in rustc's codegen.)

What this does improve is checking for the variant in a 3+ variant enum when that variant is the type providing the niche. Something like `if let Foo::WithBool(_) = ...` previously compiled to `ugt(add(x, -2), 2)`, which is non-trivial to think about because it depends on unsigned wrapping to shift the 0/1 up above 2. With this PR it compiles to just `ult(x, 2)`, which is probably what you'd have written yourself if you were doing it by hand to look for "is this byte a bool?".

That's done by leaving most of the codegen alone, but adding a couple of new special cases to the `is_niche` check. The default looks at the relative discriminant, but in the common cases where there's no wraparound involved, we can just check the original value rather than the offset one.
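To make the equivalence concrete, here is a minimal sketch that models both comparisons on a plain `u8`. It assumes a hypothetical `enum Foo { A, B, C, WithBool(bool) }` whose tag takes the values 0 and 1 for the `bool` payload and 2, 3, 4 for the three dataless variants; the enum, the function names, and that exact value assignment are illustrative assumptions, not taken from the PR's tests.

```rust
/// Old form, `ugt(add(x, -2), 2)`: subtract the niche start (wrapping),
/// then test whether the relative discriminant is above the niche run.
/// Returns true when the tag holds a `bool` payload (assumed layout).
fn is_with_bool_old(tag: u8) -> bool {
    tag.wrapping_sub(2) > 2
}

/// New form, `ult(x, 2)`: compare the original tag value directly.
fn is_with_bool_new(tag: u8) -> bool {
    tag < 2
}

/// Checks that both forms agree on every tag value that is valid
/// for the hypothetical enum (0..=4).
fn forms_agree_on_valid_tags() -> bool {
    (0u8..=4).all(|tag| is_with_bool_old(tag) == is_with_bool_new(tag))
}
```

The two forms only agree inside the valid range: for a tag of 5 the old form returns true and the new form returns false, so the simpler comparison relies on the tag never holding such a value.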

The first commit just adds some tests, so the best way to see the effect of this change is to look at the second commit and how it updates the test expectations.

These are from #139729, updated to pass on master.
@rustbot
Collaborator

rustbot commented Jul 11, 2025

r? @dianqk

rustbot has assigned @dianqk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 11, 2025
@rustbot
Collaborator

rustbot commented Jul 11, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Comment on lines 140 to 141
- // CHECK: %[[A_REL_DISCR:.+]] = xor i8 %a, -128
- // CHECK: %[[A_IS_NICHE:.+]] = icmp ult i8 %[[A_REL_DISCR]], 3
+ // CHECK: %[[A_IS_NICHE:.+]] = icmp slt i8 %a, 0
Member Author

ascii::Char also makes a nice demo here. "Is it one of the variants that's not an ascii::Char" with this PR is phrased as the obvious SLT(a, 0), rather than the previous ULT(XOR(a, -128), 3).
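As a rough illustration of why those two checks match, the sketch below models them on a `u8` tag. It assumes the valid tag values are 0..=127 for the `ascii::Char` payload plus three niche values 128, 129, 130 (inferred from the constant 3 in the old expectation); the function names are hypothetical.

```rust
/// Previous form, `ULT(XOR(a, -128), 3)`: flipping the top bit maps
/// 128, 129, 130 to 0, 1, 2, which are then compared against 3.
fn is_niche_old(a: u8) -> bool {
    (a ^ 0x80) < 3
}

/// New form, `SLT(a, 0)`: reinterpret the byte as signed and test the sign bit.
fn is_niche_new(a: u8) -> bool {
    (a as i8) < 0
}

/// Checks that both forms agree on every tag in the assumed valid range 0..=130.
fn forms_agree_on_valid_tags() -> bool {
    (0u8..=130).all(|a| is_niche_old(a) == is_niche_new(a))
}
```

Outside that range the forms differ (for example at 131), so the sign-bit test is only a valid replacement because the tag is known to stay within its valid range.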

@scottmcm scottmcm force-pushed the enums-again-new-ex2 branch from 7ccf81a to da52d97 Compare July 12, 2025 03:11
@scottmcm scottmcm force-pushed the enums-again-new-ex2 branch from da52d97 to 400c6a7 Compare July 12, 2025 04:57
@scottmcm scottmcm force-pushed the enums-again-new-ex2 branch from 400c6a7 to d5bcfb3 Compare July 12, 2025 11:53
@scottmcm
Member Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 12, 2025
@bors
Collaborator

bors commented Jul 12, 2025

⌛ Trying commit d5bcfb3 with merge 4f0f4c7...

bors added a commit that referenced this pull request Jul 12, 2025
Simplify codegen for niche-encoded variant tests
@bors
Collaborator

bors commented Jul 13, 2025

☀️ Try build successful - checks-actions
Build commit: 4f0f4c7 (4f0f4c735637415d37906e84b84beb976c5c6f53)

@rust-timer
Collaborator

Finished benchmarking commit (4f0f4c7): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.3%, 0.9%] | 4 |
| Regressions ❌ (secondary) | 0.5% | [0.2%, 1.8%] | 5 |
| Improvements ✅ (primary) | -0.2% | [-0.3%, -0.1%] | 17 |
| Improvements ✅ (secondary) | -0.4% | [-1.0%, -0.2%] | 16 |
| All ❌✅ (primary) | -0.0% | [-0.3%, 0.9%] | 21 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary -0.0%, secondary -0.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.1%, 0.1%] | 8 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.3%, -0.1%] | 8 |
| Improvements ✅ (secondary) | -0.2% | [-0.5%, -0.0%] | 46 |
| All ❌✅ (primary) | -0.0% | [-0.3%, 0.1%] | 16 |

Bootstrap: 464.174s -> 465.167s (0.21%)
Artifact size: 374.70 MiB -> 374.73 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 13, 2025
@scottmcm
Member Author

At first glance those perf numbers are mixed, but I think they're actually pretty good.

  • Instructions for check, debug, and doc are all clearly improved, suggesting that the resulting rustc & rustdoc binaries are faster -- well, run fewer instructions -- than before. (The PR doesn't change any logic for check or doc, and actually adds some more ifs during debug codegen.)
  • The binary size changes are generally positive and actually all in opt builds, so what this is doing isn't something that the optimizer could previously do itself. Percentage-wise they're unsurprisingly small in the primary benches, but hey, I'll take 100K off cargo.
  • The one crate with primary instruction regressions is hyper in opt, which fits with giving the optimizer more opportunity to improve the code -- especially since it appears to always be a 1-CGU build, magnifying any difference there compared to, say, ripgrep's 16 CGUs.

@scottmcm scottmcm added the perf-regression-triaged The performance regression has been triaged. label Jul 13, 2025
@dianqk
Member

dianqk commented Jul 13, 2025

If you can wait, I can take a look at it this week.

@scottmcm
Member Author

No rush, @dianqk -- this week would be great.

Member

@dianqk dianqk left a comment

> The first commit just adds some tests, so the best way to see the effect of this change is to look at the second commit and how it updates the test expectations.

Nits: This link is no longer valid, so maybe just remove it.

How about we mention non-wrapping ranges in the title?

Then r=me. Nice improvements!

@@ -511,35 +512,44 @@ impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, V> {
// } else {
// untagged_variant
// }
let niche_start = bx.cx().const_uint_big(tag_llty, niche_start);
- let is_niche = bx.icmp(IntPredicate::IntEQ, tag, niche_start);
+ let is_niche = bx.icmp(IntPredicate::IntEQ, tag, niche_start_const);
let tagged_discr =
bx.cx().const_uint(cast_to, niche_variants.start().as_u32() as u64);
(is_niche, tagged_discr, 0)
} else {
// The special cases don't apply, so we'll have to go with
// the general algorithm.
Member

Should the comment be repositioned?

let ne = bx.icmp(IntPredicate::IntNE, relative_discr, impossible);
bx.assume(ne);
}
let is_niche = if tag_range.no_unsigned_wraparound(tag_size) == Ok(true) {
Member

Can you add some description of the special case here? Does a diagram like this make sense?

//         niche_start                       niche_end           
//              |                                |               
//              v                                v               
// 0u8----------+--------------------------------+----------255u8
//         ^    |            is niche            |               
//         |    +--------------------------------+               
//         |                                     |               
// tag_range.start                        tag_range.end          
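One way to read the diagram is as a stand-alone model of the special case: when the tag's valid range does not wrap and the niche run ends exactly at `tag_range.end`, a single comparison against `niche_start` gives the same answer as the general relative-discriminant test. The sketch below is only that model. `TagRange`, its one-argument `no_unsigned_wraparound`, and both `is_niche_*` helpers are simplified, hypothetical stand-ins on `u8`, and the actual conditions checked in rustc may differ.

```rust
/// Hypothetical stand-in for the valid range of an 8-bit tag (not rustc's type).
#[derive(Clone, Copy)]
struct TagRange {
    start: u8,
    end: u8,
}

impl TagRange {
    /// Viewed as unsigned, the range does not wrap when start <= end.
    fn no_unsigned_wraparound(self) -> bool {
        self.start <= self.end
    }
}

/// General algorithm: compute the relative discriminant with a wrapping
/// subtraction, then compare it against the length of the niche run.
fn is_niche_general(tag: u8, niche_start: u8, niche_end: u8) -> bool {
    tag.wrapping_sub(niche_start) <= niche_end.wrapping_sub(niche_start)
}

/// Special case from the diagram: no wraparound and the niche run reaches
/// tag_range.end, so comparing the original tag is enough.
/// Returns None when this (assumed) special case does not apply.
fn is_niche_direct(tag: u8, range: TagRange, niche_start: u8, niche_end: u8) -> Option<bool> {
    if range.no_unsigned_wraparound() && niche_end == range.end {
        Some(tag >= niche_start)
    } else {
        None
    }
}
```

For a range of 0..=4 with niche values 2..=4, both helpers agree on every valid tag, while a wrapping range such as 254..=1 falls back to the general algorithm.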

@scottmcm scottmcm changed the title Simplify codegen for niche-encoded variant tests Simplify discriminant codegen for niche-encoded variants which don't wrap across an integer boundary Jul 15, 2025
@scottmcm
Member Author

@bors r=dianqk

@bors
Collaborator

bors commented Jul 16, 2025

📌 Commit 4fa23d9 has been approved by dianqk

It is now in the queue for this repository.

@bors
Collaborator

bors commented Jul 16, 2025

🌲 The tree is currently closed for pull requests below priority 100. This pull request will be tested once the tree is reopened.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 16, 2025
Labels
perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
6 participants