-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Arm64] Replace pairs of str
with stp
#85032
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
gcstress is green with few known failures. |
@dotnet/jit-contrib , @BruceForstall |
This diffs look great. Interesting there are some zero-size diffs where we convert str/stp to stp/str. fwiw, I noticed a case maybe we could do better in libraries.crossgen2 on win-arm64:
=>
? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show any diffs (including GC diffs) where both addresses are GC lcl vars?
src/coreclr/jit/emitarm64.cpp
Outdated
(immediate). | ||
*/ | ||
|
||
void emitter::emitIns_SS_R_R_R_I(instruction ins, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not a variant or extension of emitIns_S_S_R_R
?
In that case, no reg3
is passed because it is determined based on the lclvar. In this case, there are 2 lclvar. Presumably they would both have the same FPbased
value, thus determining the same address base register. In fact, you could probably determine / assert the offsets are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have referred the code from emitIns_R_R_R_I()
instead which operates on vector registers as well. For example, it operates on scale
of 2, 3 and 4, whereas emitIns_S_S_R_R()
operates just on scale == 3
. Also, it has special handling for useRegForAdr
which I assume is just for the stack stores. And yes, the reg3
is always FP
or SP
.
I presume the MinOpts TP regression is due to extra checks related to instrDesc size, and possibly due to fewer constants fitting in a small instrDesc. |
correct. |
That is an interesting pattern coming from accessing 4~5 runtime/src/coreclr/System.Private.CoreLib/src/System/Reflection/TypeNameParser.CoreCLR.cs Lines 20 to 24 in abaad8f
Unfortunately, the popular peephole optimization that we do today just check the last instruction. We could use the infrastructure that we have recently developed to see last N instructions and combine them, but honestly, I don't know if removing last N instructions is well tested and works. I would expect this optimization to happen higher up in morph or something, but should be tracked as a separate issue, although I am not sure how many of such patterns exist. |
src/coreclr/jit/emitarm64.cpp
Outdated
regNumber reg3, | ||
ssize_t imm, | ||
int varx1, | ||
int offs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offs
here is the offset within a local var. Since there are 2 lclvars, there should be two offsets: offs1
, offs2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2nd offset
would be just offs + TARGET_POINTER_SIZE
so we don't need to pass it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not correct. The offset is within the local, not from the base register.
@BruceForstall - this should be ready to review again. |
Optimize pair of
str
withstp
along with tracking the gc information.Contributes to #55365 and #77010
Fixes: #35134, #35133 and #81278