Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arm64] Replace pairs of str with stp #85032

Merged
merged 20 commits into from
Apr 27, 2023
Merged

Conversation

kunalspathak
Copy link
Member

Optimize pair of str with stp along with tracking the gc information.

Contributes to #55365 and #77010
Fixes: #35134, #35133 and #81278

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 19, 2023
@ghost
Copy link

ghost commented Apr 19, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Optimize pair of str with stp along with tracking the gc information.

Contributes to #55365 and #77010
Fixes: #35134, #35133 and #81278

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Member Author

Diffs looks good (as expected):

image

I will run gcstress job to make sure there are no gc holes.

@kunalspathak
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Copy link
Member Author

gcstress is green with few known failures.

@kunalspathak kunalspathak marked this pull request as ready for review April 20, 2023 16:40
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib , @BruceForstall

cc: @a74nh @SwapnilGaikwad

@BruceForstall
Copy link
Member

Diffs

@BruceForstall
Copy link
Member

BruceForstall commented Apr 21, 2023

This diffs look great. Interesting there are some zero-size diffs where we convert str/stp to stp/str.

fwiw, I noticed a case maybe we could do better in libraries.crossgen2 on win-arm64:

16194.dasm - System.Reflection.TypeNameParser:GetType(System.String,bool,bool,System.Reflection.Assembly):System.Type
...
             strb    wzr, [fp, #0x38]
             strb    wzr, [fp, #0x39]
             strb    wzr, [fp, #0x3A]
             strb    wzr, [fp, #0x3B]
             strb    wzr, [fp, #0x3C]

=>

             str     wzr, [fp, #0x38]
             strb    wzr, [fp, #0x3C]

?

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show any diffs (including GC diffs) where both addresses are GC lcl vars?

src/coreclr/jit/emit.h Show resolved Hide resolved
src/coreclr/jit/emit.h Outdated Show resolved Hide resolved
src/coreclr/jit/emit.h Outdated Show resolved Hide resolved
src/coreclr/jit/emit.h Outdated Show resolved Hide resolved
(immediate).
*/

void emitter::emitIns_SS_R_R_R_I(instruction ins,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a variant or extension of emitIns_S_S_R_R?

In that case, no reg3 is passed because it is determined based on the lclvar. In this case, there are 2 lclvar. Presumably they would both have the same FPbased value, thus determining the same address base register. In fact, you could probably determine / assert the offsets are correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have referred the code from emitIns_R_R_R_I() instead which operates on vector registers as well. For example, it operates on scale of 2, 3 and 4, whereas emitIns_S_S_R_R() operates just on scale == 3. Also, it has special handling for useRegForAdr which I assume is just for the stack stores. And yes, the reg3 is always FP or SP.

src/coreclr/jit/emit.h Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 21, 2023
@BruceForstall
Copy link
Member

I presume the MinOpts TP regression is due to extra checks related to instrDesc size, and possibly due to fewer constants fitting in a small instrDesc.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 21, 2023
@kunalspathak
Copy link
Member Author

Can you show any diffs (including GC diffs) where both addresses are GC lcl vars?

Sure, this is one example 4997 in coreclr_test.mch of windows.arm64 where I missed the gc tracking, and fixed it in 92b67de.

Here is the diff. Of course, we need to fix #84504
image

@kunalspathak
Copy link
Member Author

I presume the MinOpts TP regression is due to extra checks related to instrDesc size, and possibly due to fewer constants fitting in a small instrDesc.

correct.

@kunalspathak
Copy link
Member Author

fwiw, I noticed a case maybe we could do better in libraries.crossgen2 on win-arm64:

That is an interesting pattern coming from accessing 4~5 bool fields that are next to each other.

private bool _throwOnError;
private bool _ignoreCase;
private bool _extensibleParser;
private bool _requireAssemblyQualifiedName;
private bool _suppressContextualReflectionContext;

Unfortunately, the popular peephole optimization that we do today just check the last instruction. We could use the infrastructure that we have recently developed to see last N instructions and combine them, but honestly, I don't know if removing last N instructions is well tested and works. I would expect this optimization to happen higher up in morph or something, but should be tracked as a separate issue, although I am not sure how many of such patterns exist.

regNumber reg3,
ssize_t imm,
int varx1,
int offs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offs here is the offset within a local var. Since there are 2 lclvars, there should be two offsets: offs1, offs2.

Copy link
Member Author

@kunalspathak kunalspathak Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2nd offset would be just offs + TARGET_POINTER_SIZE so we don't need to pass it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not correct. The offset is within the local, not from the base register.

src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Apr 21, 2023
@kunalspathak
Copy link
Member Author

@BruceForstall - this should be ready to review again.

src/coreclr/jit/emitarm64.h Outdated Show resolved Hide resolved
@kunalspathak
Copy link
Member Author

Latest diffs:

image

Seems like some good TPdiffs because of consolidation of ldp and stp instructions:

image

src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Show resolved Hide resolved
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Apr 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ARM64: Optimize pair of "str reg, [fp]" to stp
3 participants