Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Added four SVE CreateBreak* APIs #104184

Merged
merged 6 commits into from
Jul 2, 2024

Conversation

TIHan
Copy link
Contributor

@TIHan TIHan commented Jun 29, 2024

Contributes to #99957

Adds:

  • CreateBreakAfterMask
  • CreateBreakAfterPropagateMask
  • CreateBreakBeforeMask
  • CreateBreakBeforePropagateMask

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@TIHan
Copy link
Contributor Author

TIHan commented Jun 29, 2024

@dotnet/arm64-contrib @kunalspathak this is ready.

stress test results:

===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterMask_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterMask_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterMask_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterMask_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterMask_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterMask_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterMask_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterMask_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterPropagateMask_byte() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterPropagateMask_ushort() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterPropagateMask_uint() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterPropagateMask_ulong() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterPropagateMask_sbyte() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterPropagateMask_short() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterPropagateMask_int() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakAfterPropagateMask_long() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforeMask_byte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforeMask_ushort() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforeMask_uint() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforeMask_ulong() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforeMask_sbyte() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforeMask_short() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforeMask_int() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforeMask_long() : 19
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforePropagateMask_byte() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforePropagateMask_ushort() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforePropagateMask_uint() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforePropagateMask_ulong() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforePropagateMask_sbyte() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforePropagateMask_short() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforePropagateMask_int() : 22
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_CreateBreakBeforePropagateMask_long() : 22
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Jun 29, 2024
Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some considerations to think about the intent of the CreateBreakBeforeMask and CreateBreakAfterMask and should they do merge operation or zero operation.

HARDWARE_INTRINSIC(Sve, CreateBreakAfterMask, -1, 2, true, {INS_sve_brka, INS_sve_brka, INS_sve_brka, INS_sve_brka, INS_sve_brka, INS_sve_brka, INS_sve_brka, INS_sve_brka, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_ReturnsPerElementMask|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(Sve, CreateBreakAfterPropagateMask, -1, 3, true, {INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_sve_brkpa, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_ReturnsPerElementMask|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(Sve, CreateBreakBeforeMask, -1, 2, true, {INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_sve_brkb, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_ReturnsPerElementMask|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(Sve, CreateBreakBeforePropagateMask, -1, 3, true, {INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_sve_brkpb, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_ReturnsPerElementMask|HW_Flag_SpecialCodeGen)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering why you did not include CreateBreakPropagateMask?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that API needs a closer look as its signature doesn't correspond well to its instruction based on its parameter names and parameter count.

src/coreclr/jit/hwintrinsic.cpp Show resolved Hide resolved
case NI_Sve_CreateBreakAfterMask:
case NI_Sve_CreateBreakBeforeMask:
{
GetEmitter()->emitInsSve_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, INS_OPTS_SCALABLE_B);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For BRKB and BRKA, if we do not pass insScalableOpts, it is treated as BRK* Pd.B, Pg/Z, Pn.B. How do we generate the merge variant? If we want to support one of those, should merge be the default one? Accordingly we should also update the summary docs and remove the instruction that we are not supporting. We also need to think about what if all targetReg, op1Reg and op2Reg end up to same register or two of them have same register, does it change the operation of Pg/M vs. Pg/Z. Lastly, we should add test coverage (if not there currently) where we pass same value to mask and srcMask to simulate such behavior.

Copy link
Contributor Author

@TIHan TIHan Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking at the original generated API, there were two C intrinsic variants that it listed:
M variant https://dougallj.github.io/asil/doc/brka_p_p_p_m_8.html
Z variant https://dougallj.github.io/asil/doc/brka_p_p_p_z_8.html

Based on the API signature of our intrinsic, it matches exactly to the Z variant - so naturally I think that's most likely the correct one. I figured the M variant that was listed is an artifact of the API generation; as other APIs have before. This also leads me to believe that exposing each variant needs to have their own API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured the M variant that was listed is an artifact of the API generation

Yes, the API generation will have just blindly put them together.

This also leads me to believe that exposing each variant needs to have their own API.

We can we replicate the zero version using:

CndSel(mask, CreateBreakAfterMask(mask, op1), zero)

And then in lowering or somewhere, optimise that to a single BRKA with zeroing.

(Or, if I'm wrong and that doesn't work, then yes, ideally add a extra API)

Doesn't need doing now, but could you raise a ticket please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can we replicate the zero version using:

Do you mean "merge"? The API is the zero version currently

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you raise a ticket please.

+1 on that if we decide to skip a variant of the instruction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TIHan
Copy link
Contributor Author

TIHan commented Jul 1, 2024

@kunalspathak Fixed the comments and merged with main, so it's ready

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kunalspathak
Copy link
Member

/ba-g failures are unrelated. superpmi diff failures are related to timeout.

@kunalspathak kunalspathak merged commit 7294674 into dotnet:main Jul 2, 2024
158 of 167 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants