Crossgen2 comparisons are failing in coreclr-outerloop runs #111972

Open
steveisok opened this issue Jan 29, 2025 · 6 comments
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), blocking-clean-ci-optional (Blocking optional rolling runs)

@steveisok
Member

steveisok commented Jan 29, 2025

The arm to arm Linux, arm64 to arm64 OSX, and the x86 to x86 Windows comparison legs are failing. This was noticed after the change in #111881 was run to correct infrastructure issues.

Example build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=932682&view=results

arm to arm Linux

Number of omitted results: 0
Number of mismatched results: 33
Total number of files compared: 236
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 29, 2025
Contributor

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

@steveisok
Member Author

/cc @dotnet/jit-contrib

@steveisok steveisok added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-Infrastructure-coreclr labels Jan 29, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member

@steveisok is this known to be a codegen issue? Any details? The logs are not very enlightening.

@jkotas
Member

jkotas commented Jan 30, 2025

Yes, it looks like bad non-deterministic codegen from x64-hosted x86-targeting crossgen2 and other similar configuration pairs.
 
This test is verifying crossgen2 determinism. It compiles the same input using x86-hosted x86-targeting crossgen2 and using x64-hosted x86-targeting crossgen2, and expects to get the same result.
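As a rough illustration of what such a determinism check amounts to, here is a minimal Python sketch that compares two output directories file by file. It is not the script the CI leg actually runs. The function names, the `*.dll` glob, and the way "omitted" and "compared" are counted are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_outputs(base_dir: Path, diff_dir: Path) -> dict:
    """Compare same-named .dll files produced by two compiler configurations.

    Assumption: both directories hold the compiled outputs directly,
    one file per input assembly, with identical file names.
    """
    base = {p.name: p for p in base_dir.glob("*.dll")}
    diff = {p.name: p for p in diff_dir.glob("*.dll")}

    common = sorted(set(base) & set(diff))
    # Files present on only one side cannot be compared at all.
    omitted = sorted(set(base) ^ set(diff))
    # A deterministic compiler should produce byte-identical files.
    mismatched = [
        name for name in common
        if file_digest(base[name]) != file_digest(diff[name])
    ]
    return {"omitted": omitted, "mismatched": mismatched, "compared": len(common)}
```

Any entry in `mismatched` is a file whose bytes depend on which host built it, which is what the failing legs are reporting.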

Here are the steps to investigate the failure:

  1. Install the runfo tool if you have not done so before: dotnet tool install -g runfo

  2. "Test crossgen2-comparison windows x86 Release to x86 windows" looks like the easiest configuration to investigate. Find the job GUID at the top of the test log:

     Console log: 'WorkItem' from job 4beb857f-d6bd-4748-9868-af9265951d0d workitem 5cdd69cb-5d32-4c2c-9bd1-9cb24635eda9 (windows.10.amd64.open.rt) executed on machine a008MFE running Windows-10-10.0.14393-SP0.

     Download the helix payload for the job:

     runfo get-helix-payload -j 4beb857f-d6bd-4748-9868-af9265951d0d -o c:\helix_payload

  3. Downloading the helix payload should have printed the repro instructions at the end. Copy and paste them into the console window:

set HELIX_CORRELATION_ID=4beb857f-d6bd-4748-9868-af9265951d0d
set HELIX_CORRELATION_PAYLOAD=c:\helix_payload\correlation-payload
set HELIX_PYTHONPATH=echo skipping python
set HELIX_WORKITEM_FRIENDLYNAME=WorkItem
set HELIX_WORKITEM_ID=WorkItem
set HELIX_WORKITEM_PAYLOAD=c:\helix_payload\workitems\WorkItem
set HELIX_WORKITEM_ROOT=c:\helix_payload\workitems\WorkItem
set HELIX_WORKITEM_UPLOAD_ROOT=c:\helix_payload\workitems\WorkItem
set HELIX_DUMP_FOLDER=c:\helix_payload\workitems\WorkItem
set HELIX_CURRENT_LOG=c:\helix_payload\workitems\WorkItem\log.txt
pushd c:\helix_payload\workitems\WorkItem && %HELIX_CORRELATION_PAYLOAD%\scripts\1aa3249b1d64498881aa4a362b85cb44\execute.cmd && popd

Execution failed with a Python error for me. The Python on my machine may be too old or too new; I am not sure. In any case, I ignored the error because the run still produced the R2R binaries to look at.

  4. Build the R2RDump tool or download it from a drop somewhere. I have built mine locally (my enlistment is at c:\runtime).

  5. Pick a pair of the non-deterministic R2R compiled .dlls and dump them using R2RDump, with hidden offsets to make the comparison easier (hiding of offsets is not perfect):

c:\runtime\dotnet c:\runtime\artifacts\bin\coreclr\windows.x64.Checked\R2RDump\R2RDump.dll -d --naked --hide-offsets -i c:\helix_payload\workitems\WorkItem\Microsoft.CSharp.ni.dll -o 1.txt
c:\runtime\dotnet c:\runtime\artifacts\bin\coreclr\windows.x64.Checked\R2RDump\R2RDump.dll -d --naked --hide-offsets -i c:\helix_payload\workitems\WorkItem\prebuiltWork\log\Microsoft.CSharp.ni.dll -o 2.txt
  6. Compare 1.txt and 2.txt using your favorite diff tool. You should find several actual codegen diffs. For example, the code for the method Microsoft.CSharp.RuntimeBinder.Semantics.TypeArray Microsoft.CSharp.RuntimeBinder.SymbolTable.CreateParameterArray(System.Reflection.MemberInfo, System.Reflection.ParameterInfo[]) differs. The problem is that the movzx ebx, bl instruction is emitted twice in one of the two outputs, which looks wrong:

Size: 304 bytes

...
    setne   bl
    movzx   ebx, bl
    mov     dword ptr [ebp - 16], ebx
...

vs.

Size: 307 bytes

...
    setne   bl
    movzx   ebx, bl
    movzx   ebx, bl
    mov     dword ptr [ebp - 16], ebx
...

There are other similar diffs where an emitted instruction is doubled for some reason.
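To find more of these doubled instructions without reading the whole diff, a small scan over the dumped text can flag any instruction line that exactly repeats the one before it. The Python sketch below is a hypothetical helper, not part of R2RDump or the test infrastructure. It assumes one instruction per line, as in the excerpts above, with offsets already hidden. Legitimate back-to-back repeats, such as two identical pushes, will also be flagged, so the result is only a list of candidates.

```python
def normalize(line: str) -> str:
    """Collapse runs of whitespace so column alignment does not affect comparison."""
    return " ".join(line.split())


def find_doubled_instructions(listing: str) -> list[tuple[int, str]]:
    """Return (line_number, instruction) for each line equal to its predecessor.

    Line numbers are 1-based. Blank lines and "..." elision markers are ignored.
    """
    hits = []
    prev = None
    for number, raw in enumerate(listing.splitlines(), start=1):
        cur = normalize(raw)
        if cur and cur != "..." and cur == prev:
            hits.append((number, cur))
        prev = cur
    return hits


# Excerpt taken from the second dump quoted above.
excerpt = """\
    setne   bl
    movzx   ebx, bl
    movzx   ebx, bl
    mov     dword ptr [ebp - 16], ebx
"""
print(find_doubled_instructions(excerpt))  # [(3, 'movzx ebx, bl')]
```

The 3-byte size difference between the two versions (304 vs. 307 bytes) matches the 3-byte encoding of movzx ebx, bl (0F B6 DB), which fits a single extra copy of that instruction in this method.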

@jkotas jkotas added the blocking-clean-ci-optional Blocking optional rolling runs label Jan 30, 2025
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jan 30, 2025
@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Jan 30, 2025
@AndyAyersMS
Member

Thanks Jan, will take a look.
