Implement a change in IL API to use RuntimeHelpers.Await<T>(Task<T>) and similar helpers. #2951

VSadov · 2025-01-22T16:12:25Z

This is the actual implementation of what was proposed in dotnet/runtime#110420 and prototyped in #2941.

Basically, this changes the await marker to be just a call to a special Await helper.

When a user writes the following inside a runtime async method:

   int x = await ReturnsTaskOfInt();

the C# compiler emits the equivalent of:

   int x = Await(ReturnsTaskOfInt());

The T Await<T>(Task<T> arg) method is a special intrinsic that asynchronously awaits the task (here, a Task<int>).
NOTE: There is no sync-over-async here. Await may suspend and later resume the current stack of calls if needed; once the Task<int> completes, Await unwraps it and returns the int.

Also, the JIT recognizes this pattern and can further optimize it into a call-with-continuation invocation of the runtime-async entry point for ReturnsTaskOfInt().
As a result, if ReturnsTaskOfInt is another runtime-async method, the intermediate promise types (Task/ValueTask) are skipped entirely, which is the main reason for the performance edge of runtime async over classic async.

VSadov · 2025-01-22T16:24:37Z

src/tests/async/returns.cs

-            AssertEqual("B", strings.B);
-            AssertEqual("C", strings.C);
-            AssertEqual("D", strings.D);
+            // TODO: need to fix this


@jakobbotsch, this change stresses calling via thunks and may exercise scenarios that the tests did not cover before. Remarkably, nearly everything works! However, here I hit an assert and turned off one scenario.
Not sure if this is something wrong with the IL or something on the JIT side.
(The other disabled case involves thunks for async methods in structs.)

I've hit that before when encoding method/type spec tokens incorrectly. Can you verify that the tokens being encoded when we construct the IL for the variants look fine?

The failure does not happen with the optimization enabled, so the optimization is not what causes it.
(I.e., by default this test passes, but with set DOTNET_JitOptimizeAwait=0 I see a failure.)

I'll try to take a look.

It looks like the async resumption stub in this case needs to box the result, since the result is a struct that contains object references.

We are making a resumption stub for RuntimeHelpers.Await<T>(Task<T>) where T is S<string>, so the signature reports the result type as S`1<System.__Canon>, and we emit a box instruction with that type.

I think this is a general issue with shared generics and non-generic resumption stubs that may need to box the result.
I can hit this in the shared-generics.cs test, regardless of the Await optimization, by adding a scenario like:

   Async1EntryPoint<S<string>>(typeof(S<string>), new S<string> { t = "ghj" }).Wait();
   . . .
   struct S<T> { public T t; }

Not sure yet how to deal with this.

It does not seem like the fix would be in the scope of this PR.

So can this be uncommented then?

Can you also open an issue about the unhandled case?

logged #3010

jakobbotsch · 2025-01-22T16:40:12Z

The JIT optimization to optimize Await(RuntimeAsyncMethod), which is probably the harder part of the proposal, is not included here.

It would be nice to start on this work to see how it would look before we make the switch. Note that most of the work will be VM work -- teaching getCallInfo implementations to deal with the fact that it now may need to describe a call to the async variant of a call described by a token.

VSadov · 2025-01-23T03:29:16Z

> The JIT optimization to optimize Await(RuntimeAsyncMethod), which is probably the harder part of the proposal, is not included here.
>
> It would be nice to start on this work to see how it would look before we make the switch. Note that most of the work will be VM work -- teaching getCallInfo implementations to deal with the fact that it now may need to describe a call to the async variant of a call described by a token.

The optimization would need to detect the following pattern:

   arg0; .. ; argN; CallToThunkToAsync; CallToAwaitIntrinsic

and turn it into:

   arg0; .. ; argN; CallToAsync
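The rewrite above can be modeled as a simple peephole over an instruction list. This is a minimal sketch under stated assumptions, not JIT code: Op, Instr, and foldAwait are hypothetical names, instructions are reduced to argument loads and calls, and "call the async variant" is represented by a flag rather than by the VM's MethodDesc lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified IL model: just enough to express the pattern.
enum class Op { LoadArg, Call };

struct Instr {
    Op op;
    std::string name;          // callee for Op::Call, argument for Op::LoadArg
    bool returnsTask = false;  // Op::Call only: callee returns Task/ValueTask
    bool asyncVariant = false; // Op::Call only: target is the runtime-async variant
};

// Rewrites   arg0; ..; argN; Call(taskReturning); Call(Await)
// into       arg0; ..; argN; Call(async variant of taskReturning)
// The argument loads are copied through unchanged; only the two calls are fused.
std::vector<Instr> foldAwait(const std::vector<Instr>& in) {
    std::vector<Instr> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool foldable =
            i + 1 < in.size() &&
            in[i].op == Op::Call && in[i].returnsTask &&
            in[i + 1].op == Op::Call && in[i + 1].name == "Await";
        if (foldable) {
            Instr fused = in[i];
            fused.asyncVariant = true; // call the async variant directly...
            fused.returnsTask = false; // ...which produces the result, not a Task
            out.push_back(fused);
            ++i;                       // the Await call is consumed
            continue;
        }
        out.push_back(in[i]);
    }
    return out;
}
```

The argument loads pass through untouched, matching the point that nothing about the arguments needs to change; only the two adjacent calls are fused.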

For that there should be a way to:

1. detect that a call info is for a thunk to an async method
2. get a call info for the actual async method (with other inputs being the same)
Is this correct?
Would the following API be sufficient?

for #1, a flag in CORINFO_CALL_INFO::methodFlags indicating that the call info happens to be for a thunk.

CORINFO_FLG_THUNK_TO_ASYNC // the method is a non-async thunk to an async method

for #2 a flag that can be passed in CORINFO_CALLINFO_FLAGS to CEEInfo::getCallInfo, to ask for an actual async method call info.

CORINFO_CALLINFO_UNWRAP_THUNK // assume that the input pResolvedToken is for a thunk (assert if it is not), get the info for the actual async method.

jakobbotsch · 2025-01-23T10:55:03Z

> The optimization would need to detect the following pattern
>
> arg0; .. ; argN; CallToThunkToAsync; CallToAwaitIntrinsic
>
> and turn it into
>
> arg0; .. ; argN; CallToAsync

There are a few ways to do this, but maybe the most straightforward will be to do it as a direct IL pattern match at the point where we call getCallInfo:

runtimelab/src/coreclr/jit/importer.cpp

Lines 8952 to 8957 in b077e29

            eeGetCallInfo(&resolvedToken,
                          (prefixFlags & PREFIX_CONSTRAINED) ? &constrainedResolvedToken : nullptr,
                          // this is how impImportCall invokes getCallInfo
                          combine(combine(CORINFO_CALLINFO_ALLOWINSTPARAM, CORINFO_CALLINFO_SECURITYCHECKS),
                                  (opcode == CEE_CALLVIRT) ? CORINFO_CALLINFO_CALLVIRT : CORINFO_CALLINFO_NONE),
                          &callInfo);

This would be changed to first look ahead for another call IL instruction and check whether it was a call to RuntimeHelpers.Await. One way to do that is by resolving the next call instruction's token and using isIntrinsic + getMethodNameFromMetadata to check.
You should not need to try to recognize anything about the arguments, I think.

There are some other details to work out, like properly setting up for opportunistic tailcalls when the Await call is in tail position, but that can come later.
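The look-ahead can be sketched at the byte level. Per ECMA-335, call is the single-byte opcode 0x28 followed by a 4-byte little-endian metadata token, which is why consuming it advances by 1 + sizeof(mdToken). This is a hypothetical model: lookAheadForAwait and the isAwaitIntrinsic callback are illustrative names, and the callback stands in for resolving the token and checking it with isIntrinsic + getMethodNameFromMetadata on the JIT side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using mdToken = std::uint32_t;

// ECMA-335: `call` is the single-byte opcode 0x28, followed by a 4-byte token.
constexpr std::uint8_t kOpCall = 0x28;

// Reads a little-endian 32-bit metadata token starting at il[pos].
mdToken readToken(const std::vector<std::uint8_t>& il, std::size_t pos) {
    return static_cast<mdToken>(il[pos]) |
           (static_cast<mdToken>(il[pos + 1]) << 8) |
           (static_cast<mdToken>(il[pos + 2]) << 16) |
           (static_cast<mdToken>(il[pos + 3]) << 24);
}

// `next` is the IL offset just past the current (Task-returning) call.
// Returns true if the next instruction is `call <token>` and the token refers to
// the Await intrinsic; in that case `next` is advanced past that call so it is
// not imported separately. Otherwise `next` is left unchanged.
bool lookAheadForAwait(const std::vector<std::uint8_t>& il, std::size_t& next,
                       const std::function<bool(mdToken)>& isAwaitIntrinsic) {
    if (next + 1 + sizeof(mdToken) > il.size() || il[next] != kOpCall) {
        return false;
    }
    if (!isAwaitIntrinsic(readToken(il, next + 1))) {
        return false;
    }
    next += 1 + sizeof(mdToken);
    return true;
}
```

As in the review comment, only the following instruction is inspected; the arguments are never examined.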

> For that there should be a way to:
>
> 1. detect that a call info is for a thunk to an async method
> 2. get a call info for the actual async method (with other inputs being the same)
>
> Is this correct? Would the following API be sufficient?
>
> for #1, a flag in CORINFO_CALL_INFO::methodFlags indicating that the call info happens to be for a thunk.
>
> CORINFO_FLG_THUNK_TO_ASYNC // the method is a non-async thunk to an async method
>
> for #2 a flag that can be passed in CORINFO_CALLINFO_FLAGS to CEEInfo::getCallInfo, to ask for an actual async method call info.
>
> CORINFO_CALLINFO_UNWRAP_THUNK // assume that the input pResolvedToken is for a thunk (assert if it is not), get the info for the actual async method.

I would skip #1 for now. We can switch any task-returning call to its async2 thunk. It is probably more efficient to avoid doing so if we know that we are switching to a thunk, but it is not possible for us to know that statically if the target is dynamically resolved.

#2 is the same as what I was thinking. Without #1 you cannot do the assert, but it would not be possible to assert this anyway except in statically resolvable cases. Given that, I would probably call the flag something like CORINFO_CALLINFO_RUNTIMEASYNC_VARIANT, since we use the "async variant" term in other places.

VSadov · 2025-01-23T20:43:55Z

> You should not need to try to recognize anything about the arguments, I think.

Yes. I included the arguments in the example to show that they do not need to change.

I was thinking of looking back at the previous instruction once we see an Await intrinsic, and if the previous instruction was a call that we can optimize, replacing it with a call to the async method.
It may be that looking ahead will fit better into how the importer does things.

The rest makes sense. Thanks!

VSadov · 2025-01-24T01:10:26Z

Implemented the JIT optimization as discussed above.

VSadov · 2025-01-24T03:04:37Z

The impact of the optimization is quite noticeable (as expected):

E:\>set DOTNET_JitOptimizeAwait=0

E:\>E:\A2\runtimelab\artifacts\tests\coreclr\windows.x64.Release\async\fibonacci-without-yields\fibonacci-without-yields.cmd
BEGIN EXECUTION
 "E:\A2\runtimelab\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false" -p "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true"  fibonacci-without-yields.dll
1172 ms result=3026313472
Expected: 100
Actual: 100
END EXECUTION - PASSED
PASSED

E:\>set DOTNET_JitOptimizeAwait=1

E:\>E:\A2\runtimelab\artifacts\tests\coreclr\windows.x64.Release\async\fibonacci-without-yields\fibonacci-without-yields.cmd
BEGIN EXECUTION
 "E:\A2\runtimelab\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false" -p "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true"  fibonacci-without-yields.dll
178 ms result=3026313472
Expected: 100
Actual: 100
END EXECUTION - PASSED
PASSED

178 ms is definitely an improvement over 1172 ms.
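For scale, the two timings in the log above work out to roughly a 6.6x speedup, or about 85% less time for this benchmark:

```latex
\frac{1172\,\text{ms}}{178\,\text{ms}} \approx 6.6,
\qquad
\frac{1172 - 178}{1172} = \frac{994}{1172} \approx 0.85
```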

jakobbotsch · 2025-01-24T09:53:19Z

src/coreclr/vm/jitinterface.cpp

+    if (flags & CORINFO_CALLINFO_RUNTIMEASYNC_VARIANT)
+    {
+        _ASSERTE(!pMD->IsAsync2Method());
+        pMD = pMD->GetAsyncOtherVariant();
+        pResolvedToken->hMethod = (CORINFO_METHOD_HANDLE)pMD;
+    }
+


Nice, that's much simpler than I was expecting.

pResolvedToken is in-only, so we should make a copy of it here and change that one instead. If necessary you can update it from the callInfo on the JIT side, but I'm somewhat worried we end up with a token whose fields are internally inconsistent.
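The in-only point can be illustrated with a minimal sketch. MethodDesc and ResolvedToken here are hypothetical, heavily reduced stand-ins (not the real VM types), and withAsyncVariant is an illustrative name: the input is taken by const reference and a modified copy is returned, instead of writing through pResolvedToken as the diff above does. Note that the copy's other fields (here, token) still describe the original method, so this alone does not resolve the internal-consistency concern.

```cpp
#include <cassert>

// Hypothetical stand-ins for MethodDesc and CORINFO_RESOLVED_TOKEN; only the
// fields needed to illustrate the "in-only token" point are modeled.
struct MethodDesc {
    bool isAsyncVariant;
    MethodDesc* otherVariant; // Task-returning <-> runtime-async counterpart
};

struct ResolvedToken {
    unsigned token;           // metadata token as it appeared in the IL
    MethodDesc* hMethod;      // method the token resolved to
};

// Takes the caller's token as input-only (const reference) and returns a copy
// whose hMethod points at the async variant. The caller's token is left exactly
// as it was resolved, instead of being patched in place.
ResolvedToken withAsyncVariant(const ResolvedToken& in) {
    assert(!in.hMethod->isAsyncVariant);
    ResolvedToken copy = in;
    copy.hMethod = in.hMethod->otherVariant;
    return copy;
}
```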

Can you make sure we have tests for some of the hard cases? GVMs, interface calls, virtual class calls and constrained calls come to mind. I was expecting shared generics to require more work as well since other fields of the token are used below for those (see ComputeRuntimeLookupForSharedGenericToken). Can you double check why it works out? Is the method spec/type spec ok to reuse as-is from the token?

Another way to ensure resolved-token consistency could be to pass the new flag to impResolveToken rather than to eeGetCallInfo.

I've moved the MethodDesc shimming to the level of impResolveToken. That seems nicer as it allows eeGetCallInfo to stay unchanged.

jakobbotsch · 2025-01-28T16:13:12Z

src/coreclr/jit/jitconfigvalues.h

@@ -586,6 +586,8 @@ OPT_CONFIG_INTEGER(JitDoIfConversion, "JitDoIfConversion", 1)
 OPT_CONFIG_INTEGER(JitDoOptimizeMaskConversions, "JitDoOptimizeMaskConversions", 1) // Perform optimization of mask
                                                                                    // conversions

+RELEASE_CONFIG_INTEGER(JitOptimizeAwait, "JitOptimizeAwait", 1) // Perform optimization of Await intrinsics


I wouldn't add a release knob for this.

I have added this knob for two reasons:

1. We claim the optimization is optional, but I found a couple of cases that fail without it. I think the scenarios are reachable in actual code, but they may be corner cases, and we did not have such code in tests. Disabling the optimization allows extra "stress" for calling via thunks.

2. I wanted to see the impact of the optimization on benchmarks.

I think both needs are temporary and we will not need the knob in the long run.

I think #2 requires that it is a release knob. Is that correct? Otherwise, the managed runtime parts will have asserts enabled.
I assumed that is the reason why a bunch of other optimization-related knobs are release knobs.

In any case, I'd prefer to log a tracking issue to remove this knob eventually. It seems useful right now, but at some point there will be no need for it.

Ok, let's have it for now but make sure we remove it before any form of release that would lock its existence in.

https://github.com/dotnet/runtimelab/issues/3012

src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs

jakobbotsch · 2025-02-18T09:52:04Z

src/coreclr/jit/importer.cpp

+                        _impResolveToken(CORINFO_TOKENKIND_Await);
+                        // consume the extra call
+                        codeAddr += 1 + sizeof(mdToken);


Does this properly handle the situation described in the linked comment on dotnet/roslyn#76872?

Good point. It probably does not. I will add a test and make sure this works.

VSadov · 2025-02-19T00:49:43Z

src/tests/async/struct.cs

-        Async().Wait();
+        // TODO: need to fix this
+
+        // Hits an assert around:


This is another case that works with Await folding but fails when the optimization is disabled.

As with the other TODO, this case can be hit regardless of the optimization with a directed scenario.
I will uncomment and log an issue.

VSadov · 2025-02-24T03:44:30Z

src/tests/async/awaitingnotasync.cs

+        return arg;
+    }
+
+    // TODO: switch every other scenario to use ValueTask


This is blocked by #3010

VSadov · 2025-02-24T03:46:59Z

src/tests/async/awaitingnotasync.cs

+        // generic identity
+        Assert.Equal(6, await sIdentity(sProp));
+
+        // await(await ...))


This indirectly tests awaiting something that returns T, which happens to be a task by substitution.
It is OK to use Await in this case now, but the call will not be optimized into a direct rtAsync call.

We do not need to document this, I think. Perhaps one day we may even optimize this...

Yeah, this would be interesting to optimize via PGO as some form of guarded deabstraction. I have been thinking about the same thing for delegate calls.

jakobbotsch

LGTM!

VSadov · 2025-02-24T20:27:09Z

Thanks!!

…and similar helpers. (dotnet#2951)

* T RuntimeHelpers.Await<T>(Task<T>)
* state machine version of Await and friends
* bump roslyn ref
* more Await helpers
* remove no longer needed test
* undo no longer needed metasig
* comment
* formatting
* comment
* implements JIT optimization for Await intrinsics
* make the JitOptimizeAwait switch RELEASE_CONFIG_INTEGER (for testing/benchmarking purposes)
* isIntrinsic
* CORINFO_TOKENKIND_Await
* revert CORINFO_CALLINFO_RUNTIMEASYNC_VARIANT
* undo unnecessary diff
* Apply suggestions from code review

  Co-authored-by: Jakob Botsch Nielsen <[email protected]>

* one more unnecessary `return`
* uncomment ReturnsStructGC scenario.
* uncomment struct testcase
* awaiting things that are not async

---------

Co-authored-by: Jakob Botsch Nielsen <[email protected]>

VSadov added 9 commits January 15, 2025 18:27

T RuntimeHelpers.Await<T>(Task<T>)

42ec5c2

state machine version of Await and friends

76ca9da

bump roslyn ref

ed94235

more Await helpers

6e1cd59

remove no longer needed test

5c5566b

undo no longer needed metasig

02de185

comment

a1699c6

formatting

a67f3ea

comment

21dc02b

VSadov requested a review from jakobbotsch January 22, 2025 16:12

VSadov commented Jan 22, 2025


VSadov mentioned this pull request Jan 22, 2025

Prototyping T RuntimeHelpers.Await<T>(Task<T>) #2941

Closed

implements JIT optimization for Await intrinsics

4723c6a

VSadov added 2 commits January 23, 2025 18:08

make the JitOptimizeAwait switch RELEASE_CONFIG_INTEGER (for testing/benchmarking purposes)

066ad44

isIntrinsic

0a07d7b

jakobbotsch reviewed Jan 24, 2025


VSadov added 3 commits January 24, 2025 09:16

CORINFO_TOKENKIND_Await

03ddebc

revert CORINFO_CALLINFO_RUNTIMEASYNC_VARIANT

ae17a3b

undo unnecessary diff

520da20

VSadov mentioned this pull request Jan 24, 2025

Propose new async API dotnet/runtime#110420

Merged

jakobbotsch reviewed Jan 28, 2025


src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs (outdated)

jakobbotsch reviewed Jan 28, 2025


src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs (outdated)

VSadov commented Feb 14, 2025


src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs (outdated)

VSadov and others added 2 commits February 14, 2025 11:45

Apply suggestions from code review

6266c52

Co-authored-by: Jakob Botsch Nielsen <[email protected]>

one more unnecessary return

c300208

jakobbotsch reviewed Feb 18, 2025


VSadov mentioned this pull request Feb 18, 2025

[RuntimeAsync] Problem with boxing in resume stub when return type is a struct instantiated over object type #3010

Closed

uncomment ReturnsStructGC scenario.

a591a63

VSadov commented Feb 19, 2025


VSadov added 2 commits February 18, 2025 16:58

uncomment struct testcase

1e71da9

awaiting things that are not async

8ddee8c

VSadov commented Feb 24, 2025


jakobbotsch approved these changes Feb 24, 2025


VSadov merged commit de4719c into dotnet:feature/async2-experiment Feb 24, 2025
1 of 7 checks passed

VSadov deleted the await1 branch February 24, 2025 20:27
