@VSadov VSadov commented Jan 22, 2025

This is the actual implementation of what was proposed in dotnet/runtime#110420 and prototyped in #2941.

Basically, this changes the await marker into a plain call via a special Await helper.

When the user writes the following inside a runtime-async method:

   int x = await ReturnsTaskOfInt();

the C# compiler emits the equivalent of:

   int x = Await( ReturnsTaskOfInt() );

The T Await<T>(Task<T> arg) method is a special intrinsic that asynchronously awaits its argument (a Task<int> in this example).
NOTE: There is no sync-over-async here. Await can suspend and later resume the current stack of calls if needed; once the Task<int> is complete, it unwraps the task and returns the int.

Also, the JIT recognizes the pattern and can further optimize it into a call-with-continuation invocation of the runtime-async entry point for ReturnsTaskOfInt().
As a result, if ReturnsTaskOfInt is itself a runtime-async method, we skip the intermediate promise types (Task/ValueTask) entirely, which is the main reason for the performance edge of runtime async over classic async.

@VSadov VSadov requested a review from jakobbotsch January 22, 2025 16:12
AssertEqual("B", strings.B);
AssertEqual("C", strings.C);
AssertEqual("D", strings.D);
// TODO: need to fix this
Member Author

@VSadov VSadov Jan 22, 2025

@jakobbotsch the change stresses calling via thunks and may have introduced some scenarios that the tests did not cover before. Remarkably, nearly everything works fine!! However, I saw an assert here and turned off one scenario.
I am not sure whether something is wrong with the IL or on the JIT side.
(The other disabled case is with thunks for async methods in structs.)

Member

I've hit that before when encoding method/type spec tokens incorrectly. Can you verify that the tokens being encoded when we construct the IL for the variants look fine?

Member Author

The failure does not happen with the optimization enabled, so it is not caused by the optimization.
(I.e., by default this test passes, but with set DOTNET_JitOptimizeAwait=0 I see a failure.)

I'll try to take a look.

Member Author

It looks like the async resumption stub in this case needs to box the result, since the result is a struct that contains object references.

We are making a resumption stub for RuntimeHelpers.Await<T>(Task<T>) when T is S<string>, so the signature reports the result type as S`1<System.__Canon>, and we emit a box instruction with that type.

Member Author

@VSadov VSadov Feb 17, 2025

I think this is a general issue with shared generics and a nongeneric resumption stub that may need to box the result.
I can hit this in the shared-generics.cs test, regardless of the Await optimization, by adding a scenario like:

        Async1EntryPoint<S<string>>(typeof(S<string>), new S<string> {t = "ghj" }).Wait();
. . .

struct S<T>
{
    public T t;
}

I am not sure yet how to deal with this.

It does not seem like the fix would be in the scope of this PR.

Member

So can this be uncommented then?

Can you also open an issue about the unhandled case?

Member Author

Logged #3010.

@jakobbotsch
Member

The JIT optimization to optimize Await(RuntimeAsyncMethod), which is probably the harder part of the proposal, is not included here.

It would be nice to start on this work to see how it would look before we make the switch. Note that most of the work will be VM work -- teaching getCallInfo implementations to deal with the fact that it now may need to describe a call to the async variant of a call described by a token.

@VSadov
Member Author

VSadov commented Jan 23, 2025

> The JIT optimization to optimize Await(RuntimeAsyncMethod), which is probably the harder part of the proposal, is not included here.
>
> It would be nice to start on this work to see how it would look before we make the switch. Note that most of the work will be VM work -- teaching getCallInfo implementations to deal with the fact that it now may need to describe a call to the async variant of a call described by a token.

The optimization would need to detect the following pattern

arg0; .. ; argN; CallToThunkToAsync; CallToAwaitIntrinsic

and turn it into

arg0; .. ; argN; CallToAsync

For that there should be a way to:

  1. detect that a call info is for a thunk to an async method
  2. get a call info for the actual async method (with other inputs being the same)

Is this correct?
Would the following API be sufficient?


for #1, a flag in CORINFO_CALL_INFO::methodFlags indicating that the call info happens to be for a thunk.

CORINFO_FLG_THUNK_TO_ASYNC // the method is a non-async thunk to an async method

for #2 a flag that can be passed in CORINFO_CALLINFO_FLAGS to CEEInfo::getCallInfo, to ask for an actual async method call info.

CORINFO_CALLINFO_UNWRAP_THUNK // assume that the input pResolvedToken is for a thunk (assert if it is not), get the info for the actual async method.

@jakobbotsch
Member

jakobbotsch commented Jan 23, 2025

> The optimization would need to detect the following pattern
>
> arg0; .. ; argN; CallToThunkToAsync; CallToAwaitIntrinsic
>
> and turn it into
>
> arg0; .. ; argN; CallToAsync

There are a few ways to do this, but maybe the most straightforward will be to do it as a direct IL pattern match at the point where we call getCallInfo:

    eeGetCallInfo(&resolvedToken,
                  (prefixFlags & PREFIX_CONSTRAINED) ? &constrainedResolvedToken : nullptr,
                  // this is how impImportCall invokes getCallInfo
                  combine(combine(CORINFO_CALLINFO_ALLOWINSTPARAM, CORINFO_CALLINFO_SECURITYCHECKS),
                          (opcode == CEE_CALLVIRT) ? CORINFO_CALLINFO_CALLVIRT : CORINFO_CALLINFO_NONE),
                  &callInfo);

This would be changed to first look ahead for another call IL instruction and check whether it was a call to RuntimeHelpers.Await. One way to do that is by resolving the next call instruction's token and using isIntrinsic + getMethodNameFromMetadata to check.
You should not need to try to recognize anything about the arguments, I think.

There are some other details to work out, like properly setting up for opportunistic tailcalls when the Await call is in tail position, but that can come later.
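
As a rough illustration of the look-ahead match described above, here is a minimal sketch on a toy instruction stream. It is not JIT code: `Kind`, `Instr`, and `foldAwait` are hypothetical names, and the real importer works on IL bytes and resolved tokens rather than on a vector of tagged instructions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model (not the JIT's representation).
enum class Kind { Arg, CallTaskReturning, CallAwait, CallAsyncVariant };

struct Instr {
    Kind kind;
    std::string name;
};

// When a task-returning call is immediately followed by a call to the Await
// intrinsic, emit one call to the async variant and consume the Await call.
std::vector<Instr> foldAwait(const std::vector<Instr>& in) {
    std::vector<Instr> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool nextIsAwait =
            i + 1 < in.size() && in[i + 1].kind == Kind::CallAwait;
        if (in[i].kind == Kind::CallTaskReturning && nextIsAwait) {
            out.push_back({Kind::CallAsyncVariant, in[i].name});
            ++i; // skip the Await call that was folded into this call
        } else {
            out.push_back(in[i]); // arguments and unrelated calls are untouched
        }
    }
    return out;
}
```

Note that the arguments pass through unchanged, which matches the point that nothing about them needs to be recognized.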

> For that there should be a way to:
>
>   1. detect that a call info is for a thunk to an async method
>   2. get a call info for the actual async method (with other inputs being the same)
>
> Is this correct? Would the following API be sufficient?
>
> for #1, a flag in CORINFO_CALL_INFO::methodFlags indicating that the call info happens to be for a thunk.
>
> CORINFO_FLG_THUNK_TO_ASYNC // the method is a non-async thunk to an async method
>
> for #2 a flag that can be passed in CORINFO_CALLINFO_FLAGS to CEEInfo::getCallInfo, to ask for an actual async method call info.
>
> CORINFO_CALLINFO_UNWRAP_THUNK // assume that the input pResolvedToken is for a thunk (assert if it is not), get the info for the actual async method.

I would skip #1 for now. We can switch any task returning call to its async2 thunk. It is probably more efficient to avoid doing so if we know that we are switching to a thunk, but it is not possible for us to know that statically if the target is dynamically resolved.

#2 is the same as what I was thinking. Without #1 you cannot do the assert, but it would not be possible to assert this anyway, except for statically resolvable cases. Given that, I would probably call the flag something like CORINFO_CALLINFO_RUNTIMEASYNC_VARIANT, since we use the "async variant" term in other places.

@VSadov
Member Author

VSadov commented Jan 23, 2025

> You should not need to try to recognize anything about the arguments, I think.

Yes. I included the arguments in the example to show that they do not need to change.

I was thinking of looking back at the previous instruction once we see an Await intrinsic and, if the previous instruction was a call that we can optimize, replacing it with a call to the async method.
It may be that looking ahead fits better with how the importer does things.

The rest makes sense. Thanks!

@VSadov
Member Author

VSadov commented Jan 24, 2025

Implemented the JIT optimization as discussed above.

@VSadov
Member Author

VSadov commented Jan 24, 2025

The impact of the optimization is quite noticeable (as expected):

E:\>set DOTNET_JitOptimizeAwait=0

E:\>E:\A2\runtimelab\artifacts\tests\coreclr\windows.x64.Release\async\fibonacci-without-yields\fibonacci-without-yields.cmd
BEGIN EXECUTION
 "E:\A2\runtimelab\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false" -p "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true"  fibonacci-without-yields.dll
1172 ms result=3026313472
Expected: 100
Actual: 100
END EXECUTION - PASSED
PASSED

E:\>set DOTNET_JitOptimizeAwait=1

E:\>E:\A2\runtimelab\artifacts\tests\coreclr\windows.x64.Release\async\fibonacci-without-yields\fibonacci-without-yields.cmd
BEGIN EXECUTION
 "E:\A2\runtimelab\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false" -p "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization=true"  fibonacci-without-yields.dll
178 ms result=3026313472
Expected: 100
Actual: 100
END EXECUTION - PASSED
PASSED

178 ms is definitely an improvement over 1172 ms.
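
To put a number on it, the ratio of the two timings in the logs above can be computed directly. This is just arithmetic on a single pair of runs, not a benchmark methodology; `speedup` is a hypothetical helper name.

```cpp
// Ratio of the baseline time to the optimized time; values above 1 mean faster.
double speedup(double baselineMs, double optimizedMs) {
    return baselineMs / optimizedMs;
}
```

With the logged values, `speedup(1172.0, 178.0)` comes out to roughly 6.6.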

Comment on lines 4922 to 4928
    if (flags & CORINFO_CALLINFO_RUNTIMEASYNC_VARIANT)
    {
        _ASSERTE(!pMD->IsAsync2Method());
        pMD = pMD->GetAsyncOtherVariant();
        pResolvedToken->hMethod = (CORINFO_METHOD_HANDLE)pMD;
    }

Member

@jakobbotsch jakobbotsch Jan 24, 2025

Nice, that's much simpler than I was expecting.

pResolvedToken is in-only, so we should make a copy of it here and change that one instead. If necessary you can update it from the callInfo on the JIT side, but I'm somewhat worried we end up with a token whose fields are internally inconsistent.

Can you make sure we have tests for some of the hard cases? GVMs, interface calls, virtual class calls and constrained calls come to mind. I was expecting shared generics to require more work as well since other fields of the token are used below for those (see ComputeRuntimeLookupForSharedGenericToken). Can you double check why it works out? Is the method spec/type spec ok to reuse as-is from the token?
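
The "copy it and change the copy" suggestion for an in-only token can be sketched as follows. This is a minimal sketch with made-up stand-in types: `ResolvedToken`, `MethodHandle`, and `withAsyncVariant` are hypothetical and much simpler than the real JIT-EE structures, which carry more fields that would also need to stay consistent.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the real JIT-EE types, for illustration only.
using MethodHandle = std::uintptr_t;

struct ResolvedToken {
    std::uint32_t token;   // metadata token from the IL stream
    MethodHandle hMethod;  // method the token resolved to
};

// Returns a modified copy; the caller's token is never written to.
ResolvedToken withAsyncVariant(const ResolvedToken& original,
                               MethodHandle asyncVariant) {
    ResolvedToken copy = original; // the in-only input stays intact
    copy.hMethod = asyncVariant;
    return copy;
}
```

Taking the input by const reference makes the in-only contract something the compiler enforces instead of a convention.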

Member Author

Another way to ensure resolved-token consistency could be to pass the new flag to impResolveToken rather than to eeGetCallInfo.

Member Author

I've moved the MethodDesc shimming to the level of impResolveToken. That seems nicer as it allows eeGetCallInfo to stay unchanged.

@@ -586,6 +586,8 @@ OPT_CONFIG_INTEGER(JitDoIfConversion, "JitDoIfConversion", 1)
OPT_CONFIG_INTEGER(JitDoOptimizeMaskConversions, "JitDoOptimizeMaskConversions", 1) // Perform optimization of mask
// conversions

RELEASE_CONFIG_INTEGER(JitOptimizeAwait, "JitOptimizeAwait", 1) // Perform optimization of Await intrinsics
Member

I wouldn't add a release knob for this.

Member Author

@VSadov VSadov Feb 14, 2025

I have added this knob for two reasons:

  1. we claim the optimization is optional, but I found a couple of cases that would fail without it. I think the scenarios are reachable in actual code, but they might be corner cases, and we did not have such code in tests. Disabling the optimization allows extra "stress" testing of calls via thunks.
  2. I wanted to see the impact of the optimizations on benchmarks.

I think both needs are temporary and we will not need the knob in the long run.

I think #2 requires that it is a release knob. Is that correct? Otherwise the managed parts of the runtime will have asserts.
I assumed that is the reason why a bunch of other optimization-related knobs are release knobs.

In any case, I'd prefer to log a tracking issue to remove this knob eventually. It seems useful right now, but at some point there will be no need for it.

Member

Ok, let's have it for now but make sure we remove it before any form of release that would lock its existence in.

Comment on lines 8988 to 8990
_impResolveToken(CORINFO_TOKENKIND_Await);
// consume the extra call
codeAddr += 1 + sizeof(mdToken);
Member

Does this properly handle the situation of dotnet/roslyn#76872 (comment)?

Member Author

Good point. It probably does not. I will add a test and make sure this works.
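
For reference, the `1 + sizeof(mdToken)` step in the snippet under review corresponds to one opcode byte plus a 4-byte metadata token. The sketch below shows that size arithmetic on a raw byte buffer. It is illustrative only: `consumeAwaitCall` and `kCallOpcode` are hypothetical names, the token comparison stands in for the token resolution and intrinsic check that the real importer performs, and it does not attempt to address the scenario raised in the question above.

```cpp
#include <cstddef>
#include <cstdint>

using mdToken = std::uint32_t;               // metadata tokens are 4 bytes
constexpr std::uint8_t kCallOpcode = 0x28;   // single-byte IL `call` opcode

// If the instruction at codeAddr is `call <awaitToken>`, returns the address
// just past it (opcode byte + token); otherwise returns codeAddr unchanged.
// Assumes codeAddr <= codeEnd.
const std::uint8_t* consumeAwaitCall(const std::uint8_t* codeAddr,
                                     const std::uint8_t* codeEnd,
                                     mdToken awaitToken) {
    const std::size_t size = 1 + sizeof(mdToken);
    if (static_cast<std::size_t>(codeEnd - codeAddr) < size) return codeAddr;
    if (*codeAddr != kCallOpcode) return codeAddr;
    // IL stores tokens little-endian; assemble bytes explicitly so the
    // result does not depend on host byte order.
    const mdToken token = static_cast<mdToken>(codeAddr[1]) |
                          (static_cast<mdToken>(codeAddr[2]) << 8) |
                          (static_cast<mdToken>(codeAddr[3]) << 16) |
                          (static_cast<mdToken>(codeAddr[4]) << 24);
    return token == awaitToken ? codeAddr + size : codeAddr;
}
```

The bounds check matters here because the look-ahead may run at the very end of the method body.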

Async().Wait();
// TODO: need to fix this

// Hits an assert around:
Member Author

This is another case that works with Await folding and fails if the optimization is disabled.

As with the other TODO, this case can be hit regardless of the optimization with a directed scenario.
I will uncomment it and log an issue.

return arg;
}

// TODO: switch every other scenario to use ValueTask
Member Author

This is blocked by #3010

// generic identity
Assert.Equal(6, await sIdentity(sProp));

// await(await ...))
Member Author

@VSadov VSadov Feb 24, 2025

This indirectly tests awaiting something that returns T, where T just happens to be a task by substitution.
It is OK to use Await in this case now, but it will not be optimized into a direct rtAsync call.

Member Author

We do not need to document this, I think. Perhaps one day we may even optimize this...

Member

Yeah, this would be interesting to optimize via PGO as some form of guarded deabstraction. I have been thinking about the same thing for delegate calls.

Member

@jakobbotsch jakobbotsch left a comment

LGTM!

@VSadov
Member Author

VSadov commented Feb 24, 2025

Thanks!!

@VSadov VSadov merged commit de4719c into dotnet:feature/async2-experiment Feb 24, 2025
1 of 7 checks passed
@VSadov VSadov deleted the await1 branch February 24, 2025 20:27
VSadov added a commit to VSadov/runtimelab that referenced this pull request Mar 15, 2025
…and similar helpers. (dotnet#2951)

* T RuntimeHelpers.Await<T>(Task<T>)

* state machine version of  Await and friends

* bump roslyn ref

* more Await helpers

* remove no longer needed test

* undo no longer needed metasig

* comment

* formatting

* comment

* implements JIT optimization for Await intrinsics

* make the JitOptimizeAwait switch RELEASE_CONFIG_INTEGER  (for testing/benchmarking purposes)

* isIntrinsic

* CORINFO_TOKENKIND_Await

* revert CORINFO_CALLINFO_RUNTIMEASYNC_VARIANT

* undo unnecessary diff

* Apply suggestions from code review

Co-authored-by: Jakob Botsch Nielsen <[email protected]>

* one more unnecessary `return`

* uncomment ReturnsStructGC scenario.

* uncomment struct testcase

* awaiting things that are not async

---------

Co-authored-by: Jakob Botsch Nielsen <[email protected]>