Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement out-of-proc RAR node lifecycle #11383

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ccastanedaucf
Copy link
Contributor

@ccastanedaucf ccastanedaucf commented Feb 4, 2025

Context

This implements the full node lifecycle and execution flow for an out-of-proc RAR node via an additional nodemode. This does not yet execute the RAR task itself, or serialize anything other than empty dummy payloads - enabling the feature will only run a test message and then fall back to in-proc execution.

Changes Made

New RAR nodemode

  • Setting $env:MSBuildRarNode=1 conditions the RarNodeLauncher to create a new instance of msbuild.exe /nodemode:3. This runs a persistent out-of-proc RAR node which can receive serialized requests from a RAR task client in another process, execute the task, and serialize back the outputs.
  • IPC and serialization logic reuses existing MSBuild node infrastructure.
  • Currently, this does not actually invoke the RAR task, and the request/response objects are empty data structures - but it does implement the full execution and communication flow between the client and server. The only component missing for the feature to be functional (aka executing the RAR task) is populating the request/response with the real RAR inputs/outputs.

Conditioning execution

  • EngineServices.IsOutOfProcRarNodeEnabled tells the RAR task whether to run out-of-proc. This is necessary since we want the task to avoid trying to connect to the node if MSBuild was launched without the flag enabled, which can differ between builds. This is not easily achieved via environment variables due to node reuse.
  • MSBuild property ResolveAssemblyReferencesOutOfProcess can be used to exclude a specific target from running out-of-proc.

RAR Out-Of-Proc Node

  • The out-of-proc node manages two pipe servers - a single-instance server for handling the lifecycle of the node and preventing duplicate nodes from running, and a multi-instance server for executing multiple RAR tasks concurrently.

RAR Out-Of-Proc Client

  • A per-build-node client is shared across all RAR invocations within a build by registering the instance to the host BuildEngine. This ensures that we don't pay overhead cost of reconnecting / repeating handshakes for every task, while still disposing of the client between builds to prevent an idle reused MSBuild node from holding on to a server instance.

Testing

You can validate the feature is functional by setting $env:MSBuildRarNode=1 and checking tracing logs + active processes. Untested code is mostly glue and orchestrating pipe connections. The most brittle code (IPC / serialization) already has test coverage from other MSBuild node implementations.

Notes

  • Env var should be replaced by an actual MSBuild flag in the future, but holding off as the feature is still WIP
  • Need feedback for consistent naming scheme (e.g. is this the RAR out-of-proc service or RAR node? Should RAR be abbreviated everywhere in the node-related types?)

@JanKrivanek
Copy link
Member

Would you be able to separate the pure code moves and actual code changes? It's fine to keep all in a single PR, but separate commits would be great.

It'll help speed up and focus the reviews.

@ccastanedaucf ccastanedaucf force-pushed the dev/chcasta/rar-pr-1 branch 2 times, most recently from e41a063 to 4ba973c Compare February 6, 2025 07:36
@ccastanedaucf
Copy link
Contributor Author

ccastanedaucf commented Feb 21, 2025

FYI I'm separating the code moves (named pipe / IPC stuff) into another PR since I've been further consolidating the IPC code that's duplicated across all node implementations, and that's a bit more risk given the different code paths for the APM and TPL-based versions due to .NET 3.5 compat for the TaskHost. NodeProviderOutOfProcBase, NodeEndpointOutOfProcBase, MSBuildClient, and MSBuildClientPacketPump all use effectively the same logic for setting up the client and server, handshake, sending and receiving packets, serialization, ect., in a way that isn't really reusable via inheritance. Cuts out a pretty big chunk of code from the RAR node implementation too.

Will be updating the main work item with the overall design / ordering of PRs so it helps to see where this is going, since there's still ~5 or so main components after this. But that should be the only real code move / refactoring required for the whole project.

Copy link
Member

@SimaTian SimaTian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to sum up, we're moving the communication stuff into a common section for communicating nodes and then creating a new type of node.
Looks fine to me.

@@ -3420,6 +3426,21 @@ private static void StartLocalNode(CommandLineSwitches commandLineSwitches, bool
OutOfProcTaskHostNode node = new OutOfProcTaskHostNode();
shutdownReason = node.Run(out nodeException);
}
else if (nodeModeNumber == 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future proofing: when does it become a good time to switch to a switch, given the increasing number of node types?


internal static NamedPipeServerStream CreateSecurePipeServer(string pipeName, int maxNumberOfServerInstances = 1, int pipeBufferSize = DefaultPipeBufferSize)
{
#if FEATURE_PIPE_SECURITY && FEATURE_NAMED_PIPE_SECURITY_CONSTRUCTOR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be reasonable to refactor this in a manner that doesn't use interwoven #if directives or would it lead to too many branches? (and if so, maybe some more shuffling could help?)
In an IDE it will probably be somewhat more readable, however as-is it is tad clunky to parse what happens when.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I'm about to put out a separate PR with the pipe communication refactor on its own. I did a pass through the flags and consolidated it down to FEATURE_PIPEOPTIONS_CURRENTUSERONLY and TASKHOST. The gist is that NETCore has the CurrentUserOnly flag which lets us skip manually validating the user, And since Mono is no long targeted, the Full Framework bits will always run on Windows.

Most of these flags were either redundant or guarding against scenarios no longer supported.

@SimaTian
Copy link
Member

SimaTian commented Mar 4, 2025

There are some minor conflicts, probably due to the perf work from @Erarndt we recently merged.
You have more context than me @ccastanedaucf , can you take a look please?

@ccastanedaucf
Copy link
Contributor Author

Relevant changes are in second commit - I just rebased on the active IPC pr since this depends on it.

@ccastanedaucf
Copy link
Contributor Author

ccastanedaucf commented Mar 19, 2025

Updated PR description to accurately represent changes since all of the IPC code was extracted into #11546

Can ignore the first commit and merge conflicts since it's just a result of copying over #11546 (will rebase once that's merged)

@ccastanedaucf
Copy link
Contributor Author

ccastanedaucf commented Mar 27, 2025

Kk since the IPC stuff got merged, this is also good to go other than this compile error I hit after rebasing:

D:\src\msbuild\rawr-pr-1\src\Shared\NodePipeBase.cs(238,23): error CS0246: The type or namespace name 'ValueTask<>' could not be found (are you missing a using directive or an assembly reference?) [D:\src\msb
uild\rawr-pr-1\src\Tasks\Microsoft.Build.Tasks.csproj::TargetFramework=netstandard2.0]

I think this is from the change which swapped Task for ValueTask in some hot paths.

It seems like referencing System.Threading.Channels pulls in ValueTask correctly (since we don't seem have a ref for System.Threading.Tasks.Extensions), but then I get the additional pre-built error in CI:

.packages/microsoft.dotnet.arcade.sdk/9.0.0-beta.25164.2/tools/SourceBuild/AfterSourceBuild.proj#L81

.packages/microsoft.dotnet.arcade.sdk/9.0.0-beta.25164.2/tools/SourceBuild/AfterSourceBuild.proj(81,5): error : (NETCORE_ENGINEERING_TELEMETRY=AfterSourceBuild) 1 new pre-builts discovered! Detailed usage report can be found at /__w/1/s/artifacts/sb/prebuilt-report/baseline-comparison.xml.
See https://aka.ms/dotnet/prebuilts for guidance on what pre-builts are and how to eliminate them.
Package IDs are:
Microsoft.Bcl.AsyncInterfaces.9.0.0

Anything that can be done to resolve this? Ideally I would have access to Channels here anyways since a future PR is planning to use it for queueing log messages.

Otherwise the simplest thing would to swap NodePipeBase back to using Task to get this unblocked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants