Separate GC WKS and SVR compilation units by janvorli · Pull Request #126720 · dotnet/runtime

janvorli · 2026-04-09T16:52:59Z

Move the GC sources away from the wrapper-file model that text-included gc.cpp and gcee.cpp under SERVER_GC and instead compile the shared sources directly as separate WKS and SVR objects.

This change introduces gcinternal.h as the shared compilation context for the gc.cpp split, converts the former tail-included GC implementation fragments into separately compiled translation units, and updates the GC, VM, NativeAOT, and GC sample build surfaces to consume the new object layout.

It also removes the gcsvr.cpp/gcwks.cpp and gceesvr.cpp/gceewks.cpp wrappers, compiles gcee.cpp through the same dual-build WKS/SVR source lists as gc.cpp, deduplicates the repeated WKS/SVR source lists in the relevant CMake files, and renames the shared GC header from gc_common.h to gcinternal.h to avoid confusion with gccommon.cpp.

During the split, cross-translation-unit declarations and inline helpers needed by multiple GC source files were moved into the shared header, while local-only inline helpers were moved back into their owning .cpp files to avoid keeping unnecessary bodies in the shared header.

I've made size comparison between the new clrgc.dll, clrgcexp.dll and coreclr.dll and the changes in the gc dlls were very minor, around ~1.5kB growth due to little different decisions of the linker / compiler w.r.t. cold / hot code. The coreclr even became ~1.5kB smaller.

dotnet-policy-service · 2026-04-09T16:54:04Z

Tagging subscribers to this area: @agocke, @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR refactors CoreCLR GC build plumbing to stop using wrapper translation units (gcwks.cpp/gcsvr.cpp and gceewks.cpp/gceesvr.cpp) and instead compile the shared GC implementation sources directly into separate WKS/SVR object sets, using a new shared compilation-context header gcinternal.h.

Changes:

Introduces gcinternal.h and updates many GC .cpp files to include it and wrap code in WKS/SVR namespaces based on SERVER_GC.
Updates CoreCLR VM and standalone GC CMake build graphs to build GC sources as object libraries for WKS/SVR and consume them from coreclr/clrgc targets.
Updates NativeAOT runtime and the GC sample project build surfaces to consume the new GC source layout.

Reviewed changes

Copilot reviewed 31 out of 33 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/coreclr/vm/wks/CMakeLists.txt	Adds GC object files into `cee_wks_core` build inputs.
src/coreclr/vm/CMakeLists.txt	Defines shared GC source list and builds `vm_gc_wks`/`vm_gc_svr` object libraries.
src/coreclr/dlls/mscoree/coreclr/CMakeLists.txt	Links `vm_gc_wks`/`vm_gc_svr` into `coreclr` and `coreclr_static`.
src/coreclr/nativeaot/Runtime/CMakeLists.txt	Replaces wrapper sources with direct GC `.cpp` compilation for NativeAOT.
src/coreclr/gc/gcinternal.h	New shared GC compilation-context header; centralizes includes and inlines.
src/coreclr/gc/*.cpp	Switches individual GC implementation files to include `gcinternal.h` and wrap in `WKS`/`SVR` namespaces.
src/coreclr/gc/CMakeLists.txt	Builds standalone GC (clrgc/clrgcexp) with WKS/SVR object libraries.
src/coreclr/gc/sample/*	Updates GC sample to compile split GC `.cpp` files directly.

mangod9 · 2026-04-09T17:13:01Z

what is the motivation for this change? Does it improve build times?

janvorli · 2026-04-09T18:12:16Z

what is the motivation for this change? Does it improve build times?

It doesn't affect build time in any way. The main reason is code editing experience. The individual files into which the gc.cpp was split in my recent change didn't include the headers for symbols they use, so when editing one of those e.g. in VS code, it was showing a lot of red squiggles and code navigation didn't work well.

jkotas · 2026-04-10T00:24:43Z

Build breaks....

Copilot

Pull request overview

Copilot reviewed 31 out of 33 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 31 out of 33 changed files in this pull request and generated 1 comment.

VSadov

LGTM!

MichalStrehovsky · 2026-04-10T22:34:01Z

2% size savings on Hello World on Linux, nice!

Size statistics

Pull request #126720

Project	Size before	Size after	Difference
TodosApi-linux	24545416	24488072	-57344
TodosApi-windows	25923584	25909760	-13824
avalonia.app-linux	18912136	18887560	-24576
avalonia.app-windows	19439616	19433472	-6144
hello-linux	1246088	1221512	-24576
hello-minimal-linux	1098528	1073952	-24576
hello-minimal-windows	773632	766464	-7168
hello-windows	933376	927232	-6144
kestrel-minimal-linux	5354976	5293536	-61440
kestrel-minimal-windows	4854784	4840960	-13824
reflection-linux	1832784	1808208	-24576
reflection-windows	1696256	1689600	-6656
webapiaot-linux	9721456	9660016	-61440
webapiaot-windows	10246144	10232320	-13824
winrt-component-minimal-windows	721408	714240	-7168

jkotas · 2026-04-10T23:32:40Z

2% size savings on Hello World on Linux, nice!

Do we understand why this makes the code smaller? It is likely making it both smaller and slower (less inlining) ... not something we necessarily want for the GC. It may be a good idea to measure the impact on GC throughput, on both Windows and Linux.

This type of refactoring tends to depend on good PGO data and whole program optimizations for good perf:

On Windows, we do not use LGCG and PGO for NAOT to avoid compiler version fragility
On Linux, the infrastructure for collecting PGO data is broken for libcoreclr.so. It is one of the check boxes in Startup time of small workloads #120407 and there was a chat on Teams about that. We may want to frontload fixing it to avoid regressing GC perf on Linux.

janvorli · 2026-04-13T11:21:07Z

Do we understand why this makes the code smaller? It is likely making it both smaller and slower (less inlining) ... not something we necessarily want for the GC. It may be a good idea to measure the impact on GC throughput, on both Windows and Linux.

I definitely want to run the GC perf runs and understand where the size improvement comes from before we merge this.
There were two large functions that were inlined before and that I stopped marking as inlined because I felt like they were too large to make sense inlining them and wanted to verify a perf impact of that. Those were gc_heap::mark_through_cards_helper and gc_heap::set_region_gen_num. Edit: And gc_heap::get_promoted_bytes

janvorli · 2026-04-16T11:46:37Z

@jkotas performance and functionality tests didn't show any regressions.

jkotas · 2026-04-16T12:07:40Z

Have you measured it on a binary that showed the large code size reduction?

(Also, there is a merge conflict that needs to be resolved.)

janvorli · 2026-04-16T17:18:27Z

Have you measured it on a binary that showed the large code size reduction?

It was measured using the dotnet/performance GC tests by vendors, that's the way GC changes have been tested in the past. I am not sure how to perf test the apps that @MichalStrehovsky has mentioned.

VSadov · 2026-04-16T18:30:05Z

performance and functionality tests didn't show any regressions.

I am not very surprised. GC perf is typically dominated by memory accesses (or, more precisely, by cache misses since access patterns could be cache-unfriendly). Code quality may still matter - like inlining of tiny methods in tight per-object loops, but compiler generally knows that too.
This refactoring moves pretty large chunks of code around, so it is more likely not affecting perf in a meaningful way. There is always a chance of some subtle but consequential change, but likelihood seems low.

jkotas · 2026-04-16T18:55:27Z

Code quality may still matter - like inlining of tiny methods in tight per-object loops, but compiler generally knows that too.

Compilers do that reasonably well if they can see the whole program. This change breaks down the GC into multiple compilations units, and the compiler won't the see the whole GC anymore (when compiling for NAOT at least).

jkotas · 2026-04-16T23:32:22Z

What do they run to measure? Is it possible to publish the same test w/ NAOT and run it locally?

MichalStrehovsky · 2026-04-17T01:06:17Z

It was measured using the dotnet/performance GC tests by vendors, that's the way GC changes have been tested in the past. I am not sure how to perf test the apps that @MichalStrehovsky has mentioned.

These are the tests we run in the ASP.NET perf lab: https://aka.ms/aspnet/nativeaot/benchmarks. TodosApi from the table above is the Stage2 app on the benchmarks page. The crank command line used to trigger the run is shown at the bottom of the dashboard. There is a way to send it a custom toolchain or just a custom binary, but I have not used it in years and have no memory of how it was done. https://github.com/aspnet/benchmarks is the entrypoint to all the docs.

We could also just commit and wait for result. The only gotcha is that if there are build breaks or the flow from dotnet/runtime to dotnet/dotnet is stuck, this could be bunched up with a week or two worth of changes and then we'll need to prove it's not caused by this after the fact.

janvorli · 2026-04-17T08:39:34Z

They run aspnetbenchmarks (https://github.com/dotnet/performance/blob/main/src/benchmarks/gc/GC.Infrastructure/Configurations/ASPNetBenchmarks/ASPNetBenchmarks.csv), GCPerfSim benchmarks (https://github.com/dotnet/performance/tree/main/src/benchmarks/gc/GC.Infrastructure/Configurations/GCPerfSim) and a bunch of hand-picked microbenchmarks (https://github.com/dotnet/performance/blob/main/src/benchmarks/gc/GC.Infrastructure/Configurations/Microbenchmark/MicrobenchmarksToRun.txt) from the dotnet/performance repo on Windows.

janvorli · 2026-04-17T08:42:38Z

I can see that on Windows, the clrgcexp.dll is about 5kB smaller and coreclr.dll about 3.5kB smaller.

janvorli · 2026-04-17T08:46:20Z

I have just noticed that the list of sizes that @MichalStrehovsky has shared contains much larger size changes on linux for the same apps, I am going to do some Linux testing and also compare the disassembly of some of the binaries from Michal's list.

janvorli · 2026-04-17T13:45:57Z

Linux shows substantial changes in inlining and also loop unrolling, I need to run the perf tests on Linux too to see the impact.

Move the GC sources away from the wrapper-file model that text-included gc.cpp and gcee.cpp under SERVER_GC and instead compile the shared sources directly as separate WKS and SVR objects. This change introduces gcinternal.h as the shared compilation context for the gc.cpp split, converts the former tail-included GC implementation fragments into separately compiled translation units, and updates the GC, VM, NativeAOT, and GC sample build surfaces to consume the new object layout. It also removes the gcsvr.cpp/gcwks.cpp and gceesvr.cpp/gceewks.cpp wrappers, compiles gcee.cpp through the same dual-build WKS/SVR source lists as gc.cpp, deduplicates the repeated WKS/SVR source lists in the relevant CMake files, and renames the shared GC header from gc_common.h to gcinternal.h to avoid confusion with gccommon.cpp. During the split, cross-translation-unit declarations and inline helpers needed by multiple GC source files were moved into the shared header, while local-only inline helpers were moved back into their owning .cpp files to avoid keeping unnecessary bodies in the shared header. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

janvorli · 2026-06-11T11:14:04Z

I was finally able to verify that it doesn't degrade performance on Linux (I needed to make the GC performance testing framework to work on Linux first).

jkotas · 2026-06-11T22:43:12Z

      target_link_libraries(clrgcexp PRIVATE gcexp_dll_svr_descriptor)
    endif()
    target_compile_definitions(clrgcexp PRIVATE -DUSE_REGIONS)
+    set_property(TARGET clrgcexp PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)


Why do we set INTERPROCEDURAL_OPTIMIZATION for some targets and not others?

You are right, it should be added to the clrgc too, I've overlooked it.

Hmm, I also wonder why I haven't used add_pgo instead here. I'll double check that.

So, add_pgo would actually require us to have pgo data for the libcoreclr.so and libcoreclrext.so. We currently generate pgo for libcoreclr.so and libclrjit.so only.
When PGO data is missing, we just fall back to not enabling the LTO at all. The PGO and LTO are tight together, PGO is essentially part of LTO.
There seems to also be an issue with object libraries and PGO. add_pgo only adds the LTO options to the final targets, but on Unix, it needs to be specified at the compilation time too. So any code that we compile as object library and then add to e.g. libcoreclr.so would not have PGO enabled. It seems our PGO optimization is broken as many parts of the coreclr are now object libraries.

Maybe we should enable LTO by default for everything (there is a cmake global property to do that) and change the add_pgo to only add the option to specify the profile file.

enable LTO by default for everything

We cannot enable LTO for the object files and static libraries that we ship. LTO is tied to specific compiler version that is not guaranteed to be on user's machine.

BTW: We do not collect the PGO data for most of libcoreclr.so currently - #128412 is working towards fixing that.

Actually, we may still be able to compile with -flto and get functional shipping static libs, using fat LTO. The benefit would be that for object files that go both to shipping static libs and to our shared libs, we could take advantage of LTO for the shared libs case. See https://llvm.org/docs/FatLTO.html. Supported in LLVM 17 and newer. But that's for a separate experiment.

jkotas · 2026-06-11T22:43:19Z

I was finally able to verify that it doesn't degrade performance on Linux

Have you verified impact on NativeAOT too?

janvorli · 2026-06-11T22:53:33Z

Have you verified impact on NativeAOT too?

I don't know how to do that. The tool we have for testing and comparing GC perf is the GC perf infrastructure tool in dotnet/performance which uses GCPerfSim, selected microbenchmarks and two selected ASPNet benchmarks. It doesn't support NativeAOT.

jkotas · 2026-06-12T03:19:33Z

I don't know how to do that.

Run a program that spends a lot of time in the GC. Here is an example https://gist.github.com/jkotas/dca7f72bcac821af48f387dfebbc63ba . The GC measurements are inherently noisy, so wait for at least 20 iterations until the number starts to stabilize.

On current main, linux-x64 native aot binary size is 1242304 and I see the throughput stabilizing at ~4200ms. With this change (with the exact same main merged), native aot binary size is 1225920 and the throughput stabilizing at ~4300ms.

So there is 16kB less code and 2.4% throughput loss. This program spends about 20% in the GC, so the GC itself is ~12% slower.

Can you try to reproduce these results?

jkotas · 2026-06-12T03:23:41Z

To publish a project with private build of native aot toolchain:

Build the repo using ./build.sh -c release /p:_BuildNativeAOTRuntimePack=true
Add nuget.config to the app that points to your repo build

<configuration>
  <packageSources>
    <add key="my" value="/home/jkotas/runtime/artifacts/packages/Release/Shipping" />
  </packageSources>
</configuration>

Add this to the app project file:

  <ItemGroup>
    <PackageReference Include="Microsoft.DotNet.ILCompiler" Version="11.0.0-dev" />
    <KnownRuntimePack Update="@(KnownRuntimePack)">
      <LatestRuntimeFrameworkVersion Condition="'%(KnownRuntimePack.Identity)' == 'Microsoft.NETCore.App'
      and '%(KnownRuntimePack.RuntimePackLabels)' == 'NativeAOT'
      and '%(KnownRuntimePack.TargetFramework)' == 'net11.0'">11.0.0-dev</LatestRuntimeFrameworkVersion>
    </KnownRuntimePack>
    <KnownILCompilerPack Update="@(KnownILCompilerPack)">
      <ILCompilerPackVersion Condition="'%(KnownILCompilerPack.TargetFramework)' == 'net11.0'">11.0.0-dev</ILCompilerPa>    </KnownILCompilerPack>
  </ItemGroup>

Publish the app using dotnet publish --packages pkg. --packages pkg avoids polluting your global nuget cache with your private build.

am11 · 2026-06-12T13:09:17Z

To publish a project with private build of native aot toolchain:

Documented here https://github.com/dotnet/runtime/blob/main/docs/workflow/building/coreclr/nativeaot.md#building-packages.

jkotas · 2026-06-12T13:17:05Z

Documented here https://github.com/dotnet/runtime/blob/main/docs/workflow/building/coreclr/nativeaot.md#building-packages

These steps are broken in multiple ways - tracked by #119166 .

janvorli · 2026-06-12T17:17:13Z

@jkotas the build ./build.sh -c release /p:_BuildNativeAOTRuntimePack=true keeps consistently failing for me with some weird cpio error, even after git clean -xdf:

  Microsoft.NETCore.App.Ref ->
  Microsoft.NETCore.App.Ref -> /home/janvorli/git/runtime/artifacts/packages/Release/Shipping/dotnet-targeting-pack-11.0.0-dev-x64.deb
  Microsoft.NETCore.App.Ref -> /home/janvorli/git/runtime/artifacts/packages/Release/Shipping/dotnet-targeting-pack-11.0.0-dev-newkey-x64.deb
  /usr/bin/sh: 2: /tmp/MSBuildTemp8RxOst/tmp0de42a70dfe64c8d9106b06a9759e16b.exec.cmd: cpio: Permission denied

This happens for me when building just main without any changes as well as the current PR.

jkotas · 2026-06-12T17:19:23Z

cpio: Permission denied

Do you have cpio prereq? https://github.com/dotnet/runtime/blob/main/docs/workflow/requirements/linux-requirements.md#debian-and-ubuntu

janvorli · 2026-06-12T17:24:06Z

Do you have cpio prereq?

Ah, I didn't have it on that machine, that's probably the culprit

janvorli · 2026-06-12T18:05:41Z

@jkotas on my machine, the diff is 0,7% and that still might be just noise given the variability of the non-averaged results.

janvorli added this to the 11.0.0 milestone Apr 9, 2026

janvorli self-assigned this Apr 9, 2026

janvorli requested a review from MichalStrehovsky as a code owner April 9, 2026 16:53

Copilot AI review requested due to automatic review settings April 9, 2026 16:53

janvorli added the area-GC-coreclr label Apr 9, 2026

Copilot started reviewing on behalf of janvorli April 9, 2026 16:54 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

Comment thread src/coreclr/gc/gcinternal.h

Comment thread src/coreclr/vm/wks/CMakeLists.txt

Comment thread src/coreclr/nativeaot/Runtime/CMakeLists.txt Outdated

janvorli requested review from VSadov and jkotas April 9, 2026 20:43

build-analysis Bot mentioned this pull request Apr 10, 2026

maccatalyst/iossimulator Debug CoreCLR work items fail with PACKAGE_NOT_FOUND (missing Info.plist) #126459

Open

Copilot AI review requested due to automatic review settings April 10, 2026 12:31

Copilot started reviewing on behalf of janvorli April 10, 2026 12:32 View session

Copilot AI reviewed Apr 10, 2026

View reviewed changes

Comment thread src/coreclr/nativeaot/Runtime/CMakeLists.txt Outdated

build-analysis Bot mentioned this pull request Apr 10, 2026

System.Net.NameResolution.Tests DNS failures: Name or service not known #126641

Open

Copilot AI review requested due to automatic review settings April 10, 2026 19:53

Copilot started reviewing on behalf of janvorli April 10, 2026 19:54 View session

Copilot AI reviewed Apr 10, 2026

View reviewed changes

Comment thread src/coreclr/gc/plan_phase.cpp

github-actions Bot mentioned this pull request Apr 10, 2026

126720 MichalStrehovsky/rt-sz#214

Closed

VSadov approved these changes Apr 10, 2026

View reviewed changes

This was referenced Apr 11, 2026

Unable to pull image from mcr.microsoft.com #117164

Open

Multiple Helix work items fail on maccatalyst/tvos CoreCLR Release #126460

Open

janvorli and others added 6 commits June 10, 2026 12:39

Fix build breaks and reflect feedback

5e8b0f5

Fix build break 2

48a85a7

Fix build break 3

957d8f0

Fix nativeaot and build break

5a4d247

Rebase on main and enable LTO/pgo

b5ca3ca

janvorli force-pushed the gc-separate-compilation branch from 86fcd07 to b5ca3ca Compare June 10, 2026 22:07

This was referenced Jun 11, 2026

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

[browser][coreCLR] Wasm.Console.Node.Sample - undefined symbol: SystemInteropJS_GetManagedStackTrace #129229

Open

jkotas reviewed Jun 11, 2026

View reviewed changes

Conversation

janvorli commented Apr 9, 2026

Uh oh!

dotnet-policy-service Bot commented Apr 9, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mangod9 commented Apr 9, 2026

Uh oh!

janvorli commented Apr 9, 2026

Uh oh!

jkotas commented Apr 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

VSadov left a comment

Choose a reason for hiding this comment

Uh oh!

MichalStrehovsky commented Apr 10, 2026

Size statistics

Uh oh!

jkotas commented Apr 10, 2026

Uh oh!

janvorli commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

janvorli commented Apr 16, 2026

Uh oh!

jkotas commented Apr 16, 2026

Uh oh!

janvorli commented Apr 16, 2026

Uh oh!

VSadov commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jkotas commented Apr 16, 2026

Uh oh!

jkotas commented Apr 16, 2026

Uh oh!

MichalStrehovsky commented Apr 17, 2026

Uh oh!

janvorli commented Apr 17, 2026

Uh oh!

janvorli commented Apr 17, 2026

Uh oh!

janvorli commented Apr 17, 2026

Uh oh!

janvorli commented Apr 17, 2026

Uh oh!

janvorli commented Jun 11, 2026

Uh oh!

jkotas Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

janvorli Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

janvorli Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

janvorli Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

janvorli Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

janvorli commented Apr 13, 2026 •

edited

Loading

VSadov commented Apr 16, 2026 •

edited

Loading

jkotas commented Jun 12, 2026 •

edited

Loading