Skip to content

Separate GC WKS and SVR compilation units#126720

Open
janvorli wants to merge 6 commits into
dotnet:mainfrom
janvorli:gc-separate-compilation
Open

Separate GC WKS and SVR compilation units#126720
janvorli wants to merge 6 commits into
dotnet:mainfrom
janvorli:gc-separate-compilation

Conversation

@janvorli

@janvorli janvorli commented Apr 9, 2026

Copy link
Copy Markdown
Member

Move the GC sources away from the wrapper-file model that text-included gc.cpp and gcee.cpp under SERVER_GC and instead compile the shared sources directly as separate WKS and SVR objects.

This change introduces gcinternal.h as the shared compilation context for the gc.cpp split, converts the former tail-included GC implementation fragments into separately compiled translation units, and updates the GC, VM, NativeAOT, and GC sample build surfaces to consume the new object layout.

It also removes the gcsvr.cpp/gcwks.cpp and gceesvr.cpp/gceewks.cpp wrappers, compiles gcee.cpp through the same dual-build WKS/SVR source lists as gc.cpp, deduplicates the repeated WKS/SVR source lists in the relevant CMake files, and renames the shared GC header from gc_common.h to gcinternal.h to avoid confusion with gccommon.cpp.

During the split, cross-translation-unit declarations and inline helpers needed by multiple GC source files were moved into the shared header, while local-only inline helpers were moved back into their owning .cpp files to avoid keeping unnecessary bodies in the shared header.

I've made size comparison between the new clrgc.dll, clrgcexp.dll and coreclr.dll and the changes in the gc dlls were very minor, around ~1.5kB growth due to little different decisions of the linker / compiler w.r.t. cold / hot code. The coreclr even became ~1.5kB smaller.

@janvorli janvorli added this to the 11.0.0 milestone Apr 9, 2026
@janvorli janvorli self-assigned this Apr 9, 2026
Copilot AI review requested due to automatic review settings April 9, 2026 16:53
@dotnet-policy-service

Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @agocke, @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors CoreCLR GC build plumbing to stop using wrapper translation units (gcwks.cpp/gcsvr.cpp and gceewks.cpp/gceesvr.cpp) and instead compile the shared GC implementation sources directly into separate WKS/SVR object sets, using a new shared compilation-context header gcinternal.h.

Changes:

  • Introduces gcinternal.h and updates many GC .cpp files to include it and wrap code in WKS/SVR namespaces based on SERVER_GC.
  • Updates CoreCLR VM and standalone GC CMake build graphs to build GC sources as object libraries for WKS/SVR and consume them from coreclr/clrgc targets.
  • Updates NativeAOT runtime and the GC sample project build surfaces to consume the new GC source layout.

Reviewed changes

Copilot reviewed 31 out of 33 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/coreclr/vm/wks/CMakeLists.txt Adds GC object files into cee_wks_core build inputs.
src/coreclr/vm/CMakeLists.txt Defines shared GC source list and builds vm_gc_wks/vm_gc_svr object libraries.
src/coreclr/dlls/mscoree/coreclr/CMakeLists.txt Links vm_gc_wks/vm_gc_svr into coreclr and coreclr_static.
src/coreclr/nativeaot/Runtime/CMakeLists.txt Replaces wrapper sources with direct GC .cpp compilation for NativeAOT.
src/coreclr/gc/gcinternal.h New shared GC compilation-context header; centralizes includes and inlines.
src/coreclr/gc/*.cpp Switches individual GC implementation files to include gcinternal.h and wrap in WKS/SVR namespaces.
src/coreclr/gc/CMakeLists.txt Builds standalone GC (clrgc/clrgcexp) with WKS/SVR object libraries.
src/coreclr/gc/sample/* Updates GC sample to compile split GC .cpp files directly.

Comment thread src/coreclr/gc/gcinternal.h
Comment thread src/coreclr/vm/wks/CMakeLists.txt
Comment thread src/coreclr/nativeaot/Runtime/CMakeLists.txt Outdated
@mangod9

mangod9 commented Apr 9, 2026

Copy link
Copy Markdown
Member

what is the motivation for this change? Does it improve build times?

@janvorli

janvorli commented Apr 9, 2026

Copy link
Copy Markdown
Member Author

what is the motivation for this change? Does it improve build times?

It doesn't affect build time in any way. The main reason is code editing experience. The individual files into which the gc.cpp was split in my recent change didn't include the headers for symbols they use, so when editing one of those e.g. in VS code, it was showing a lot of red squiggles and code navigation didn't work well.

@jkotas

jkotas commented Apr 10, 2026

Copy link
Copy Markdown
Member

Build breaks....

Copilot AI review requested due to automatic review settings April 10, 2026 12:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 33 changed files in this pull request and generated 1 comment.

Comment thread src/coreclr/nativeaot/Runtime/CMakeLists.txt Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 33 changed files in this pull request and generated 1 comment.

Comment thread src/coreclr/gc/plan_phase.cpp

@VSadov VSadov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MichalStrehovsky

Copy link
Copy Markdown
Member

2% size savings on Hello World on Linux, nice!

Size statistics

Pull request #126720

Project Size before Size after Difference
TodosApi-linux 24545416 24488072 -57344
TodosApi-windows 25923584 25909760 -13824
avalonia.app-linux 18912136 18887560 -24576
avalonia.app-windows 19439616 19433472 -6144
hello-linux 1246088 1221512 -24576
hello-minimal-linux 1098528 1073952 -24576
hello-minimal-windows 773632 766464 -7168
hello-windows 933376 927232 -6144
kestrel-minimal-linux 5354976 5293536 -61440
kestrel-minimal-windows 4854784 4840960 -13824
reflection-linux 1832784 1808208 -24576
reflection-windows 1696256 1689600 -6656
webapiaot-linux 9721456 9660016 -61440
webapiaot-windows 10246144 10232320 -13824
winrt-component-minimal-windows 721408 714240 -7168

@jkotas

jkotas commented Apr 10, 2026

Copy link
Copy Markdown
Member

2% size savings on Hello World on Linux, nice!

Do we understand why this makes the code smaller? It is likely making it both smaller and slower (less inlining) ... not something we necessarily want for the GC. It may be a good idea to measure the impact on GC throughput, on both Windows and Linux.

This type of refactoring tends to depend on good PGO data and whole program optimizations for good perf:

  • On Windows, we do not use LGCG and PGO for NAOT to avoid compiler version fragility
  • On Linux, the infrastructure for collecting PGO data is broken for libcoreclr.so. It is one of the check boxes in Startup time of small workloads #120407 and there was a chat on Teams about that. We may want to frontload fixing it to avoid regressing GC perf on Linux.

@janvorli

janvorli commented Apr 13, 2026

Copy link
Copy Markdown
Member Author

Do we understand why this makes the code smaller? It is likely making it both smaller and slower (less inlining) ... not something we necessarily want for the GC. It may be a good idea to measure the impact on GC throughput, on both Windows and Linux.

I definitely want to run the GC perf runs and understand where the size improvement comes from before we merge this.
There were two large functions that were inlined before and that I stopped marking as inlined because I felt like they were too large to make sense inlining them and wanted to verify a perf impact of that. Those were gc_heap::mark_through_cards_helper and gc_heap::set_region_gen_num. Edit: And gc_heap::get_promoted_bytes

@janvorli

Copy link
Copy Markdown
Member Author

@jkotas performance and functionality tests didn't show any regressions.

@jkotas

jkotas commented Apr 16, 2026

Copy link
Copy Markdown
Member

Have you measured it on a binary that showed the large code size reduction?

(Also, there is a merge conflict that needs to be resolved.)

@janvorli

Copy link
Copy Markdown
Member Author

Have you measured it on a binary that showed the large code size reduction?

It was measured using the dotnet/performance GC tests by vendors, that's the way GC changes have been tested in the past. I am not sure how to perf test the apps that @MichalStrehovsky has mentioned.

@VSadov

VSadov commented Apr 16, 2026

Copy link
Copy Markdown
Member

performance and functionality tests didn't show any regressions.

I am not very surprised. GC perf is typically dominated by memory accesses (or, more precisely, by cache misses since access patterns could be cache-unfriendly). Code quality may still matter - like inlining of tiny methods in tight per-object loops, but compiler generally knows that too.
This refactoring moves pretty large chunks of code around, so it is more likely not affecting perf in a meaningful way. There is always a chance of some subtle but consequential change, but likelihood seems low.

@jkotas

jkotas commented Apr 16, 2026

Copy link
Copy Markdown
Member

Code quality may still matter - like inlining of tiny methods in tight per-object loops, but compiler generally knows that too.

Compilers do that reasonably well if they can see the whole program. This change breaks down the GC into multiple compilations units, and the compiler won't the see the whole GC anymore (when compiling for NAOT at least).

@jkotas

jkotas commented Apr 16, 2026

Copy link
Copy Markdown
Member

What do they run to measure? Is it possible to publish the same test w/ NAOT and run it locally?

@MichalStrehovsky

Copy link
Copy Markdown
Member

It was measured using the dotnet/performance GC tests by vendors, that's the way GC changes have been tested in the past. I am not sure how to perf test the apps that @MichalStrehovsky has mentioned.

These are the tests we run in the ASP.NET perf lab: https://aka.ms/aspnet/nativeaot/benchmarks. TodosApi from the table above is the Stage2 app on the benchmarks page. The crank command line used to trigger the run is shown at the bottom of the dashboard. There is a way to send it a custom toolchain or just a custom binary, but I have not used it in years and have no memory of how it was done. https://github.com/aspnet/benchmarks is the entrypoint to all the docs.

We could also just commit and wait for result. The only gotcha is that if there are build breaks or the flow from dotnet/runtime to dotnet/dotnet is stuck, this could be bunched up with a week or two worth of changes and then we'll need to prove it's not caused by this after the fact.

@janvorli

Copy link
Copy Markdown
Member Author

I can see that on Windows, the clrgcexp.dll is about 5kB smaller and coreclr.dll about 3.5kB smaller.

@janvorli

Copy link
Copy Markdown
Member Author

I have just noticed that the list of sizes that @MichalStrehovsky has shared contains much larger size changes on linux for the same apps, I am going to do some Linux testing and also compare the disassembly of some of the binaries from Michal's list.

@janvorli

Copy link
Copy Markdown
Member Author

Linux shows substantial changes in inlining and also loop unrolling, I need to run the perf tests on Linux too to see the impact.

janvorli and others added 6 commits June 10, 2026 12:39
Move the GC sources away from the wrapper-file model that text-included gc.cpp and gcee.cpp under SERVER_GC and instead compile the shared sources directly as separate WKS and SVR objects.

This change introduces gcinternal.h as the shared compilation context for the gc.cpp split, converts the former tail-included GC implementation fragments into separately compiled translation units, and updates the GC, VM, NativeAOT, and GC sample build surfaces to consume the new object layout.

It also removes the gcsvr.cpp/gcwks.cpp and gceesvr.cpp/gceewks.cpp wrappers, compiles gcee.cpp through the same dual-build WKS/SVR source lists as gc.cpp, deduplicates the repeated WKS/SVR source lists in the relevant CMake files, and renames the shared GC header from gc_common.h to gcinternal.h to avoid confusion with gccommon.cpp.

During the split, cross-translation-unit declarations and inline helpers needed by multiple GC source files were moved into the shared header, while local-only inline helpers were moved back into their owning .cpp files to avoid keeping unnecessary bodies in the shared header.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@janvorli

Copy link
Copy Markdown
Member Author

I was finally able to verify that it doesn't degrade performance on Linux (I needed to make the GC performance testing framework to work on Linux first).

target_link_libraries(clrgcexp PRIVATE gcexp_dll_svr_descriptor)
endif()
target_compile_definitions(clrgcexp PRIVATE -DUSE_REGIONS)
set_property(TARGET clrgcexp PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we set INTERPROCEDURAL_OPTIMIZATION for some targets and not others?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it should be added to the clrgc too, I've overlooked it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I also wonder why I haven't used add_pgo instead here. I'll double check that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, add_pgo would actually require us to have pgo data for the libcoreclr.so and libcoreclrext.so. We currently generate pgo for libcoreclr.so and libclrjit.so only.
When PGO data is missing, we just fall back to not enabling the LTO at all. The PGO and LTO are tight together, PGO is essentially part of LTO.
There seems to also be an issue with object libraries and PGO. add_pgo only adds the LTO options to the final targets, but on Unix, it needs to be specified at the compilation time too. So any code that we compile as object library and then add to e.g. libcoreclr.so would not have PGO enabled. It seems our PGO optimization is broken as many parts of the coreclr are now object libraries.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should enable LTO by default for everything (there is a cmake global property to do that) and change the add_pgo to only add the option to specify the profile file.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enable LTO by default for everything

We cannot enable LTO for the object files and static libraries that we ship. LTO is tied to specific compiler version that is not guaranteed to be on user's machine.

BTW: We do not collect the PGO data for most of libcoreclr.so currently - #128412 is working towards fixing that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we may still be able to compile with -flto and get functional shipping static libs, using fat LTO. The benefit would be that for object files that go both to shipping static libs and to our shared libs, we could take advantage of LTO for the shared libs case. See https://llvm.org/docs/FatLTO.html. Supported in LLVM 17 and newer. But that's for a separate experiment.

@jkotas

jkotas commented Jun 11, 2026

Copy link
Copy Markdown
Member

I was finally able to verify that it doesn't degrade performance on Linux

Have you verified impact on NativeAOT too?

@janvorli

Copy link
Copy Markdown
Member Author

Have you verified impact on NativeAOT too?

I don't know how to do that. The tool we have for testing and comparing GC perf is the GC perf infrastructure tool in dotnet/performance which uses GCPerfSim, selected microbenchmarks and two selected ASPNet benchmarks. It doesn't support NativeAOT.

@jkotas

jkotas commented Jun 12, 2026

Copy link
Copy Markdown
Member

I don't know how to do that.

Run a program that spends a lot of time in the GC. Here is an example https://gist.github.com/jkotas/dca7f72bcac821af48f387dfebbc63ba . The GC measurements are inherently noisy, so wait for at least 20 iterations until the number starts to stabilize.

On current main, linux-x64 native aot binary size is 1242304 and I see the throughput stabilizing at ~4200ms. With this change (with the exact same main merged), native aot binary size is 1225920 and the throughput stabilizing at ~4300ms.

So there is 16kB less code and 2.4% throughput loss. This program spends about 20% in the GC, so the GC itself is ~12% slower.

Can you try to reproduce these results?

@jkotas

jkotas commented Jun 12, 2026

Copy link
Copy Markdown
Member

To publish a project with private build of native aot toolchain:

  • Build the repo using ./build.sh -c release /p:_BuildNativeAOTRuntimePack=true
  • Add nuget.config to the app that points to your repo build
<configuration>
  <packageSources>
    <add key="my" value="/home/jkotas/runtime/artifacts/packages/Release/Shipping" />
  </packageSources>
</configuration>
  • Add this to the app project file:
  <ItemGroup>
    <PackageReference Include="Microsoft.DotNet.ILCompiler" Version="11.0.0-dev" />
    <KnownRuntimePack Update="@(KnownRuntimePack)">
      <LatestRuntimeFrameworkVersion Condition="'%(KnownRuntimePack.Identity)' == 'Microsoft.NETCore.App'
      and '%(KnownRuntimePack.RuntimePackLabels)' == 'NativeAOT'
      and '%(KnownRuntimePack.TargetFramework)' == 'net11.0'">11.0.0-dev</LatestRuntimeFrameworkVersion>
    </KnownRuntimePack>
    <KnownILCompilerPack Update="@(KnownILCompilerPack)">
      <ILCompilerPackVersion Condition="'%(KnownILCompilerPack.TargetFramework)' == 'net11.0'">11.0.0-dev</ILCompilerPa>    </KnownILCompilerPack>
  </ItemGroup>
  • Publish the app using dotnet publish --packages pkg. --packages pkg avoids polluting your global nuget cache with your private build.

@am11

am11 commented Jun 12, 2026

Copy link
Copy Markdown
Member

To publish a project with private build of native aot toolchain:

Documented here https://github.com/dotnet/runtime/blob/main/docs/workflow/building/coreclr/nativeaot.md#building-packages.

@jkotas

jkotas commented Jun 12, 2026

Copy link
Copy Markdown
Member

Documented here https://github.com/dotnet/runtime/blob/main/docs/workflow/building/coreclr/nativeaot.md#building-packages

These steps are broken in multiple ways - tracked by #119166 .

@janvorli

Copy link
Copy Markdown
Member Author

@jkotas the build ./build.sh -c release /p:_BuildNativeAOTRuntimePack=true keeps consistently failing for me with some weird cpio error, even after git clean -xdf:

  Microsoft.NETCore.App.Ref ->
  Microsoft.NETCore.App.Ref -> /home/janvorli/git/runtime/artifacts/packages/Release/Shipping/dotnet-targeting-pack-11.0.0-dev-x64.deb
  Microsoft.NETCore.App.Ref -> /home/janvorli/git/runtime/artifacts/packages/Release/Shipping/dotnet-targeting-pack-11.0.0-dev-newkey-x64.deb
  /usr/bin/sh: 2: /tmp/MSBuildTemp8RxOst/tmp0de42a70dfe64c8d9106b06a9759e16b.exec.cmd: cpio: Permission denied

This happens for me when building just main without any changes as well as the current PR.

@jkotas

jkotas commented Jun 12, 2026

Copy link
Copy Markdown
Member

@janvorli

Copy link
Copy Markdown
Member Author

Do you have cpio prereq?

Ah, I didn't have it on that machine, that's probably the culprit

@janvorli

Copy link
Copy Markdown
Member Author

@jkotas on my machine, the diff is 0,7% and that still might be just noise given the variability of the non-averaged results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants