Fix LoaderHeap's free list growing more than expected #129203
eduardo-vp wants to merge 6 commits into
Pull request overview
This PR restructures UnlockedLoaderHeap’s free list to reduce allocation-time overhead in backout-heavy scenarios by replacing a single linear-scanned free list with size-segregated buckets for common small block sizes plus an overflow list for larger blocks.
Changes:
- Replaces m_pFirstFreeBlock with 32 size buckets (pointer-size increments) and a separate “large/overflow” free list.
- Updates free-block insertion/allocation logic to use bucketed O(1) reuse for small sizes, and retains linear scanning only for the overflow list (including a stress-log warning on long scans).
- Adjusts debug-only free-list dumping and validation to iterate across all buckets and the overflow list.
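
To make the bucketing idea in the list above concrete, here is a minimal, self-contained sketch of a size-segregated free list. It is not the PR's code: `BucketedFreeList`, `FreeBlock`, and every member name are hypothetical, and the sketch assumes block sizes are multiples of the pointer size and serves small requests only from the exact-size bucket.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical header stored inside each free block (not the runtime's LoaderHeapFreeBlock).
struct FreeBlock {
    FreeBlock*  next;
    std::size_t size;
};

// Hypothetical sketch: 32 exact-size buckets plus one overflow list.
class BucketedFreeList {
public:
    static constexpr std::size_t kBucketCount  = 32;
    static constexpr std::size_t kGranularity  = sizeof(void*);
    static constexpr std::size_t kMaxSmallSize = kBucketCount * kGranularity;

    // Pushes a block onto the head of its bucket (or the overflow list) in O(1).
    // Assumption: size is a multiple of kGranularity and can hold a FreeBlock.
    void Insert(void* mem, std::size_t size) {
        assert(size >= sizeof(FreeBlock) && size % kGranularity == 0);
        FreeBlock*& head = (size <= kMaxSmallSize)
                               ? buckets_[size / kGranularity - 1]
                               : overflow_;
        head = new (mem) FreeBlock{head, size};
    }

    // Returns a block of at least `size` bytes, or nullptr if none is available.
    void* TryAllocate(std::size_t size) {
        std::size_t rounded = (size + kGranularity - 1) / kGranularity * kGranularity;
        if (rounded < sizeof(FreeBlock)) rounded = sizeof(FreeBlock);

        if (rounded <= kMaxSmallSize) {
            // Small request: pop the head of the exact-size bucket, no scan.
            FreeBlock*& head = buckets_[rounded / kGranularity - 1];
            FreeBlock* block = head;
            if (block != nullptr) head = block->next;
            return block;
        }

        // Large request: first-fit linear scan, limited to the overflow list.
        for (FreeBlock** link = &overflow_; *link != nullptr; link = &(*link)->next) {
            if ((*link)->size >= rounded) {
                FreeBlock* block = *link;
                *link = block->next;
                return block;
            }
        }
        return nullptr;
    }

private:
    std::array<FreeBlock*, kBucketCount> buckets_{};
    FreeBlock* overflow_ = nullptr;
};
```

In this sketch a small request never touches blocks of other sizes, so thousands of backed-out blocks of one size do not slow down allocations of another. The cost is that a small request will not be served from a larger free block, a trade-off the real implementation may handle differently.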
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/coreclr/utilcode/loaderheap.cpp | Implements bucket initialization, bucket-aware allocation/insertion, overflow scan warning, and updates debug dump/validation to traverse buckets. |
| src/coreclr/utilcode/loaderheap_shared.h | Updates LoaderHeapFreeBlock API to no longer take an explicit head pointer (heap chooses bucket internally). |
| src/coreclr/inc/loaderheap.h | Adds bucket/overflow free list fields and related constants to UnlockedLoaderHeap. |
Where is this race condition exactly? Can we add a lock there instead? The freelist in LoaderHeap is meant to be only used in error conditions to backout types that failed to load, or to deal with rare race conditions. If you see the freelist growing this much, it means that the loader heap is not used correctly. We should fix that instead.
I'll take a look. This is the part where many threads lose the race: runtime/src/coreclr/vm/genmeth.cpp, lines 493 to 542 in 24547a7.
This reverts commit 96f0039.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
    pNewMD = (InstantiatedMethodDesc*) (CreateMethodDesc(pAllocator,
                                                         pExactMDLoaderModule,
                                                         pExactMT,
                                                         pGenericMDescInRepMT,
                                                         mcInstantiated,
                                                         !pWrappedMD, // This is pesimistic estimate for fNativeCodeSlot
                                                         &amt));

    pNewMD->SetTemporaryEntryPoint(&amt);

    if (pOldMD == NULL)
    {
        // No one else got there first, our MethodDesc wins.
        amt.SuppressRelease();

    // Hold the lock across lookup and creation so that only one thread allocates
    // the MethodDesc for a given instantiation.
    CrstHolder ch(&pExactMDLoaderModule->m_InstMethodHashTableCrst);

I was working with the test in https://github.com/korchak-aleksandr/net10-regression-repro and found out that the free list in UnlockedLoaderHeap grows to thousands of elements, which makes allocations very slow since we do a linear scan of this free list for each one of them.
In these scenarios multiple threads might need the same generic instantiation simultaneously, and they all race to create and publish it. Every thread that loses the race returns its unused memory to the free list, so the list can grow quickly. Subsequent calls that need generic instantiations then do a linear scan of the free list to find a memory block to reuse, which takes a long time once the list is large (it can reach tens of thousands of entries).
This PR stops threads from racing to create and publish the instantiation, so that losing threads no longer insert blocks into the free list.
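
The pattern described here, looking up and creating under one lock so that only one thread ever allocates, can be sketched in portable C++ as follows. This is an illustration under stated assumptions, not the runtime's code: `InstantiationTable`, `Desc`, and `GetOrCreate` are hypothetical names, and `std::mutex` stands in for the runtime's own lock type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for an instantiated method descriptor.
struct Desc {
    std::string key;
};

// Hypothetical table that creates each entry at most once.
class InstantiationTable {
public:
    // Returns the entry for `key`, creating it if it does not exist yet.
    // The lock is held across both the lookup and the creation, so a second
    // thread asking for the same key waits and then finds the published entry
    // instead of allocating its own copy and throwing it away.
    const Desc* GetOrCreate(const std::string& key) {
        std::lock_guard<std::mutex> hold(lock_);

        auto it = table_.find(key);
        if (it != table_.end()) {
            return it->second.get();  // someone already published it
        }

        ++allocations_;  // counts how often we actually allocate
        auto inserted = table_.emplace(key, std::make_unique<Desc>(Desc{key}));
        assert(inserted.second);
        return inserted.first->second.get();
    }

    // Number of entries that were actually allocated.
    int Allocations() const {
        std::lock_guard<std::mutex> hold(lock_);
        return allocations_;
    }

private:
    mutable std::mutex lock_;
    std::map<std::string, std::unique_ptr<Desc>> table_;
    int allocations_ = 0;
};
```

With this shape no thread ever creates a descriptor that it later has to discard, so nothing is backed out to a free list. The trade-off is that creation is serialized under the lock, whereas the racing approach lets threads allocate in parallel and pays for it when the losers release their memory.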