
Commit ab3cbf5

Authored by max-charlamb, Copilot, and Max Charlamb
[cDAC] Add interpreter support for stack walking and diagnostics (#126520)
## Summary

Adds interpreter support to the cDAC (contract-based Data Access Component), enabling diagnostic tools to correctly walk stacks containing interpreter frames, resolve interpreter precodes, retrieve method information for interpreted methods, and surface interpreter code via the legacy SOS DAC interface.

## Changes

### Native Data Descriptors

- Added interpreter type descriptors to `datadescriptor.inc`: `InterpreterRealCodeHeader`, `InterpreterPrecodeData`, `InterpByteCodeStart`, `InterpMethod`, `InterpMethodContextFrame`, `InterpreterFrame`
- Added `InterpreterFrame` to `frames.h` explicit frame list for cDAC visibility

### Execution Manager: Interpreter JIT Manager

- New `ExecutionManagerCore.InterpreterJitManager` handles code address lookups for interpreter code heaps
- `GetCodeBlockHandle` now searches interpreter code heaps when JIT heaps don't contain the address
- `GetMethodDesc` resolves `MethodDesc` from interpreter code headers

### Precode Resolution (`GetInterpreterCodeFromInterpreterPrecodeIfPresent`)

- New API on `IPrecodeStubs` matching the native DAC pattern: each call site resolves interpreter precodes before passing addresses to `ExecutionManager`
- Passthrough semantics: returns the original address if not an interpreter precode
- NOTHROW contract via `VirtualReadException` catch
- Applied at 4 call sites: `GetMethodDescData`, `CopyNativeCodeVersionToReJitData`, `GetTieredVersions`, `GetILAddressMap`

### Stack Walking

- `FrameIterator` handles `InterpreterFrame`: extracts `MethodDesc` and native code pointer from `InterpMethodContextFrame`
- `StackWalk_1` resolves interpreter frames during enumeration and uses `InterpreterVirtualUnwind` instead of OS unwind when the current IP is interpreter code
- Per-arch `BaseFrameHandler` / `AMD64FrameHandler` / `ARM64FrameHandler` set the first-arg register to the `InterpreterFrame` when crossing an active `InlinedCallFrame`, matching the native runtime
- Cross-platform interpreter context mirroring (`AMD64Context` / `ARM64Context` / `ARMContext`) tracks the latest interpreter frame pointer
- Adds `ARM/ARMUnwinder.cs` and `ARM64/ARM64Unwinder.cs` so the cDAC stack walker doesn't crash when an interpreter IP has no native unwind info; outer failures `return false` (matching the native `OOPStackUnwinder*` convention) without clobbering `Pc`

### Legacy SOS DAC Interface (`SOSDacImpl`)

- `GetCodeHeaderData` now returns `CodeHeaderData` for interpreter methods, populating `MethodDescPtr` and routing the GC-info decode by code kind
- `GetILOffsetsByAddress` succeeds for interpreter IPs by resolving through the interpreter code header

### RuntimeTypeSystem

- `MethodValidation` updated to handle interpreter method descriptors (`IsInterpreterStub` flag, chunk validation)

### Documentation

- Updated `ExecutionManager.md`, `PrecodeStubs.md`, `StackWalk.md` with interpreter support details

### CI

- New interpreter SOS leg in `eng/pipelines/runtime-diagnostics.yml` exercises the new contracts under the diagnostics SOS test suite

### Tests

- **Unit tests**: `ExecutionManagerTests` (interpreter JIT manager), `FrameIteratorTests` (interpreter frame handling), `PrecodeStubsTests` (interpreter precode resolution), `MethodDescTests` (interpreter method validation), `SOSDacInterface5Tests` (interpreter precode resolution via legacy interface)
- **Dump tests**: `InterpreterStackDumpTests` integration tests use a mixed JIT/interpreter stack debuggee (`InterpreterStack` + `Trampoline`) to validate interleaved frame layout, precode resolution, and thread enumeration; `ISOSDacInterfaceTests` validates `GetCodeHeaderData` for interpreter methods over a real dump

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Max Charlamb <maxcharlamb@github.com>
Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com>
1 parent d7a10f4 commit ab3cbf5

64 files changed

Lines changed: 2206 additions & 389 deletions

docs/design/datacontracts/ExecutionManager.md

Lines changed: 22 additions & 7 deletions
@@ -138,7 +138,8 @@ public enum CodeKind : uint
138138
CallCountingStub = 9,
139139
MethodCallThunk = 10,
140140
Jitted = 11,
141-
ReadyToRun = 12
141+
ReadyToRun = 12,
142+
Interpreter = 13
142143
}
143144
```
144145

@@ -184,6 +185,10 @@ Data descriptors used:
184185
| `RealCodeHeader` | `DebugInfo` | Pointer to the DebugInfo |
185186
| `RealCodeHeader` | `GCInfo` | Pointer to the GCInfo encoding |
186187
| `RealCodeHeader` | `EHInfo` | Pointer to the `EE_ILEXCEPTION` containing exception clauses |
188+
| `InterpreterRealCodeHeader` | `MethodDesc` | Pointer to the corresponding `MethodDesc` for interpreter code |
189+
| `InterpreterRealCodeHeader` | `DebugInfo` | Pointer to the DebugInfo for interpreter code |
190+
| `InterpreterRealCodeHeader` | `GCInfo` | Pointer to the GCInfo encoding for interpreter code |
191+
| `InterpreterRealCodeHeader` | `JitEHInfo` | Pointer to the `EE_ILEXCEPTION` containing exception clauses for interpreter code |
187192
| `Module` | `ReadyToRunInfo` | Pointer to the `ReadyToRunInfo` for the module |
188193
| `ReadyToRunInfo` | `ReadyToRunHeader` | Pointer to the ReadyToRunHeader |
189194
| `ReadyToRunInfo` | `CompositeInfo` | Pointer to composite R2R info - or itself for non-composite |
@@ -282,9 +287,11 @@ The bulk of the work is done by the `GetCodeBlockHandle` API that maps a code po
282287
}
283288
```
284289

285-
There are two JIT managers: the "EE JitManager" for jitted code and "R2R JitManager" for ReadyToRun code.
290+
There are three JIT managers: the "EE JitManager" for jitted code, the "Interpreter JitManager" for interpreted code, and the "R2R JitManager" for ReadyToRun code.
286291

287-
The EE JitManager `GetMethodInfo` implements the nibble map lookup, summarized below, followed by returning the `RealCodeHeader` data:
292+
The EE JitManager and Interpreter JitManager both use the same nibble map lookup to find method code.
293+
The only difference is which code header type is read: the EE JitManager reads a `RealCodeHeader` while the Interpreter JitManager reads an `InterpreterRealCodeHeader`.
294+
Their shared `GetMethodInfo` is summarized below:
288295

289296
```csharp
290297
bool GetMethodInfo(TargetPointer rangeSection, TargetCodePointer jittedCodeAddress, [NotNullWhen(true)] out CodeBlock? info)
@@ -303,8 +310,10 @@ bool GetMethodInfo(TargetPointer rangeSection, TargetCodePointer jittedCodeAddre
303310
return false;
304311

305312
TargetPointer codeHeaderAddress = Target.ReadPointer(codeHeaderIndirect);
306-
TargetPointer methodDesc = Target.ReadPointer(codeHeaderAddress + /* RealCodeHeader::MethodDesc offset */);
307-
info = new CodeBlock(jittedCodeAddress, realCodeHeader.MethodDesc, relativeOffset);
313+
// EE JitManager: read RealCodeHeader at codeHeaderAddress
314+
// Interpreter JitManager: read InterpreterRealCodeHeader at codeHeaderAddress
315+
TargetPointer methodDesc = // read MethodDesc field from the appropriate code header
316+
info = new CodeBlock(jittedCodeAddress, methodDesc, relativeOffset);
308317
return true;
309318
}
310319
```
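The split between the two header types can be sketched in isolation. The following is a minimal, self-contained illustration and not the cDAC implementation: `FakeTarget`, `HeaderKind`, `CodeHeaderReader`, and both field offsets are hypothetical stand-ins. In the real contract the offsets come from the data descriptors. The sketch covers only the tail of the lookup, after the nibble map has already produced `codeHeaderIndirect`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for target memory: address -> pointer-sized value.
public sealed class FakeTarget
{
    private readonly Dictionary<ulong, ulong> _memory = new();

    public void WritePointer(ulong address, ulong value) => _memory[address] = value;

    // Throws KeyNotFoundException for an address that was never written.
    public ulong ReadPointer(ulong address) => _memory[address];
}

// Which code header layout the owning JIT manager expects (hypothetical enum).
public enum HeaderKind { RealCodeHeader, InterpreterRealCodeHeader }

public static class CodeHeaderReader
{
    // Assumed offsets for illustration only; real values come from the data descriptors.
    private const ulong RealCodeHeaderMethodDescOffset = 0x18;
    private const ulong InterpreterRealCodeHeaderMethodDescOffset = 0x0;

    // Follows the indirection to the code header, then reads the MethodDesc
    // field at the offset that matches the header type.
    public static ulong ReadMethodDesc(FakeTarget target, ulong codeHeaderIndirect, HeaderKind kind)
    {
        ulong codeHeaderAddress = target.ReadPointer(codeHeaderIndirect);
        ulong offset = kind == HeaderKind.InterpreterRealCodeHeader
            ? InterpreterRealCodeHeaderMethodDescOffset
            : RealCodeHeaderMethodDescOffset;
        return target.ReadPointer(codeHeaderAddress + offset);
    }
}
```

Under these assumptions, the only thing the caller varies is the `HeaderKind`, which mirrors the statement that the two JIT managers differ only in the header type they read.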
@@ -480,6 +489,8 @@ The `GetMethodDesc`, `GetStartAddress`, and `GetRelativeOffset` APIs extract fie
480489

481490
* For R2R code (`ReadyToRunJitManager`), a list of sorted `RUNTIME_FUNCTION` are stored on the module's `ReadyToRunInfo`. This is accessed as described above for `GetMethodInfo`. Again, the relevant `RUNTIME_FUNCTION` is found by binary searching the list based on IP.
482491

492+
* For interpreted code (`InterpreterJitManager`), there is no native unwind info. `GetUnwindInfo` returns null.
493+
483494
Unwind info (`RUNTIME_FUNCTION`) uses relative addressing. For managed code, these values are relative to the start of the code's containing range in the RangeSectionMap (described below). This could be the beginning of a `CodeHeap` for jitted code or the base address of the loaded image for ReadyToRun code.
484495
`GetUnwindInfoBaseAddress` finds this base address for a given `CodeBlockHandle`.
485496

@@ -490,6 +501,8 @@ Unwind info (`RUNTIME_FUNCTION`) use relative addressing. For managed code, thes
490501
* For R2R code (`ReadyToRunJitManager`) the `DebugInfo` is stored as part of the R2R image. The relevant `ReadyToRunInfo` stores a pointer to an `ImageDataDirectory` representing the `DebugInfo` directory. Read the `VirtualAddress` of this data directory as a `NativeArray` containing the `DebugInfos`. To find the specific `DebugInfo`, index into the array using the `index` of the beginning of the R2R function as found like in `GetMethodInfo` above. This yields an offset `offset` value relative to the image base. Read the first variable length uint at `imageBase + offset`, `lookBack`. If `lookBack != 0`, return `imageBase + offset - lookBack`. Otherwise return `offset + size of reading lookback`.
491502
For R2R images, `hasFlagByte` is always `false`.
492503

504+
* For interpreted code (`InterpreterJitManager`), a pointer to the `DebugInfo` is stored on the `InterpreterRealCodeHeader` which is accessed in the same way as the EE JitManager's `GetMethodInfo` (nibble map lookup followed by code header read). `hasFlagByte` is always `false`.
505+
493506
`IExecutionManager.GetGCInfo` gets a pointer to the relevant GCInfo for a `CodeBlockHandle`. The ExecutionManager delegates to the JitManager implementations as the GCInfo is stored differently on jitted and R2R code.
494507

495508
* For jitted code (`EEJitManager`) a pointer to the `GCInfo` is stored on the `RealCodeHeader` which is accessed in the same way as `GetMethodInfo` described above. This can simply be returned as is. The `GCInfoVersion` is defined by the runtime global `GCInfoVersion`.
@@ -498,6 +511,8 @@ For R2R images, `hasFlagByte` is always `false`.
498511
* The `GCInfoVersion` of R2R code is mapped from the R2R MajorVersion and MinorVersion which is read from the ReadyToRunHeader which itself is read from the ReadyToRunInfo (can be found as in GetMethodInfo). The current GCInfoVersion mapping is:
499512
* MajorVersion >= 11 and MajorVersion < 15 => 4
500513

514+
* For interpreted code (`InterpreterJitManager`), a pointer to the `GCInfo` is stored on the `InterpreterRealCodeHeader`, accessed via nibble map lookup as with the EE JitManager. The `GCInfoVersion` is defined by the runtime global `GCInfoVersion`. The GC info is decoded using interpreter-specific decoding (`DecodeInterpreterGCInfo`).
515+
501516

502517
`IExecutionManager.GetFuncletStartAddress` finds the start of the code block's funclet. This will be different from the method's start address (`GetStartAddress`) if the current code block is inside of a funclet. To find the funclet start address, we get the unwind info corresponding to the code block using `IExecutionManager.GetUnwindInfo`. We then parse the unwind info to find the begin address (relative to the unwind info base address) and return the unwind info base address + unwind info begin address.
503518

@@ -511,11 +526,11 @@ There are two distinct clause data types. JIT-compiled code uses `EEExceptionCla
511526

512527
* For R2R code (`ReadyToRunJitManager`), exception clause data is found via the `ExceptionInfo` section (section type 104) of the R2R image. The section is located by traversing `ReadyToRunInfo::Composite` to reach the `ReadyToRunCoreInfo`, then reading its `Header` pointer to the `ReadyToRunCoreHeader`, and iterating through the inline `ReadyToRunSection` array that immediately follows the header. The `ExceptionInfo` section contains an `ExceptionLookupTableEntry` array, where each entry maps a `MethodStartRVA` to an `ExceptionInfoRVA`. A binary search (falling back to linear scan for small ranges) finds the entry matching the method's RVA. The exception clauses span from that entry's `ExceptionInfoRVA` to the next entry's `ExceptionInfoRVA`, both offset from the image base. The clause array is strided using the size of `R2RExceptionClause`.
513528

514-
After obtaining the clause array bounds, the common iteration logic classifies each clause by its flags. The native `COR_ILEXCEPTION_CLAUSE` flags are bit flags: `Filter` (0x1), `Finally` (0x2), `Fault` (0x4). If none are set, the clause is `Typed`. For typed clauses, if the `CachedClass` flag (0x10000000) is set (JIT-only, used for dynamic methods), the union field contains a resolved `TypeHandle` pointer; the clause is a catch-all if this pointer equals the `ObjectMethodTable` global. Otherwise, the union field is a metadata `ClassToken`. To determine whether a typed clause is a catch-all handler, the `ClassToken` (which may be a `TypeDef` or `TypeRef`) is resolved to a `MethodTable` via the `Loader` contract's module lookup maps (`TypeDefToMethodTable` or `TypeRefToMethodTable`) and compared against the `ObjectMethodTable` global. For typed clauses without a cached type handle, the module address is resolved by walking `CodeBlockHandle` → `MethodDesc` → `MethodTable` → `TypeHandle` → `Module` via the `RuntimeTypeSystem` contract.
529+
After obtaining the clause array bounds, the common iteration logic classifies each clause by its flags. The native `COR_ILEXCEPTION_CLAUSE` flags are bit flags: `Filter` (0x1), `Finally` (0x2), `Fault` (0x4). If none are set, the clause is `Typed`. For typed clauses, if the `CachedClass` flag (0x10000000) is set (JIT-only, used for dynamic methods), the union field contains a resolved `TypeHandle` pointer; the clause is a catch-all if this pointer equals the `ObjectMethodTable` global. Otherwise, the union field is a metadata `ClassToken`. To determine whether a typed clause is a catch-all handler, the `ClassToken` (which may be a `TypeDef` or `TypeRef`) is resolved to a `MethodTable` via the `Loader` contract's module lookup maps (`TypeDefToMethodTable` or `TypeRefToMethodTable`) and compared against the `ObjectMethodTable` global. For typed clauses without a cached type handle, the module address is resolved by walking `CodeBlockHandle` -> `MethodDesc` -> `MethodTable` -> `TypeHandle` -> `Module` via the `RuntimeTypeSystem` contract.
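The flag-based classification above can be sketched as follows. This is a minimal illustration rather than the contract's actual code: the type and method names (`ClauseFlags`, `ClauseKind`, `ClauseClassifier`) are hypothetical, and the order in which the kind bits are tested is an assumption. Only the flag values are taken from the text.

```csharp
using System;

// Flag values as listed in the text; the enum name itself is hypothetical.
[Flags]
public enum ClauseFlags : uint
{
    None = 0x0,
    Filter = 0x1,
    Finally = 0x2,
    Fault = 0x4,
    CachedClass = 0x10000000,
}

public enum ClauseKind { Typed, Filter, Finally, Fault }

public static class ClauseClassifier
{
    // Classifies a clause from its raw flags. Typed is the fallback when none of
    // the Filter/Finally/Fault bits is set. The test order is an assumption.
    public static ClauseKind Classify(uint rawFlags)
    {
        ClauseFlags flags = (ClauseFlags)rawFlags;
        if ((flags & ClauseFlags.Filter) != 0) return ClauseKind.Filter;
        if ((flags & ClauseFlags.Finally) != 0) return ClauseKind.Finally;
        if ((flags & ClauseFlags.Fault) != 0) return ClauseKind.Fault;
        return ClauseKind.Typed;
    }

    // For a typed clause: true when the union field holds a resolved TypeHandle
    // pointer, false when it holds a metadata ClassToken.
    public static bool HasCachedTypeHandle(uint rawFlags) =>
        Classify(rawFlags) == ClauseKind.Typed
        && ((ClauseFlags)rawFlags & ClauseFlags.CachedClass) != 0;
}
```

A caller would branch on `HasCachedTypeHandle` to decide whether to compare the union field directly against `ObjectMethodTable` or to resolve the `ClassToken` through the `Loader` lookup maps first.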
515530

516531
`IsFilterFunclet` first checks `IsFunclet`. If the code block is a funclet, it retrieves the EH clauses for the method and checks whether any filter clause's handler offset matches the funclet's relative offset. If a match is found, the funclet is a filter funclet.
517532

518-
`GetCodeKind` classifies a code address by finding its owning range section and determining the code kind. It distinguishes between jitted code, stub code blocks (jump stubs, precode stubs, VSD stubs, etc.), and ReadyToRun code. Returns `Unknown` if the address cannot be classified. We depend on the values of the StubCodeBlockKind enum defined in codeman.h; for non-R2R code, we compare either the RangeList type or the code header against the values of this enum.
533+
`GetCodeKind` classifies a code address by finding its owning range section and determining the code kind. It distinguishes between jitted code, stub code blocks (jump stubs, precode stubs, VSD stubs, etc.), ReadyToRun code, and interpreter code. Returns `Unknown` if the address cannot be classified. We depend on the values of the StubCodeBlockKind enum defined in codeman.h; for non-R2R code, we compare either the RangeList type or the code header against the values of this enum.
519534
### FindReadyToRunModule
520535

521536
`FindReadyToRunModule` locates the ReadyToRun module whose PE image contains the given address. Unlike `GetCodeBlockHandle` (which only matches code regions), this API matches against the full PE image range - including data sections such as import tables. This is used in GCRefMap resolution as it requires finding the module that owns an import section indirection address, which is in the data section rather than the code section.

docs/design/datacontracts/PrecodeStubs.md

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,11 @@ This contract provides support for examining [precode](../coreclr/botr/method-de
1111
// Given an interior address within a precode stub and the kind of stub (StubPrecode or FixupPrecode),
1212
// computes the entry point of the precode.
1313
TargetPointer GetPrecodeEntryPointFromInteriorAddress(TargetCodePointer interiorAddress, bool isFixupPrecode);
14+
15+
// If the code pointer is an interpreter precode, returns the actual interpreter
16+
// code address (ByteCodeAddr). Otherwise returns the original address unchanged.
17+
// Mirrors GetInterpreterCodeFromInterpreterPrecodeIfPresent in native code (precode.cpp).
18+
TargetCodePointer GetInterpreterCodeFromInterpreterPrecodeIfPresent(TargetCodePointer entryPoint);
1419
```
1520

1621
## Version 1, 2, and 3
@@ -44,6 +49,10 @@ Data descriptors used:
4449
| StubPrecodeData | Type | precise sort of stub precode |
4550
| FixupPrecodeData | MethodDesc | pointer to the MethodDesc associated with this fixup precode |
4651
| ThisPtrRetBufPrecodeData | MethodDesc | pointer to the MethodDesc associated with the ThisPtrRetBufPrecode (Version 2 only) |
52+
| InterpreterPrecodeData | ByteCodeAddr | pointer to the `InterpByteCodeStart` for the interpreter bytecode (Version 3 only) |
53+
| InterpreterPrecodeData | Type | precode sort byte identifying this as an interpreter precode (Version 3 only) |
54+
| InterpByteCodeStart | Method | pointer to the `InterpMethod` associated with the bytecode |
55+
| InterpMethod | MethodDesc | pointer to the MethodDesc for the interpreted method |
4756

4857
arm32 note: the `CodePointerToInstrPointerMask` is used to convert IP values that may include an arm Thumb bit (for example extracted from disassembling a call instruction or from a snapshot of the registers) into an address. On other architectures applying the mask is a no-op.
4958

@@ -263,6 +272,22 @@ After the initial precode type is determined, for stub precodes a refined precod
263272
}
264273
}
265274

275+
// Version 3 only: resolves MethodDesc for interpreter precodes by following
276+
// the InterpreterPrecodeData -> InterpByteCodeStart -> InterpMethod -> MethodDesc chain.
277+
internal sealed class InterpreterPrecode : ValidPrecode
278+
{
279+
internal InterpreterPrecode(TargetPointer instrPointer) : base(instrPointer, KnownPrecodeType.Interpreter) { }
280+
281+
internal override TargetPointer GetMethodDesc(Target target, Data.PrecodeMachineDescriptor precodeMachineDescriptor)
282+
{
283+
TargetPointer dataAddr = InstrPointer + precodeMachineDescriptor.StubCodePageSize;
284+
Data.InterpreterPrecodeData precodeData = target.ProcessedData.GetOrAdd<Data.InterpreterPrecodeData>(dataAddr);
285+
Data.InterpByteCodeStart byteCodeStart = target.ProcessedData.GetOrAdd<Data.InterpByteCodeStart>(precodeData.ByteCodeAddr);
286+
Data.InterpMethod interpMethod = target.ProcessedData.GetOrAdd<Data.InterpMethod>(byteCodeStart.Method);
287+
return interpMethod.MethodDesc;
288+
}
289+
}
290+
266291
internal TargetPointer CodePointerReadableInstrPointer(TargetCodePointer codePointer)
267292
{
268293
// Mask off the thumb bit, if we're on arm32, to get the actual instruction pointer
@@ -286,6 +311,8 @@ After the initial precode type is determined, for stub precodes a refined precod
286311
return new PInvokeImportPrecode(instrPointer);
287312
case KnownPrecodeType.ThisPtrRetBuf:
288313
return new ThisPtrRetBufPrecode(instrPointer);
314+
case KnownPrecodeType.Interpreter:
315+
return new InterpreterPrecode(instrPointer);
289316
default:
290317
break;
291318
}
@@ -299,6 +326,33 @@ After the initial precode type is determined, for stub precodes a refined precod
299326

300327
return precode.GetMethodDesc(_target, MachineDescriptor);
301328
}
329+
330+
// Returns the interpreter bytecode address if the entry point is an interpreter precode,
331+
// otherwise returns the original entry point unchanged.
332+
// This method never throws - on any failure, the original address is returned.
333+
TargetCodePointer IPrecodeStubs.GetInterpreterCodeFromInterpreterPrecodeIfPresent(TargetCodePointer entryPoint)
334+
{
335+
try
336+
{
337+
TargetPointer instrPointer = CodePointerReadableInstrPointer(entryPoint);
338+
if (!IsAlignedInstrPointer(instrPointer))
339+
return entryPoint;
340+
341+
if (TryGetKnownPrecodeType(instrPointer) is not KnownPrecodeType.Interpreter)
342+
return entryPoint;
343+
344+
TargetPointer dataAddr = instrPointer + MachineDescriptor.StubCodePageSize;
345+
Data.InterpreterPrecodeData precodeData = // read InterpreterPrecodeData at dataAddr
346+
if (precodeData.ByteCodeAddr == TargetPointer.Null)
347+
return entryPoint;
348+
349+
return new TargetCodePointer(precodeData.ByteCodeAddr);
350+
}
351+
catch
352+
{
353+
return entryPoint;
354+
}
355+
}
302356
```
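The passthrough and no-throw behaviour of `GetInterpreterCodeFromInterpreterPrecodeIfPresent` can be modelled without any target memory. The sketch below is illustrative only: `InterpreterPrecodeResolver` and the `readByteCodeAddr` delegate are hypothetical. The delegate stands in for "detect an interpreter precode and read its `ByteCodeAddr`", returning `null` when the entry point is not an interpreter precode and possibly throwing when memory is unreadable.

```csharp
using System;

public static class InterpreterPrecodeResolver
{
    // Returns the bytecode address when the entry point is an interpreter precode
    // with a non-null ByteCodeAddr; otherwise returns the entry point unchanged.
    // Any exception from the reader is swallowed so the caller never sees a throw.
    public static ulong ResolveOrPassThrough(ulong entryPoint, Func<ulong, ulong?> readByteCodeAddr)
    {
        try
        {
            ulong? byteCodeAddr = readByteCodeAddr(entryPoint);
            if (byteCodeAddr is null || byteCodeAddr.Value == 0)
                return entryPoint;

            return byteCodeAddr.Value;
        }
        catch (Exception)
        {
            return entryPoint;
        }
    }
}
```

Because every failure path yields the original address, a call site can apply the resolution unconditionally before handing the address to the execution manager, which matches the call-site pattern described in the commit message.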
303357

304358
### `GetPrecodeEntryPointFromInteriorAddress`
