[cDAC] Add interpreter support for stack walking and diagnostics #126520
Pull request overview
Copilot reviewed 63 out of 64 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (3)
src/native/managed/cdac/tests/DumpTests/ISOSDacInterfaceTests.cs:1
- `ConditionalTheory` is used but this file doesn’t import `Microsoft.DotNet.XUnitExtensions`, so it will fail to compile unless there’s a global using elsewhere. Add `using Microsoft.DotNet.XUnitExtensions;` (and optionally simplify `SkipTestException` usage) to make the dependency explicit.
src/native/managed/cdac/tests/DumpTests/DumpTests.targets:1
- `_DebuggeeEnvVars` is already escaped earlier, but it is escaped again when passed through nested `MSBuild` calls. This double-escaping prevents semicolon-separated env var lists (e.g. `A=B;C=D`) from being restored correctly when later unescaped for `Exec EnvironmentVariables` (you’d end up with `%3B` instead of `;`). Pass the value through without re-escaping (or unescape before re-escaping) so multi-variable lists remain semicolon-delimited at the final `Exec`.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/Context/ARM64/ARM64Unwinder.cs:1
- Similar to the ARM change, returning `false` on missing unwind info risks leaving the context unchanged if the caller doesn’t propagate this failure (most of the stack walk logic expects unwind errors to surface as exceptions). To avoid stalls or partial walks, either throw an `InvalidOperationException` here (as before, or consistent with other unwinders) or update the unwind call sites to handle a `false` return deterministically.
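The double-escaping concern raised for `DumpTests.targets` can be reproduced in isolation. The sketch below is not MSBuild's own escaping routine: `MsBuildEscapeSketch`, `Escape`, and `Unescape` are hypothetical names, and `Escape` handles only `%` and `;`, which is enough to show why a value escaped twice and unescaped once still contains `%3B`.

```csharp
using System;

// Hypothetical helper for illustration only; it is not part of MSBuild.
public static class MsBuildEscapeSketch
{
    // Simplified percent-escaping that covers only '%' and ';'.
    // '%' is escaped first so the "%3B" produced for ';' is not
    // escaped a second time within the same call.
    public static string Escape(string value) =>
        value.Replace("%", "%25").Replace(";", "%3B");

    // Removes exactly one layer of percent-escaping in a single pass.
    public static string Unescape(string value) =>
        Uri.UnescapeDataString(value);
}
```

Calling `Escape` once on `A=B;C=D` and then `Unescape` restores the original list, while calling `Escape` twice before a single `Unescape` leaves `A=B%3BC=D`, which is the symptom the comment describes.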
Add cDAC contracts and implementations to walk interpreter frames during stack walks and identify interpreter-managed methods. The cDAC stack walker now mirrors the native DAC behavior:
* Yields `InterpreterFrame` as a runtime frame marker (pMD = NULL), matching the native DAC.
* Implements interpreter virtual unwind by following the `InterpMethodContextFrame.pParent` chain so each interpreted method is yielded as a frameless frame.
* Resolves the top `InterpMethodContextFrame` from the `InterpreterFrame` in the same way as the native runtime (`GetTopInterpMethodContextFrame`).
* Adds `InterpreterJitManager` for interpreter `CodeBlockHandle` resolution and a precode-stubs fallthrough that recognizes `InterpreterPrecode`.
Includes data descriptors for `InterpreterFrame`, `InterpMethodContextFrame`, `InterpMethod`, `InterpByteCodeStart`, `InterpreterPrecodeData`, and `InterpreterRealCodeHeader`, plus documentation updates and unit tests covering the new contract behavior.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add dump-test infrastructure for the interpreter scenarios and tests that exercise the cDAC interpreter stack walker against real coredumps:
* `InterpreterStack` debuggee: single-threaded debuggee with a JIT->interpreter->JIT->interpreter call chain that triggers a `FailFast` from inside an interpreted method.
* `InterpreterStackDoubleWalk` debuggee: multi-threaded debuggee where a worker thread is parked deep in an interpreted call chain while the main thread captures the dump. This exercises walking a thread other than the crashing thread and asserts the cDAC walker does not produce duplicated `InterpreterFrame` markers.
The new dump tests verify the interleaved JIT/interpreter frame layout, the absence of doubled `InterpreterFrame` markers, and that `DumpTestStackWalker` adjacency assertions hold across the full interpreter call chain.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When CheckForSkippedFrames clones the context for an interpreter IP and calls Unwind, the interpreter's GetUnwindInfo returns TargetPointer.Null. AMD64Unwinder already guards against this by checking for null and returning false; ARM and ARM64 unwinders did not, and crashed reading RuntimeFunction at address 0 (VirtualReadException at 0x00000000). Add the same null guard to ARM and ARM64 to make all three platforms behaviorally consistent.
Also remove StackWalk_NoDoubledInterpreterFrames_WithDebuggerFilterContext since it cannot run in CI (depends on a manually-collected cdb dump in the local repo) and the scenario is invalid per native PR dotnet#126953.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restore single ternary at StackWalk_1.cs (was needlessly verbose)
- Remove unrelated infinite-loop guard around Unwind
- Replace unicode arrows/em-dashes with -> and -- in code/comments/docs
- Drop direct line-number references in comments
- Make FrameHelpers access modifiers consistent: public for externally callable methods, private for helpers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add SetContextToInterpMethodContextFrame and VirtualUnwindInterpreterCallFrame helpers in FrameHelpers, named to mirror native (frames.cpp / eetwain.cpp)
- Add IsFaulting field on InterpreterFrame and Stack field on InterpMethodContextFrame; thread Data.InterpreterFrame.Address through
- Add RawContextFlags abstraction across all platform contexts so the helper can OR in CONTEXT_EXCEPTION_ACTIVE for faulting top frames
- Gate IExecutionManager.IsFunclet on JitType.Interpreter so interpreter code (no native unwind info) reports false without throwing in GetFuncletStartAddress
- Remove flaky InterpreterStackDoubleWalk debuggee + tests (timing-sensitive SpinStep loop)
- Address Copilot bot feedback: MSBuild escaping in DumpTests.targets, funclet null-safety for interpreter
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pe; hide GetFrameHandler behind helper
- SOSDacImpl.GetCodeHeaderData: when JIT type is Interpreter, decode the GC info using DecodeInterpreterGCInfo instead of DecodePlatformSpecificGCInfo. This mirrors native ClrDataAccess::GetCodeHeaderData routing through EECodeInfo::GetCodeManager()->GetFunctionSize, where interpreter code goes through InterpreterCodeManager::GetFunctionSize / InterpreterGcInfoDecoder.
- FrameHelpers: make GetFrameHandler private and add a semantic helper ApplyInterpreterFrameTransition(context, interpreterFrameAddress) that encapsulates reading the InterpreterFrame as a FramedMethodFrame and invoking HandleTransitionFrame on it. Update StackWalk_1.InterpreterVirtualUnwind to use the new helper instead of reaching into the frame-handler dispatch directly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds an opt-in Interpreter SOS test leg to runtime-diagnostics.yml that
exercises the diagnostics-side SOSInterpreterTests against a Checked CoreCLR
drop. FEATURE_INTERPRETER is only compiled into Debug/Checked CoreCLR, so the
existing Release build leg cannot run this test.
Changes:
* eng/pipelines/diagnostics/runtime-diag-job.yml: adds a testInterpreter
parameter (default false). When true, _TestInterpreterArgs resolves to
-testInterpreter and is forwarded to the diagnostics build script alongside
the existing -useCdac / -noFallback flags. Default-false invocation is
identical to today.
* eng/pipelines/runtime-diagnostics.yml:
- New build leg that produces a Checked CoreCLR (libs/SDK stay Release) under
a distinct artifact name (..._coreclr_Checked).
- New Interpreter test job that depends on the Checked build leg, downloads
its artifact, and is the only job that sets testInterpreter: true.
- The cDAC / cDAC_no_fallback / DAC test jobs and the existing Release build
leg are unchanged.
Tested with a manual ADO queue using diagnosticsBranch pointed at the
companion diagnostics PR (dotnet/diagnostics#5829).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…CallFrame handlers
Mirror the per-architecture native InlinedCallFrame::UpdateRegDisplay_Impl logic in the corresponding cDAC frame handlers:
- AMD64FrameHandler.HandleInlinedCallFrame: sets Rcx (Windows) or Rdi (Unix SysV) to the InterpreterFrame address when the next Frame is an InterpreterFrame, matching src/coreclr/vm/amd64/cgenamd64.cpp:212-218.
- ARM64FrameHandler.HandleInlinedCallFrame: sets X0 to the InterpreterFrame address when the next Frame is an InterpreterFrame, matching src/coreclr/vm/arm64/stubs.cpp:408-414.
ARM, x86, LoongArch64, and RISCV64 native InlinedCallFrame::UpdateRegDisplay_Impl do NOT perform this update, so the corresponding cDAC handlers (or BaseFrameHandler inheritance) do not either.
A protected GetNextFrame helper is added to BaseFrameHandler so each handler can inspect the chain itself without changing the IPlatformFrameHandler interface (handles the FRAME_TOP all-ones terminator). BaseFrameHandler also constructs its own FrameHelpers from the target so derived handlers can inline calls to GetFrameType.
Without this update, cDAC reports the thread's literal saved Rcx for frames between an InlinedCallFrame and its successor InterpreterFrame, while the legacy DAC reports the InterpreterFrame address. This trips Debug.Assert(contextStruct.Equals(localContextStruct)) in ClrDataStackWalk.GetContext during !ClrStack on a thread captured during a P/Invoke from interpreted code.
Verified locally against the SOS.InterpreterStackTest.Heap.dmp captured in CI: !ClrStack and !PrintException both succeed with the cDAC parity check enabled, walking through [InlinedCallFrame], JIT IL stub frames, [InterpreterFrame: ...], and the interpreted method frames in correct order. All 2081 cDAC unit tests continue to pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The cDAC's InterpreterJitManager (added in this PR) registers interpreter code as code blocks with an interpreter-format DebugInfo, and DebugInfo_2.GetMethodNativeMap successfully decodes that DebugInfo into an OffsetMapping list. This means cDAC produces real, correct IL offsets for interpreter pseudo-IPs (e.g. for !ClrStack -lines source-line resolution), where the legacy DAC bails out early with E_INVALIDARG because ExecutionManager::GetNativeCodeVersion(address) returns null for interpreter IPs (daccess.cpp:5660-5666).
Switch the parity validation in GetILOffsetsByAddress from the default AllowDivergentFailures (which rejects cDAC-success vs DAC-failure) to AllowCdacSuccess so cDAC's better behavior is permitted. Gate the offset-comparison block on hrLocal == S_OK as well, so we don't spuriously compare against a zero-initialized localOffsetsNeeded / localIlOffsets when only cDAC succeeded.
Verified locally against the SOS.InterpreterStackTest.Heap.dmp from CI: !ClrStack -lines now resolves source lines for the JIT-compiled EH frames AND for the interpreter frames (InterpTestMethodThrow @ 19, InterpTestMethodRunNested @ 13, InterpreterStackTestApp.Main @ 12), with the cDAC parity check enabled. All 2081 cDAC unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds ISOSDacInterfaceTests dump test class mirroring ISOSDacInterface13Tests, with a GetCodeHeaderData test that validates TYPE_INTERPRETER routing through the interpreter precode unwrap path. Also switches the InterpreterStack debuggee from a full dump to a heap dump (107 MB -> 16 MB) since the heap dump captures all memory needed for interpreter stack walking and code-header lookup.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Linux createdump segfaults during heap-dump region enumeration when the interpreter is active, so InterpreterStack switches to full dumps. Tracked by dotnet#128044.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-bit targets
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the original catch (VirtualReadException) clause; the NotImplementedException widening will be handled separately.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Missed during the rebase onto main (which removed JitType in favor of the unified CodeKind enum). Updates AssertInterpreted/AssertJitted to call IExecutionManager.GetCodeKind with the resolved code pointer.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- MockDescriptors.ExecutionManager.cs: Remove AddRangeListSection helper that created an invalid RangeSection (RangeList flag set but RangeList and JitManager pointers null). Switch the one caller in GetCodeBlockHandle_InterpreterPrecode_ReturnsNull to use the AddRangeListRangeSection helper which properly initializes RangeList and JitManager.
- ExecutionManager.md: Fix missing space after backticked method name in the NonVirtualEntry2MethodDesc paragraph.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
└-> ParentPtr -> null
```

This produces three frames in order: C, B, A (innermost to outermost).
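A minimal sketch of that ordering, assuming target memory can be modeled as a dictionary from address to frame. `InterpFrameSketch`, `InterpreterUnwindSketch`, and `WalkParentChain` are hypothetical names used only for illustration and are not the cDAC implementation; the sketch models nothing but the parent link.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the data read from an InterpMethodContextFrame.
// Only the method name and the parent pointer are modeled here.
public sealed record InterpFrameSketch(string Method, ulong ParentPtr);

public static class InterpreterUnwindSketch
{
    // Starts at the top (innermost) frame and follows ParentPtr until it is
    // null (0), yielding one entry per interpreted method along the way.
    public static IEnumerable<string> WalkParentChain(
        IReadOnlyDictionary<ulong, InterpFrameSketch> memory, ulong topFrame)
    {
        for (ulong addr = topFrame; addr != 0; addr = memory[addr].ParentPtr)
        {
            yield return memory[addr].Method;
        }
    }
}
```

With a chain where C's parent is B, B's parent is A, and A's parent is null, starting the walk at C yields C, B, A.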
I think there are a few important details that are assumed by the implementation but not covered in the docs:
- Using the native CONTEXT data structure to represent interpreter code frames. Thus far CONTEXT was used 1:1 with a physical stack frame, but now we are populating it with synthetic data representing logical interpreter stack frames. There is the conceptual part and also the specifics of which CONTEXT registers are used to store which elements of interpreter state.
- Special handling for InlinedCallFrame relative to interpreter frames.
- The anticipated enumeration ordering between InlinedCallFrame, InterpreterFrame, logical interpreted code frames, and other surrounding managed/native code frames that aren't interpreted.
I think it's fine if you want to keep the current API shape right now to get things moving forward, but as we work more to support mscordbi's stackwalking and/or NativeAOT stackwalking I suspect it may be helpful to separate interpreter stackwalk and native machine stackwalking more than they currently are. For example, we might have APIs approximately like this:
    // A logical context for the interpreter that only stores a few fields needed to
    // support unwinding interpreter frames and accessing locals
    struct InterpreterContext
    {
        TargetPointer IP;
        TargetPointer InterpMethodContextFrame;
        TargetPointer InterpMethodContextFrameStack;
        TargetPointer InterpreterFrame;
        bool IsFaulting;
    }

    class InterpreterStackwalkContract
    {
        // If the native context has synthetic interpreter data in it, convert it to
        // the simplified InterpreterContext
        bool TryConvertMachineContextToInterpreterContext(
            IPlatformAgnosticContext machineContext,
            out InterpreterContext? interpreterContext);

        // Enumerate interpreter frames within a single InterpreterFrame
        IEnumerable<InterpreterContext> EnumerateInterpretedStackFrames(TargetPointer InterpreterFrame);

        // Enumerate interpreter frames within a single InterpreterFrame starting from
        // some specific initial context.
        IEnumerable<InterpreterContext> EnumerateInterpretedStackFrames(InterpreterContext startContext);
    }

The basic stackwalk contract wouldn't need any special knowledge of interpreter I think, other than InterpreterFrame is one kind of Frame that might be observed. Contract consumers would be free to substitute InterpreterFrame for an enumeration of its nested frames if desired. For stackwalks where the seed CONTEXT is in interpreted code, the TryConvert API gives InterpreterContext.InterpreterFrame which can be used to recover the starting point for the base stackwalker.
That seems reasonable. I'd like to get this in as is, then will look at follow-ups including this as well as more testing.
    {
        if (_target.Contracts.RuntimeInfo.GetTargetOperatingSystem() == RuntimeInfoOperatingSystem.Windows)
        {
            _holder.Context.Rcx = next.Address.Value;
Nit: Is this an inlined implementation of FirstArgRegister?
It might be useful to have IPlatformAgnosticContext define methods to get/set this rather than doing platform-specific overrides in every FrameHandler to specify a different register. It would also make it a little clearer what the underlying concept is guiding these register choices.
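One way to picture that suggestion is a single lookup that names the first-argument register per platform. The register choices below come from the commit message in this PR (Rcx on Windows x64, Rdi on Unix SysV AMD64, X0 on ARM64); `ArchSketch`, `OsSketch`, and `FirstArgRegisterSketch` are hypothetical names and do not exist in the cDAC, so treat this as a sketch of the concept rather than a proposed API.

```csharp
using System;

// Hypothetical architecture and OS tags; the real cDAC exposes its own types.
public enum ArchSketch { AMD64, ARM64 }
public enum OsSketch { Windows, Unix }

public static class FirstArgRegisterSketch
{
    // Maps a platform to the name of its first integer argument register,
    // covering only the architectures whose handlers perform this update.
    public static string GetFirstArgRegister(ArchSketch arch, OsSketch os) =>
        (arch, os) switch
        {
            (ArchSketch.AMD64, OsSketch.Windows) => "Rcx",
            (ArchSketch.AMD64, OsSketch.Unix) => "Rdi",
            (ArchSketch.ARM64, _) => "X0",
            _ => throw new NotSupportedException($"{arch}/{os}"),
        };
}
```

Centralizing the choice this way would let each frame handler ask for "the first argument register" instead of hard-coding a register name per operating system.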
Summary
Adds interpreter support to the cDAC (contract-based Data Access Component), enabling diagnostic tools to correctly walk stacks containing interpreter frames, resolve interpreter precodes, retrieve method information for interpreted methods, and surface interpreter code via the legacy SOS DAC interface.
Changes
Native Data Descriptors
- `datadescriptor.inc`: `InterpreterRealCodeHeader`, `InterpreterPrecodeData`, `InterpByteCodeStart`, `InterpMethod`, `InterpMethodContextFrame`, `InterpreterFrame`
- `InterpreterFrame` to `frames.h` explicit frame list for cDAC visibility
Execution Manager: Interpreter JIT Manager
- `ExecutionManagerCore.InterpreterJitManager` handles code address lookups for interpreter code heaps
- `GetCodeBlockHandle` now searches interpreter code heaps when JIT heaps don't contain the address
- `GetMethodDesc` resolves `MethodDesc` from interpreter code headers
Precode Resolution (`GetInterpreterCodeFromInterpreterPrecodeIfPresent`)
- `IPrecodeStubs` matching the native DAC pattern: each call site resolves interpreter precodes before passing addresses to `ExecutionManager`
- `VirtualReadException` catch
- `GetMethodDescData`, `CopyNativeCodeVersionToReJitData`, `GetTieredVersions`, `GetILAddressMap`
Stack Walking
- `FrameIterator` handles `InterpreterFrame`: extracts `MethodDesc` and native code pointer from `InterpMethodContextFrame`
- `StackWalk_1` resolves interpreter frames during enumeration and uses `InterpreterVirtualUnwind` instead of OS unwind when the current IP is interpreter code
- `BaseFrameHandler`/`AMD64FrameHandler`/`ARM64FrameHandler` set the first-arg register to the `InterpreterFrame` when crossing an active `InlinedCallFrame`, matching the native runtime
- `AMD64Context`/`ARM64Context`/`ARMContext` tracks the latest interpreter frame pointer
- Null guards in `ARM/ARMUnwinder.cs` and `ARM64/ARM64Unwinder.cs` so the cDAC stack walker doesn't crash when an interpreter IP has no native unwind info; outer failures `return false` (matching the native `OOPStackUnwinder*` convention) without clobbering `Pc`
Legacy SOS DAC Interface (`SOSDacImpl`)
- `GetCodeHeaderData` now returns `CodeHeaderData` for interpreter methods, populating `MethodDescPtr` and routing the GC-info decode by code kind
- `GetILOffsetsByAddress` succeeds for interpreter IPs by resolving through the interpreter code header
RuntimeTypeSystem
- `MethodValidation` updated to handle interpreter method descriptors (`IsInterpreterStub` flag, chunk validation)
Documentation
- `ExecutionManager.md`, `PrecodeStubs.md`, `StackWalk.md` with interpreter support details
CI
- `eng/pipelines/runtime-diagnostics.yml` exercises the new contracts under the diagnostics SOS test suite
Tests
- `ExecutionManagerTests` (interpreter JIT manager), `FrameIteratorTests` (interpreter frame handling), `PrecodeStubsTests` (interpreter precode resolution), `MethodDescTests` (interpreter method validation), `SOSDacInterface5Tests` (interpreter precode resolution via legacy interface)
- `InterpreterStackDumpTests` integration tests use a mixed JIT/interpreter stack debuggee (`InterpreterStack` + `Trampoline`) to validate interleaved frame layout, precode resolution, and thread enumeration; `ISOSDacInterfaceTests` validates `GetCodeHeaderData` for interpreter methods over a real dump