You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Spinel can produce accurate, true-stack, Ruby-level flamegraphs from itself, reusing machinery that already shipped — without depending on perf (often locked down: perf_event_paranoid on hardened/CI hosts) or gprof (approximate: self-time is apportioned up caller paths, not sampled from real stacks, so deep call graphs get rootless/fuzzy frames).
The idea
A runtime sampling profiler: a SIGPROF (or timer-thread) handler that, on each tick, walks the current stack via the same sp_caller_now() / sp_bt_symbol path that backtrace (#1300) uses, and accumulates folded stacks (Class#method;Class#method;… count) — the universal flamegraph input. Enabled opt-in, e.g. spinel --profile app.rb or SPINEL_PROFILE=1 ./app, writing app.folded on exit.
Because frames come from the in-process symbolizer, they're already Ruby (no out-of-process demangle), the stacks are real (no gprof apportioning), and GC/alloc frames (sp_gc_*) self-identify — so the GC-vs-user split that's currently a frame-name heuristic in spinel-dev's tools/perf/spinel-flamegraph.rb becomes exact.
Why now
The substrate is done: #1300 gives stack capture + symbolization; #1335 (wire Kernel#caller) exercises the same path from Ruby; #1334 (symbol map) is the static counterpart. A sampler is "call sp_caller_now() on a timer and count." It turns the perf-analysis work on spinel-dev (#5/#7 — where we found a real Rails workload is allocation-bound at ~55–72% GC, which a flamegraph makes legible) from a gprof-bolt-on into a first-class spinel --profile.
Open design questions (the RFC part)
Trigger:SIGPROF + setitimer (simple, single-thread — fits the current cooperative-fiber model) vs a dedicated sampler thread. Async-signal-safety of sp_caller_now() under a handler needs checking (it allocates a sp_StrArray today; a sampler may want a preallocated ring buffer instead).
Fiber awareness: attribute samples to the active fiber's logical stack, not just the C stack.
Output: folded stacks (renderer-agnostic) as the primitive; SVG is a tool concern.
Adjacent asks this enables / pairs with
Allocation profiling hooks — per-type/per-site alloc counts would make the GC decomposition measured rather than inferred from sp_gc_* frame names; directly supports the perf: hoist infer_type(recv) once at top of infer_operator_type #7 tier question (and the Ractor-heap story, OriPekelman/spinel#1333).
DWARF variable-location fidelity under --debug — orthogonal, but the other place where the #line build trades away debug-info accuracy (heap locals misread by lldb today; the differential bisector works around it).
Not merge-ready — asking whether a native --profile is wanted, and which trigger/output shape fits the runtime. Happy to build the folded-stacks→flamegraph consumer (it already exists in spinel-dev). Full context: https://github.com/OriPekelman/spinel-dev (issues #5, #7).
RFC — request for comment (the bigger swing)
Spinel can produce accurate, true-stack, Ruby-level flamegraphs from itself, reusing machinery that already shipped — without depending on
perf(often locked down:perf_event_paranoidon hardened/CI hosts) orgprof(approximate: self-time is apportioned up caller paths, not sampled from real stacks, so deep call graphs get rootless/fuzzy frames).The idea
A runtime sampling profiler: a
SIGPROF(or timer-thread) handler that, on each tick, walks the current stack via the samesp_caller_now()/sp_bt_symbolpath that backtrace (#1300) uses, and accumulates folded stacks (Class#method;Class#method;… count) — the universal flamegraph input. Enabled opt-in, e.g.spinel --profile app.rborSPINEL_PROFILE=1 ./app, writingapp.foldedon exit.Because frames come from the in-process symbolizer, they're already Ruby (no out-of-process demangle), the stacks are real (no gprof apportioning), and GC/alloc frames (
sp_gc_*) self-identify — so the GC-vs-user split that's currently a frame-name heuristic in spinel-dev'stools/perf/spinel-flamegraph.rbbecomes exact.Why now
The substrate is done:
#1300gives stack capture + symbolization; #1335 (wireKernel#caller) exercises the same path from Ruby; #1334 (symbol map) is the static counterpart. A sampler is "callsp_caller_now()on a timer and count." It turns the perf-analysis work on spinel-dev (#5/#7 — where we found a real Rails workload is allocation-bound at ~55–72% GC, which a flamegraph makes legible) from a gprof-bolt-on into a first-classspinel --profile.Open design questions (the RFC part)
SIGPROF+setitimer(simple, single-thread — fits the current cooperative-fiber model) vs a dedicated sampler thread. Async-signal-safety ofsp_caller_now()under a handler needs checking (it allocates asp_StrArraytoday; a sampler may want a preallocated ring buffer instead).Adjacent asks this enables / pairs with
sp_gc_*frame names; directly supports the perf: hoist infer_type(recv) once at top of infer_operator_type #7 tier question (and the Ractor-heap story, OriPekelman/spinel#1333).--debug— orthogonal, but the other place where the#linebuild trades away debug-info accuracy (heap locals misread by lldb today; the differential bisector works around it).Not merge-ready — asking whether a native
--profileis wanted, and which trigger/output shape fits the runtime. Happy to build the folded-stacks→flamegraph consumer (it already exists in spinel-dev). Full context: https://github.com/OriPekelman/spinel-dev (issues #5, #7).