RFC: native sampling profiler — folded Ruby stacks via the backtrace machinery (no perf/gprof) #1336

@OriPekelman

RFC — request for comment (the bigger swing)

Spinel can produce accurate, true-stack, Ruby-level flamegraphs on its own by reusing machinery that has already shipped. That removes the dependency on perf (often locked down via perf_event_paranoid on hardened/CI hosts) and on gprof (approximate: self-time is apportioned up caller paths rather than sampled from real stacks, so deep call graphs get rootless or fuzzy frames).

The idea

A runtime sampling profiler: a SIGPROF (or timer-thread) handler that, on each tick, walks the current stack via the same sp_caller_now() / sp_bt_symbol path that backtrace (#1300) uses, and accumulates folded stacks (Class#method;Class#method;… count) — the universal flamegraph input. Enabled opt-in, e.g. spinel --profile app.rb or SPINEL_PROFILE=1 ./app, writing app.folded on exit.

main;Tep::Request#new;Tep.str_hash;StrStrHash_new;sp_gc_alloc   1899
main;ActiveRecord::Base#save;Article.from_stmt;Article#new;sp_gc_alloc 163
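
The accumulation step can be sketched in Ruby as follows. This is a minimal illustration, not Spinel's implementation: the `FoldedStacks` class and its methods are hypothetical, and the frame arrays stand in for whatever `sp_caller_now()` would yield on each tick.

```ruby
# Hypothetical accumulator for folded stacks (not part of Spinel).
class FoldedStacks
  def initialize
    @counts = Hash.new(0)
  end

  # frames: array of frame names ordered root -> leaf, one call per sample.
  def record(frames)
    @counts[frames.join(";")] += 1
  end

  # Returns "frame;frame;... count" lines, heaviest stack first
  # (ties broken alphabetically so the output is stable).
  def to_folded
    @counts.sort_by { |stack, n| [-n, stack] }
           .map { |stack, n| "#{stack} #{n}" }
           .join("\n")
  end
end

acc = FoldedStacks.new
3.times { acc.record(["main", "Tep::Request#new", "sp_gc_alloc"]) }
acc.record(["main", "Article#new", "sp_gc_alloc"])
puts acc.to_folded
# main;Tep::Request#new;sp_gc_alloc 3
# main;Article#new;sp_gc_alloc 1
```

Keying the hash on the joined string keeps the per-sample work to one join and one increment. A real in-handler version would likely avoid allocation entirely.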

Because frames come from the in-process symbolizer, they're already Ruby (no out-of-process demangle), the stacks are real (no gprof apportioning), and GC/alloc frames (sp_gc_*) self-identify — so the GC-vs-user split that's currently a frame-name heuristic in spinel-dev's tools/perf/spinel-flamegraph.rb becomes exact.
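
With self-identifying frames, the GC-vs-user split reduces to a prefix check over the folded file. The sketch below is an assumption about how a consumer might do it, not the code in tools/perf/spinel-flamegraph.rb: `parse_folded` and `gc_split` are hypothetical helpers, and a sample is counted as GC if any frame in its stack starts with `sp_gc_`. The third input line is made up to give the user bucket something to count.

```ruby
# Parse folded text into [frames, count] pairs.
# The count is the last whitespace-separated field on each line.
def parse_folded(text)
  text.each_line.filter_map do |line|
    m = line.strip.match(/\A(.*\S)\s+(\d+)\z/)
    next unless m
    [m[1].split(";"), Integer(m[2])]
  end
end

# Attribute each sample to GC if any frame carries the GC prefix.
def gc_split(samples, gc_prefix: "sp_gc_")
  total = 0
  gc = 0
  samples.each do |frames, count|
    total += count
    gc += count if frames.any? { |f| f.start_with?(gc_prefix) }
  end
  { gc: gc, user: total - gc, total: total }
end

folded = <<~FOLDED
  main;Tep::Request#new;Tep.str_hash;StrStrHash_new;sp_gc_alloc   1899
  main;ActiveRecord::Base#save;Article.from_stmt;Article#new;sp_gc_alloc 163
  main;Article#title 100
FOLDED

split = gc_split(parse_folded(folded))
puts "gc=#{split[:gc]} user=#{split[:user]} total=#{split[:total]}"
# gc=2062 user=100 total=2162
```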

Why now

The substrate is done: #1300 gives stack capture + symbolization; #1335 (wire Kernel#caller) exercises the same path from Ruby; #1334 (symbol map) is the static counterpart. A sampler is "call sp_caller_now() on a timer and count." It turns the perf-analysis work on spinel-dev (#5/#7 — where we found a real Rails workload is allocation-bound at ~55–72% GC, which a flamegraph makes legible) from a gprof-bolt-on into a first-class spinel --profile.

Open design questions (the RFC part)

  • Trigger: SIGPROF + setitimer (simple and single-threaded, which fits the current cooperative-fiber model) vs a dedicated sampler thread. Async-signal-safety of sp_caller_now() under a handler needs checking: it allocates an sp_StrArray today, so a sampler may want a preallocated ring buffer instead.
  • Fiber awareness: attribute samples to the active fiber's logical stack, not just the C stack.
  • Output: folded stacks (renderer-agnostic) as the primitive; SVG is a tool concern.
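
To make the trigger question concrete, here is a rough sketch of the sampler-thread variant in plain CRuby. It is an analogy only: `ThreadSampler` is a hypothetical class, `Thread#backtrace_locations` stands in for `sp_caller_now()`, and nothing here touches Spinel's runtime or the SIGPROF path. Note that in CRuby the sampler thread competes with the target for the interpreter lock, so the effective sampling rate can be much coarser than the requested interval.

```ruby
# Hypothetical timer-thread sampler (CRuby analogy, not Spinel code).
class ThreadSampler
  attr_reader :counts

  def initialize(target, interval: 0.001)
    @target = target        # thread whose stack is sampled
    @interval = interval    # seconds between ticks
    @counts = Hash.new(0)   # folded stack => sample count
    @running = false
  end

  # One tick: capture the target's stack and count it as a folded key.
  def sample_once
    locs = @target.backtrace_locations
    return if locs.nil? || locs.empty?      # target has finished
    frames = locs.reverse.map(&:label)      # root -> leaf
    @counts[frames.join(";")] += 1
  end

  def start
    @running = true
    @thread = Thread.new do
      while @running
        sample_once
        sleep @interval
      end
    end
    self
  end

  def stop
    @running = false
    @thread&.join
    self
  end
end

def busy
  x = 0
  100_000.times { |i| x += i * i }
  x
end

worker = Thread.new { 20.times { busy } }
sampler = ThreadSampler.new(worker).start
worker.join
sampler.stop
sampler.counts.each { |stack, n| puts "#{stack} #{n}" }
```

Keeping `sample_once` separate from the loop means the same capture-and-count step could be driven by a signal handler instead of a thread, which is the choice the first bullet leaves open.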

Adjacent asks this enables / pairs with

  • Allocation profiling hooks: per-type/per-site alloc counts would make the GC decomposition measured rather than inferred from sp_gc_* frame names. This directly supports the tier question in #7 (perf: hoist infer_type(recv) once at top of infer_operator_type) and the Ractor-heap story (OriPekelman/spinel#1333).
  • DWARF variable-location fidelity under --debug — orthogonal, but the other place where the #line build trades away debug-info accuracy (heap locals misread by lldb today; the differential bisector works around it).

Not merge-ready — asking whether a native --profile is wanted, and which trigger/output shape fits the runtime. Happy to build the folded-stacks→flamegraph consumer (it already exists in spinel-dev). Full context: https://github.com/OriPekelman/spinel-dev (issues #5, #7).
