perf(session-parser): read each session file once in extractCostlyPrompts#66

Merged
alexgreensh merged 1 commit into
alexgreensh:mainfrom
danikdanik:perf/dedupe-costly-prompts-read
Jun 16, 2026

Conversation

@danikdanik

Contributor

Why

extractCostlyPrompts() read the same session .jsonl file from disk twice on every call: once inside parseSessionTurns() (which it calls for per-turn cost data) and once more directly, to walk the file for user-message text. Session transcripts can be multiple megabytes, and the audit / dashboard path calls extractCostlyPrompts() once per costly session, so each call paid for two full readFileSync + two split("\n") passes over the same bytes.

What

extractCostlyPrompts() now reads and splits the file once and reuses the resulting lines for both the per-turn parse and the user-text walk. Net effect per call: one statSync + one readFileSync + one split instead of two of each. As a side benefit, pairing user text to turns is now immune to the file changing between the two former reads (a small time-of-check window is gone).

How

  • Extracted a private parseSessionTurnsFromLines(lines, openclawDir) core holding the existing turn-parsing logic verbatim.
  • parseSessionTurns(filePath, ...) now reads + splits, then delegates to that core. Its public signature and behavior are unchanged.
  • extractCostlyPrompts() reads + splits once, calls the core for turns, then walks the same lines array for user text. Because both walks operate on the identical lines, the turn-index pairing stays exactly aligned (the constraint already flagged by the in-code comment).
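The refactor shape described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the record schema (`type`, `cost`, `text`), the `Turn` and `CostlyPrompt` shapes, the `minCost` parameter, and the `readLines` / `parseLine` helpers are all assumptions made for the example. The `openclawDir` argument and the size guard are omitted for brevity.

```typescript
import * as fs from "node:fs";

// Hypothetical shapes for illustration; the real types in the repo may differ.
type Entry = { type?: string; cost?: number; text?: string };
interface Turn { index: number; cost: number }
interface CostlyPrompt { turnIndex: number; cost: number; text: string }

// Assumed helper: one read and one split, shared by every caller below.
function readLines(filePath: string): string[] {
  return fs.readFileSync(filePath, "utf8").split("\n");
}

// Assumed helper: parse one JSONL line, returning null for blank or malformed lines.
function parseLine(line: string): Entry | null {
  if (!line.trim()) return null;
  try {
    const value = JSON.parse(line);
    return value !== null && typeof value === "object" ? (value as Entry) : null;
  } catch {
    return null;
  }
}

// Private core: turn parsing over lines that are already in memory.
function parseSessionTurnsFromLines(lines: string[]): Turn[] {
  const turns: Turn[] = [];
  for (const line of lines) {
    const entry = parseLine(line);
    if (entry?.type === "assistant" && typeof entry.cost === "number") {
      turns.push({ index: turns.length, cost: entry.cost });
    }
  }
  return turns;
}

// Public wrapper: same signature as before, now a thin read-then-delegate.
function parseSessionTurns(filePath: string): Turn[] {
  return parseSessionTurnsFromLines(readLines(filePath));
}

function extractCostlyPrompts(filePath: string, minCost: number): CostlyPrompt[] {
  const lines = readLines(filePath); // the only disk read in this call
  const turns = parseSessionTurnsFromLines(lines);

  // Second walk over the SAME array, so turn indices cannot drift.
  const userTextByTurn: string[] = [];
  let pendingUserText = "";
  for (const line of lines) {
    const entry = parseLine(line);
    if (entry?.type === "user" && typeof entry.text === "string") {
      pendingUserText = entry.text;
    } else if (entry?.type === "assistant" && typeof entry.cost === "number") {
      userTextByTurn.push(pendingUserText);
      pendingUserText = "";
    }
  }

  return turns
    .filter((turn) => turn.cost >= minCost)
    .map((turn) => ({
      turnIndex: turn.index,
      cost: turn.cost,
      text: userTextByTurn[turn.index] ?? "",
    }));
}
```

Because both walks use the same turn-boundary condition over the same `lines` array, a trailing user message with no following turn is simply never paired, which is the alignment property the description relies on.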

Validation

  • npm run build (tsc) passes; compiled dist/ regenerated and committed in lockstep with src (repo convention).
  • Behavior verified byte-identical before/after on a fixture exercising normal turns, a sidechain message, a tool_result-only user message, and a trailing user message (the turn-index-alignment-sensitive cases).
  • Public API surface unchanged: no .d.ts signature change; the new helper is private.

Scope

Deliberately narrow. Left for a possible follow-up: the broader audit pipeline still reads each file independently via parseSession, extractCostlyPrompts, and loadMessagesFromSessionFile; and the statSync / read / 50MB-guard block is duplicated across the two functions.
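For the duplicated statSync / read / guard block, one possible shape for the follow-up is a single shared helper. The sketch below is hypothetical: the name `readSessionLines`, the `maxBytes` parameter, and the choice to return `null` for a missing or oversized file are assumptions, since this PR does not say how the existing guard reports those cases.

```typescript
import * as fs from "node:fs";

// The 50MB limit mentioned above; the constant name is an assumption.
const MAX_SESSION_BYTES = 50 * 1024 * 1024;

// Hypothetical shared helper: stat, guard, read, and split in one place.
// Returns null when the file cannot be stat'ed or exceeds the size limit.
function readSessionLines(
  filePath: string,
  maxBytes: number = MAX_SESSION_BYTES,
): string[] | null {
  let size: number;
  try {
    size = fs.statSync(filePath).size;
  } catch {
    return null; // missing or unreadable file
  }
  if (size > maxBytes) return null; // too large to load into memory
  return fs.readFileSync(filePath, "utf8").split("\n");
}
```

Both parseSessionTurns() and extractCostlyPrompts() could then call this helper and hand the result to the shared core, leaving one copy of the guard to maintain.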

…mpts

extractCostlyPrompts read the same JSONL file twice per call: once inside
parseSessionTurns and once more to walk user-message text. Extract a shared
parseSessionTurnsFromLines core so the function reads and splits the file once,
then derives both per-turn cost data and user text from the same lines.

Output is unchanged (verified byte-identical on a fixture containing sidechain
and tool_result-only messages). This halves file I/O for the costly-prompt scan
and removes a time-of-check window between the two former reads.
@alexgreensh alexgreensh merged commit 1482a23 into alexgreensh:main Jun 16, 2026
1 check passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 16, 2026
@alexgreensh

Owner

Merged in v5.11.13 — single read plus the turn-pairing-stability bonus. Thanks, Dani.
