perf(session-parser): read each session file once in extractCostlyPrompts #66
Merged: alexgreensh merged 1 commit on Jun 16, 2026.
extractCostlyPrompts read the same JSONL file twice per call: once inside parseSessionTurns and once more to walk user-message text. Extract a shared parseSessionTurnsFromLines core so the function reads and splits the file once, then derives both per-turn cost data and user text from the same lines. Output is unchanged (verified byte-identical on a fixture containing sidechain and tool_result-only messages). This halves file I/O for the costly-prompt scan and removes a time-of-check window between the two former reads.
Owner:
Merged in v5.11.13: single read plus the turn-pairing-stability bonus. Thanks, Dani.
Why
`extractCostlyPrompts()` read the same `session.jsonl` file from disk twice on every call: once inside `parseSessionTurns()` (which it calls for per-turn cost data) and once more directly, to walk the file for user-message text. Session transcripts can be several megabytes, and the audit / dashboard path calls `extractCostlyPrompts()` once per costly session, so each call paid for two full `readFileSync` calls and two `split("\n")` passes over the same bytes.

What
extractCostlyPrompts()now reads and splits the file once and reuses the resulting lines for both the per-turn parse and the user-text walk. Net effect per call: onestatSync+ onereadFileSync+ onesplitinstead of two of each. As a side benefit, pairing user text to turns is now immune to the file changing between the two former reads (a small time-of-check window is gone).How
- `parseSessionTurnsFromLines(lines, openclawDir)`: a new core holding the existing turn-parsing logic verbatim.
- `parseSessionTurns(filePath, ...)` now reads and splits the file, then delegates to that core. Its public signature and behavior are unchanged.
- `extractCostlyPrompts()` reads and splits once, calls the core for turns, then walks the same `lines` array for user text. Because both walks operate on the identical `lines`, the turn-index pairing stays exactly aligned (the constraint already flagged by the in-code comment).

Validation
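The commit message reports the output as verified byte-identical on a fixture. One way to run that kind of differential check is to call the old and new code paths on the same input and compare their `JSON.stringify` output. A minimal sketch; `checkByteIdentical`, `firstDifference`, and the toy `twoPass` / `onePass` pair (which mirror "split twice" versus "split once") are hypothetical stand-ins, not the PR's actual verification.

```typescript
// Returns the first index at which two strings differ, or -1 if they are equal.
function firstDifference(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n;
}

// Hypothetical differential check: run both implementations on the same input
// and compare their JSON serializations. Equal strings mean equal UTF-8 bytes,
// so this checks byte-identical serialized output, key order included.
function checkByteIdentical<I, O>(
  input: I,
  before: (input: I) => O,
  after: (input: I) => O,
): { identical: boolean; offset: number } {
  const a = JSON.stringify(before(input));
  const b = JSON.stringify(after(input));
  const offset = firstDifference(a, b);
  return { identical: offset === -1, offset };
}

// Toy message shape and user-text walk (assumptions for the example).
type Msg = { type?: string; text?: string };
const userTexts = (lines: readonly string[]): string[] =>
  lines
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l) as Msg)
    .filter((m) => m.type === "user")
    .map((m) => m.text ?? "");

// "Before": splits the raw text twice, mirroring the two former reads.
const twoPass = (raw: string) => ({
  lineCount: raw.split("\n").length,
  texts: userTexts(raw.split("\n")),
});

// "After": splits once and reuses the array for both results.
const onePass = (raw: string) => {
  const lines = raw.split("\n");
  return { lineCount: lines.length, texts: userTexts(lines) };
};

const fixture = [
  '{"type":"user","text":"first"}',
  '{"type":"assistant"}',
  '{"type":"user","text":"second"}',
].join("\n");

const same = checkByteIdentical(fixture, twoPass, onePass);
// same → { identical: true, offset: -1 }
const broken = checkByteIdentical(fixture, twoPass, () => ({ lineCount: 0, texts: [] as string[] }));
// broken → { identical: false, offset: 13 } (the line-count digit)
```

Comparing serialized strings is stricter than a deep-equality check: it also flags a change in key order, which matters when the claim is "byte-identical" rather than "equivalent".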
- `npm run build` (tsc) passes; the compiled `dist/` is regenerated and committed in lockstep with `src` (repo convention).
- `.d.ts` signatures are unchanged; the new helper is private.

Scope
Deliberately narrow. Left for a possible follow-up: the broader audit pipeline still reads each file independently via `parseSession`, `extractCostlyPrompts`, and `loadMessagesFromSessionFile`, and the `statSync` / read / 50MB-guard block is duplicated across the two functions touched here (`parseSessionTurns()` and `extractCostlyPrompts()`).
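That follow-up could be approached with one shared read helper plus a per-run cache. A minimal sketch, assuming an injected file-system interface in place of `node:fs`, a 50 MiB reading of "50MB", and skip-on-oversize behavior; `SessionFs`, `readSessionLines`, `createSessionLineCache`, and `MAX_SESSION_BYTES` are hypothetical names, not existing code in this repo.

```typescript
// Minimal file-system surface the helper needs. In real code these would be
// statSync(filePath).size and readFileSync(filePath, "utf8") from node:fs;
// they are injected here so the sketch runs anywhere.
interface SessionFs {
  sizeOf(filePath: string): number;
  readText(filePath: string): string;
}

// Assumption: "50MB" read as 50 MiB; the PR does not give the exact constant.
const MAX_SESSION_BYTES = 50 * 1024 * 1024;

// Hypothetical shared helper for the stat / size-guard / read / split block.
// Returns null for oversize files (assumed behavior: skip rather than throw).
function readSessionLines(
  filePath: string,
  fs: SessionFs,
  maxBytes: number = MAX_SESSION_BYTES,
): readonly string[] | null {
  if (fs.sizeOf(filePath) > maxBytes) return null;
  return fs.readText(filePath).split("\n");
}

// Hypothetical per-run cache so several readers of the same session file
// (e.g. parseSession, extractCostlyPrompts, loadMessagesFromSessionFile)
// could share one read. Entries never expire: create one cache per audit run.
function createSessionLineCache(fs: SessionFs, maxBytes: number = MAX_SESSION_BYTES) {
  const cache = new Map<string, readonly string[] | null>();
  return (filePath: string): readonly string[] | null => {
    if (!cache.has(filePath)) {
      cache.set(filePath, readSessionLines(filePath, fs, maxBytes));
    }
    return cache.get(filePath) ?? null;
  };
}

// Usage with an in-memory stand-in for the disk.
const files = new Map<string, string>([
  ["a.jsonl", '{"type":"user","text":"hi"}\n{"type":"assistant","costUsd":0.1}'],
  ["big.jsonl", ""],
]);
let readCount = 0;
const fakeFs: SessionFs = {
  // Pretend big.jsonl is 60 MiB; string length stands in for byte size.
  sizeOf: (p) => (p === "big.jsonl" ? 60 * 1024 * 1024 : (files.get(p) ?? "").length),
  readText: (p) => {
    readCount++;
    return files.get(p) ?? "";
  },
};

const getLines = createSessionLineCache(fakeFs);
const first = getLines("a.jsonl"); // read from the stand-in disk
const again = getLines("a.jsonl"); // served from the cache, no second read
const big = getLines("big.jsonl"); // rejected by the size guard, never read
```

Sharing one array across several consumers is only safe if none of them mutates it, which is why the sketch returns `readonly string[]`.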