Skip to content

feat(run-history): RunEntry v2 (model/cwd/tool_calls/error_excerpt) + lock-free rotation; extend BuiltinAgentOverrideConfig (output/defaultReads/defaultProgress)#231

Open
JayceFreeman wants to merge 2 commits into
nicobailon:mainfrom
JayceFreeman:feat/run-history-v2-overrides
Open

feat(run-history): RunEntry v2 (model/cwd/tool_calls/error_excerpt) + lock-free rotation; extend BuiltinAgentOverrideConfig (output/defaultReads/defaultProgress)#231
JayceFreeman wants to merge 2 commits into
nicobailon:mainfrom
JayceFreeman:feat/run-history-v2-overrides

Conversation

@JayceFreeman
Copy link
Copy Markdown

Two related changes on the same agents.ts / run-history.ts surface. 477/477 unit tests pass (incl. the new ones).

RunEntry v2 + lock-free rotation

RunEntry gains optional model / cwd / tool_calls / error_excerpt, sourced at each of the 4 recordRun call sites from the values actually used to run the child — never the parent ctx.cwd:

field source
model SingleResult.model
cwd the childs executed cwd — taskCwd (parallel chain), stepCwd (sequential chain), resolveParallelTaskCwd(...) (parallel path), effectiveCwd (single path)
tool_calls progressSummary.toolCount (preserves 0; not dropped by truthiness)
error_excerpt SingleResult.error, truncated to 300 chars, only on error

Rotation is now lock-free. The old read-path trim raced under concurrent processes. New design: recordRun appends one JSON line via O_APPEND (kernel-atomic across processes on a local POSIX fs; entries are small → single write()), then a best-effort maybeRotate trims to ROTATE_KEEP by writing a per-process-unique temp and atomic-renaming it into place. The file is a complete valid snapshot either way — no duplication, no corruption. The only loss this admits is an append landing inside a rotates read→rename window; acceptable for telemetry. No new dependency, no lock file. A 16-process × 150-write storm test guards no-corruption / no-double-write / bounded-size.

BuiltinAgentOverrideConfig: output / defaultReads / defaultProgress

These three fields are honored on AgentConfig but were silently dropped by the override schema, parser, applier, and the write path — so a settings.json override setting them was a no-op (a lying contract). Now threaded through all of: schema (BuiltinAgentOverrideBase + Config), parseBuiltinOverrideEntry (typed validation errors), applyBuiltinOverride (the false sentinel clears the builtin default), and the save path (cloneOverrideBase / cloneOverrideValue / buildBuiltinOverrideConfig) so values round-trip through save+reload rather than being dropped on persist.

Tests

+6 run-history unit tests (incl. multi-process storm) and +7 agent-overrides tests (apply, false-sentinel clear, malformed-value rejection, builder round-trip, save+reload persistence).

… lock-free rotation; extend BuiltinAgentOverrideConfig (output/defaultReads/defaultProgress)

Two related changes on the same agents.ts / run-history.ts surface.

## RunEntry v2 + lock-free rotation (run-history.ts + 4 call sites)

RunEntry gains optional model / cwd / tool_calls / error_excerpt so the
history is useful for telemetry, not just agent/status/duration. Sourced at
each of the 4 recordRun call sites from the values actually used to run the
child:
  - model        <- SingleResult.model
  - cwd          <- the child's executed cwd (taskCwd / stepCwd /
                     resolveParallelTaskCwd / effectiveCwd), never the
                     parent ctx.cwd
  - tool_calls   <- progressSummary.toolCount (preserves 0; not dropped by
                     truthiness)
  - error_excerpt<- SingleResult.error, truncated to 300 chars, only on error

Rotation is now lock-free. The previous read-path trim raced under
concurrent processes. New design: recordRun appends one JSON line via
O_APPEND (kernel-atomic across processes on a local POSIX fs; entries are
small, single write()), then best-effort maybeRotate trims to ROTATE_KEEP
by writing a per-process-unique temp and atomic-renaming it into place. The
file is a complete valid snapshot either way — no duplication, no
corruption. The only loss this admits is an append landing inside a
rotate's read->rename window; acceptable for telemetry. No new dependency,
no lock file. A 16-process x 150-write storm test guards no-corruption,
no-double-write, and bounded size.

## BuiltinAgentOverrideConfig: output / defaultReads / defaultProgress

These three fields were honored on AgentConfig but silently dropped by the
override schema, parser, applier, AND the write path — so a settings.json
override that set them was a no-op (a lying contract). Threaded through all
of: the schema (BuiltinAgentOverrideBase + Config), parseBuiltinOverrideEntry
(with typed validation errors), applyBuiltinOverride (false sentinel clears
the builtin default), and the save path (cloneOverrideBase / cloneOverrideValue
/ buildBuiltinOverrideConfig) so values round-trip through save+reload instead
of being dropped on persist.

Tests: +6 run-history unit tests (incl. multi-process storm) and +7
agent-overrides tests (apply, false-sentinel clear, malformed-value
rejection, builder round-trip, save+reload persistence). Full suite: 477/477.
Copilot AI review requested due to automatic review settings May 29, 2026 03:40
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR extends run history logging with additional v2 metadata (model/cwd/tool calls/error excerpt), moves log rotation to the write path with a lock-free/atomic-rename strategy, and expands builtin agent overrides to support output/defaultReads/defaultProgress.

Changes:

  • Add v2 fields to RunEntry and pass them through from foreground execution paths.
  • Implement best-effort log rotation on write (append + optional trim/rename) and adjust loading logic accordingly.
  • Add support (and tests) for overriding builtin agent output, defaultReads, and defaultProgress, including false sentinels for clearing defaults.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/unit/run-history.test.ts Adds unit coverage for v2 fields, rotation behavior, and multi-process concurrency invariants.
test/unit/agent-overrides.test.ts Adds tests for new builtin override fields and persistence behavior.
src/runs/shared/run-history.ts Extends run history schema, adds write-path rotation, and changes read-path parsing.
src/runs/foreground/subagent-executor.ts Passes model/cwd/tool_calls/error_excerpt into recordRun for subagent runs.
src/runs/foreground/chain-execution.ts Passes model/cwd/tool_calls/error_excerpt into recordRun for chain steps; minor cwd refactor.
src/agents/agents.ts Adds parsing/applying/building/saving support for output/defaultReads/defaultProgress overrides.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +31 to +37
function maybeRotate(historyPath: string): void {
const lines = fs.readFileSync(historyPath, "utf-8").split("\n").filter((l) => l.trim().length > 0);
if (lines.length <= ROTATE_READ_THRESHOLD) return;
const tmp = `${historyPath}.${process.pid}.${Date.now()}.tmp`;
fs.writeFileSync(tmp, `${lines.slice(-ROTATE_KEEP).join("\n")}\n`);
fs.renameSync(tmp, historyPath);
}
Comment on lines +83 to +93
return raw
.split("\n")
.map((line) => line.trim())
.filter((line) => line.length > 0)
.map((line) => {
try {
return JSON.parse(line) as RunEntry;
} catch {
return undefined;
}
})
Comment thread test/unit/run-history.test.ts Outdated
`);

const procs = Array.from({ length: PROCS }, (_, k) => new Promise<void>((resolve, reject) => {
const p = spawn(process.execPath, ["--experimental-strip-types", childFile, pathToFileURL(SRC).href, dir, `p${k}`, String(PER_PROC)], { stdio: "ignore" });
The storm test spawned children with a hardcoded `--experimental-strip-types`.
That drifts if Node renames the flag and diverges from the integration runner
(`--experimental-transform-types --import ...`). Inheriting all of execArgv was
rejected: it would leak `--test*`/`--inspect*`/profiler flags into 16 children
(orphaned space-form values, debugger-port collisions, silent hangs).

Instead, whitelist the TS-loader flags from the parent's execArgv (both
`--flag=value` and space-form `--flag value`); unrecognized tokens are simply
not copied, so test-runner/debugger/profiler flags and their orphaned values
can never reach the child. Byte-identical to the old behavior on the current
unit runner; robust if the TS mechanism changes. Full unit suite 477/477.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants