Skip to content

Show useful local speech crash details#1379

Merged
boudra merged 4 commits into
mainfrom
diagnose-downloads-latest-log
Jun 6, 2026
Merged

Show useful local speech crash details#1379
boudra merged 4 commits into
mainfrom
diagnose-downloads-latest-log

Conversation

@boudra
Copy link
Copy Markdown
Collaborator

@boudra boudra commented Jun 6, 2026

Linked issue

Type of change

  • Bug fix
  • New feature (with prior issue + design alignment)
  • Refactor / code improvement
  • Docs

What does this PR do

Local speech worker failures now leave useful breadcrumbs for both users and maintainers. When the native speech worker exits, the daemon logs the worker pid, exit code or signal, active/pending request context, and a bounded stderr tail; the app-facing error now includes the exit status, the request being handled, and the last stderr line when available.

This also prevents an unobserved session error from throwing out of the worker boundary while the pending connect promise is already carrying the failure.

How did you verify it

  • npx vitest run src/server/speech/providers/local/worker-client.test.ts --bail=1
  • npm run typecheck
  • npm run lint
  • npm run format
  • Pre-commit hook reran staged-file lint, format check, and typecheck successfully.

Checklist

  • One focused change. Unrelated cleanups split out.
  • npm run typecheck passes
  • npm run lint passes
  • npm run format ran (Biome)
  • UI changes include screenshots or video for every affected platform
  • Tests added or updated where it made sense

Risk surface

This changes daemon-side local speech process handling. The main behavior risk is around stderr piping and error propagation for the worker child process; the new regression test covers the crash path and existing tests cover normal TTS, STT, VAD, idle shutdown, and IPC backpressure behavior.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Jun 6, 2026

Greptile Summary

When the native speech worker exits unexpectedly, the daemon now logs the worker PID, exit code/signal, pending request context, and a bounded stderr tail; the user-facing error is enriched with the same context. A WeakSet tracks intentional worker closes so crash logging is suppressed for graceful shutdowns.

  • Switches the exit listener from "exit" to "close", guaranteeing all stderr "data" events have fired before the crash handler reads stderrTail.
  • Adds stderr piping (stdio: "pipe") and a streaming per-line logger alongside a bounded raw tail accumulator, with emitErrorIfObserved preventing unobserved EventEmitter errors from throwing unhandled.
  • Two new test cases verify the crash log structure (using a capturing pino transport) and confirm intentional shutdowns do not produce an error-level log entry.

Confidence Score: 5/5

Safe to merge. The crash-path logic is well-structured and the new tests verify both the error content and the intentional-shutdown discrimination.

The switch from exit to close is correct and tested. The intentional-close WeakSet, bounded stderr accumulator, and enriched error message all work as intended. The two identified gaps are observability nits that do not affect correctness of crash detection or error propagation.

No files require special attention beyond the two suggestions already left.

Important Files Changed

Filename Overview
packages/server/src/server/speech/providers/local/worker-client.ts Core change: switches from exit to close event, adds stderr piping, structured logging on crash/shutdown, and a WeakSet to distinguish intentional closes. The last partial stderr line is never flushed to the per-line logger.warn before the exit record is written.
packages/server/src/server/speech/providers/local/worker-client.test.ts New crash-path and intentional-shutdown tests using a capturing logger; dead emit(exit) left over from the old exit-event path is a harmless but misleading no-op.
packages/server/src/server/speech/providers/local/runtime.ts Trivial one-line addition passing the module-level logger into LocalSpeechWorkerClient, now required by the new constructor option.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["worker 'close' event fires"] --> B{"intentionalWorkerCloses.has(worker)?"}
    B -- yes --> C["logger.info: closed after shutdown"]
    C --> D["null out this.worker if current"]
    D --> Z[return]
    B -- no --> E{wasCurrentWorker?}
    E -- no --> F["logger.warn: Stale worker closed"]
    F --> Z
    E -- yes --> G["buildWorkerExitMessage\ncode + signal + pendingRequests + stderrTail"]
    G --> H["logger.error: Local speech worker exited"]
    H --> I["null out this.worker / workerPid"]
    I --> J[clearIdleTimer]
    J --> K[rejectAllPending]
    K --> L["for each active session emitter"]
    L --> M{"listenerCount('error') > 0?"}
    M -- yes --> N["emitter.emit('error')"]
    M -- no --> O["logger.warn: session error had no listener"]
    N --> P["clear sessions, reset inFlightRequests"]
    O --> P
Loading

Reviews (4): Last reviewed commit: "Avoid false speech worker crash logs on ..." | Re-trigger Greptile

Comment thread packages/server/src/server/speech/providers/local/worker-client.ts
Comment thread packages/server/src/server/speech/providers/local/worker-client.ts Outdated
Comment thread packages/server/src/server/speech/providers/local/worker-client.ts Outdated
Comment thread packages/server/src/server/speech/providers/local/worker-client.ts Outdated
Comment thread packages/server/src/server/speech/providers/local/worker-client.test.ts Outdated
@boudra boudra merged commit 6aa73ba into main Jun 6, 2026
15 checks passed
@boudra boudra deleted the diagnose-downloads-latest-log branch June 6, 2026 13:45
ByteTrue added a commit to ByteTrue/paseo that referenced this pull request Jun 8, 2026
* Fix Windows workspace provider loading (getpaseo#1329)

* test: cover Windows provider snapshot cwd updates

* fix: normalize Windows provider snapshot paths

* fix: preserve Claude history path candidates

* fix: match Claude project paths on Windows

* feat(server): make provider refresh timeout configurable via PASEO_PROVIDER_REFRESH_TIMEOUT_MS (getpaseo#1346)

* feat(server): make provider refresh timeout configurable

Provider refresh probes (isAvailable + listModels/listModes) were hardcoded
to a 30s timeout. With many providers and MCP servers configured, cold
starts of agents like Copilot can exceed this on the first probe, surfacing
'Timed out refreshing Copilot after 30000ms' even though a manual retry
succeeds (warm caches).

Add a PASEO_PROVIDER_REFRESH_TIMEOUT_MS env var so operators can bump the
ceiling without rebuilding. The explicit constructor option still wins
over the env var, and invalid values fall back to the 30s default.

* refactor(server): address review feedback on refresh timeout config

- Use Number() instead of Number.parseInt() so scientific notation like
  '6e4' parses as 60000 instead of being silently truncated to 6.
- Use vi.stubEnv()/vi.unstubAllEnvs() in tests to match the rest of the
  suite and avoid manual process.env save/restore.

(cherry picked from commit dfddda7)

* fix(claude): respect profile models for built-in claude provider (getpaseo#1311)

* fix(claude): respect profile models for built-in claude provider

The built-in Claude provider was the only one in the registry whose
profile `models` array was treated as additive on top of the hardcoded
first-party catalog. Every other built-in provider (and every custom
provider) replaces the runtime model list when `models` is set, which
matches the documented behavior in docs/custom-providers.md.

For users pointing Claude Code at a third-party Anthropic-compatible
gateway (Z.AI, Alibaba/Qwen, MiniMax, custom proxies, …) and curating
the picker with `agents.providers.claude.models`, the old behavior
leaked the nine first-party Claude models into the dropdown, making it
impossible to ship a curated list. This is issue getpaseo#1299.

The fix flips the built-in Claude entry in `provider-registry.ts` to
match the other providers, so `models` replaces the runtime catalog
(including the settings.json-discovered entries surfaced by
getClaudeModelsWithSettings). The existing `additionalModels` field
keeps its additive semantics for anyone who still wants to append
entries on top of the first-party list.

- provider-registry.ts: drop the claude-only `profileModelsAreAdditive`
  branch and align with the rest of the registry.
- provider-registry.test.ts: update the "append to runtime models"
  expectation for built-in Claude to "replace runtime models" and add a
  regression test that mirrors the issue scenario (hardcoded catalog +
  configured profile models, expect only the profile models).
- agent.test.ts: make the "returns hardcoded claude models" hermetic by
  pointing CLAUDE_CONFIG_DIR at an empty temp dir; the test was reading
  the host's real ~/.claude/settings.json (which now contains MiniMax
  env vars from the issue report) and leaking that into the assertion.
- custom-providers.md: correct the note about Claude profile models
  being additive and document the new replace semantics alongside
  additionalModels.

Closes getpaseo#1299

* test(claude): drop explanatory comments from new tests

* refactor(claude): inject configDir into ClaudeAgentClient for test hermeticity

Greptile flagged the previous hermeticity fix on agent.test.ts: mutating
process.env.CLAUDE_CONFIG_DIR inside a test leaks the redirection into
any concurrent test in the same process for the duration of the try
block.

Thread a `configDir` option through ClaudeAgentClient ->
getClaudeModelsWithSettings -> readClaudeSettingsModels ->
resolveClaudeConfigDir instead. The constructor follows the same
injection pattern as `resolveBinary`, so the test passes an empty temp
dir via the option and no shared global state is touched.

- models.ts: add an optional `configDir` to
  getClaudeModelsWithSettings, readClaudeSettingsModels, and
  resolveClaudeConfigDir. Env var remains the fallback for production
  callers.
- agent.ts: add `configDir?` to ClaudeAgentClientOptions, store it on
  the instance, and forward it to getClaudeModelsWithSettings from
  listModels.
- agent.test.ts: drop the process.env save/mutate/restore dance and
  pass `configDir: emptyConfigDir` to the constructor instead.

Refs getpaseo#1311

(cherry picked from commit 1377adb)

* feat(desktop): support multiple windows

Closes getpaseo#250

(cherry picked from commit df635b5)

* Authenticate file downloads with their capability token alone (getpaseo#1351)

The daemon's bearer middleware gated every route except /api/health behind the daemon password (PASEO_PASSWORD), including /api/files/download. That route already carries a single-use, 60s-TTL, crypto-random download token that is only ever issued over the authenticated WebSocket, so the token is the capability for the route.

Requiring the daemon password on top of the token broke browser and Electron downloads: those trigger the download via an anchor navigation, which cannot attach an Authorization header. The cross-platform download store (packages/app/src/stores/download-store.ts) therefore sent no bearer, and the daemon returned 401 whenever a password was configured.

Add /api/files/download to the bearer-auth bypass list. The endpoint still rejects requests without a valid token (400 missing / 403 invalid), so it stays authenticated; it just no longer demands the password a second time. This also fixes every already-released client without an app update.

* Keep virtualenvs out of project picker results (getpaseo#1356)

* Fix project picker virtualenv suggestions

* Ignore more virtualenv names in project picker

(cherry picked from commit 0d1eecc)

* Fix stacked provider settings sheets

(cherry picked from commit dd329a4)

* fix(claude): preserve alwaysLoad on MCP server configs (getpaseo#1333)

* fix(claude): preserve alwaysLoad on MCP server configs

Paseo's Claude provider normalizes McpServerConfig via toClaudeSdkMcpConfig
before passing it to claude-agent-sdk. The function copied type/command/args/
env (stdio) and type/url/headers (http, sse) but dropped the alwaysLoad
field for all three transports.

Without alwaysLoad making it to the SDK, every MCP server's tools were
deferred behind tool search. In practice the agent never discovered them
via search and would respond 'no chrome-devtools tools available' even
with the server fully running and connected.

Add alwaysLoad to the McpServerConfig types and zod schemas, copy it
through in toClaudeSdkMcpConfig for stdio/http/sse, and add unit tests
covering all three transports plus the default-undefined case.

Other providers ignore the field, consistent with the existing pattern
where each provider normalizes only what its CLI/SDK supports.

* test(claude): drop ternary in toClaudeSdkMcpConfig assertion

The 'alwaysLoad' field is now declared on every branch of the
discriminated union returned by toClaudeSdkMcpConfig, so the in-operator
guard is no longer needed. Direct property access expresses the intent
without embedding a conditional in the assertion.

---------

Co-authored-by: Mohamed Boudra <boudra.moha@gmail.com>
(cherry picked from commit 44e9287)

* Remove epic orchestration skills

(cherry picked from commit 59b32ab)

* Fix worktree checkout validation for existing git branch refs (getpaseo#1358)

* fix(server): allow valid existing worktree branch refs (getpaseo#1357)

* Fix existing branch checkout validation

---------

Co-authored-by: Mohamed Boudra <boudra.moha@gmail.com>
(cherry picked from commit d454fa3)

* Allow $schema in config files

(cherry picked from commit d35bffe)

* Default git shipping to pull requests

(cherry picked from commit 20c0335)

* Report daemon auth failures in status

(cherry picked from commit 9b21ccd)

* Open browser links in workspace tabs (getpaseo#1375)

* Open browser window requests as workspace tabs

* Tighten browser new-tab request boundaries

(cherry picked from commit 9a8912b)

* Prepare prompt file attachments for future UI (getpaseo#1376)

* Support file uploads to agents

* Harden prompt file upload handling

* Fix duplicate upload retry handling

* Update relay reconnect tests for queued messages

* Make upload timeout track idle progress

* Clean failed upload directories

(cherry picked from commit db4376d)

* Keep Codex request identity vanilla (getpaseo#1377)

(cherry picked from commit 2822c02)

* Fix auto-archive after merged PR branch deletion (getpaseo#1378)

(cherry picked from commit 0cf1717)

* Show useful local speech crash details (getpaseo#1379)

* Improve local speech worker crash diagnostics

* Fix speech worker stderr tail diagnostics

* Drain speech worker stderr before reporting exit

* Avoid false speech worker crash logs on shutdown

(cherry picked from commit 6aa73ba)

* Fix Pi compaction slash commands (getpaseo#1338)

* fix: handle Pi compact slash command

* fix: handle Pi compaction slash commands

* fix: harden Pi compaction commands

* fix: handle missing Pi autocompaction state

---------

Co-authored-by: X <X@16x.org>
Co-authored-by: Mohamed Boudra <boudra.moha@gmail.com>
(cherry picked from commit 59d48d2)

* Disable pull and push until the branch diverges

(cherry picked from commit 69715a7)

* Persist create-agent mode preferences reliably

(cherry picked from commit d55e162)

* Stabilize local dev daemon homes

Use a checkout-local .dev/paseo-home for root and worktree dev flows so development daemons do not collide with the packaged app home.

Split server, app, and desktop dev entrypoints, seed worktree homes from the source checkout metadata, and keep desktop dev on its own user-data directory and Expo port.

(cherry picked from commit 7c8b290)

* Recover stale supervised daemons

Teach daemon stop to use lifecycle shutdown when the API is reachable even if the owner pid is stale, then wait for the API to disappear and clean the stale pidfile.

Launch desktop-managed daemons detached from desktop stdio, preserve stale reachable daemon ownership for version checks, and allow desktop stop/restart to use the CLI recovery path.

Add supervisor heartbeats so supervised workers shut down when their supervisor disappears instead of surviving as orphaned reachable daemons.

(cherry picked from commit 893e337)

* Keep Paseo MCP config out of agent storage

Runtime-injected MCP endpoints can change with daemon listen settings, so compute them for launches instead of persisting them.

(cherry picked from commit 06a8f95)

* fix: use @ByteTrue scope in new files from upstream sync

* fix: remove stale expo-two-way-audio from lockfile, adapt tests

* fix: use path.join for log file path in desktop daemon test (Windows compat)

* fix: update Nix npmDepsHash for lockfile changes

* fix: fix-lockfile.mjs missing resolved for node_modules/* entries; add heartbeat diagnostics

* chore: apply formatting

* fix(test): always print supervisor logs after disconnect test

* fix(ci): add npm rebuild lightningcss for Linux native module in playwright job

* fix: stabilize daemon-worker supervisor-disconnect test across platforms

- Add heartbeat sequence numbers so workers only accept advancing
  sequences, preventing stale buffered messages from a dead
  supervisor from resetting the heartbeat-expiry timer indefinitely.
- Add reparented-to-init detection for platforms where zombie PIDs
  pass signal-0 checks (macOS with detached spawn).
- Add extreme force-exit (self-SIGKILL after 2s of process.exit
  hanging) as defense-in-depth for beginShutdown.
- Relax the supervisor-disconnect test: wait 15s for the worker to
  self-exit via heartbeat expiry, then force-kill if it hasn't.
  Always print captured logs on failure for CI diagnostics.
- Include supervisor PID in heartbeat messages for traceability.

* fix(ci): resolve Nix build and Playwright lightningcss issues

- fix-lockfile.mjs: use lockfile entry name (e.g. wrap-ansi) instead of
  path-based key (wrap-ansi-cjs) for npm alias entries that lack
  resolved/integrity fields. This fixes Nix prefetch ENOTCACHED errors.
- ci.yml (playwright): explicitly install lightningcss-linux-x64-gnu
  optional dependency on Linux, since the repo lockfile is generated
  on macOS and omits Linux optional deps, causing Metro bundler to
  fail with 'Cannot find module lightningcss.linux-x64-gnu.node'.

* fix(nix): add lightningcss-linux-x64-gnu as optional dep to lockfile

- Add lightningcss-linux-x64-gnu@1.30.1 as optionalDependency in root
  package.json so it appears in npm lockfile with resolved+integrity.
  Without this, the macOS-generated lockfile omits Linux optional deps,
  causing both Nix prefetch (ENOTCACHED) and Metro bundling (Cannot find
  module 'lightningcss.linux-x64-gnu.node') failures on Linux CI.
- Remove redundant explicit npm install in playwright CI step (the dep
  is now in the lockfile and will be installed by npm ci).
- Revert ci.yml duplicate line from earlier incomplete revert.

---------

Co-authored-by: Mohamed Boudra <boudra.moha@gmail.com>
Co-authored-by: Fabian Fischer <nodomain@users.noreply.github.com>
Co-authored-by: Matteo Pietro Dazzi <matteopietro.dazzi@gmail.com>
Co-authored-by: Arielle Ostankova <arieel89@gmail.com>
Co-authored-by: dixonl90 <dixonl90@users.noreply.github.com>
Co-authored-by: chyendongnhanh338 <chyendongnhanh338@gmail.com>
Co-authored-by: X <X@16x.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant