fix(polling): Prevent hanging providers from permanently blocking background usage refresh #414

Open

cnovak wants to merge 2 commits into steipete:main from cnovak:fix/polling-timeout

Conversation


@cnovak cnovak commented Feb 22, 2026

Summary

This PR prevents hanging usage providers from permanently blocking the background usage refresh loop.

What was happening

  • The background usage poller could permanently freeze if a provider request or subprocess blocked indefinitely (e.g., due to a network blackhole or a hanging CLI command).
  • This caused CodexBar to stop updating usage data entirely until restarted.

Root cause

  • The UsageStore polling loop lacked a global timeout safety net.
  • UsageStore+Refresh awaited provider fetchOutcome() indefinitely.
  • SubprocessRunner awaited its pipe read tasks before closing the pipes, so file handles inherited by zombie child processes could block stdout/stderr reads indefinitely.

What changed

  1. Global Polling Timeout
  • Added a 60-second task group timeout to the background polling loop in UsageStore.swift.
  2. Per-Provider Refresh Timeout
  • Added a 30-second task group timeout per provider in UsageStore+Refresh.swift.
  3. Subprocess Hardening
  • Improved SubprocessRunner.swift by guaranteeing cleanup runs via a defer block.
  • Added SIGKILL escalation to forcibly terminate processes that ignore SIGTERM.
  • Explicitly closed stdout/stderr pipes before awaiting their read tasks to unblock hanging readToEnd() calls.
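The timeout changes in items 1 and 2 both boil down to racing work against a sleep inside a task group. A minimal sketch of that pattern, with illustrative names (`RefreshError` and `fetchWithTimeout` are not the actual identifiers in UsageStore+Refresh.swift):

```swift
import Foundation

enum RefreshError: Error { case timedOut(String) }

// Race an async operation against a deadline; whichever child task finishes
// first wins, and the loser is cancelled.
func fetchWithTimeout<T: Sendable>(
    seconds: UInt64,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self) { group in
        group.addTask { try await operation() }
        group.addTask {
            try await Task.sleep(nanoseconds: seconds * 1_000_000_000)
            throw RefreshError.timedOut("fetch exceeded \(seconds)s")
        }
        // group.next() returns the first completed child (or rethrows its error).
        let result = try await group.next()!
        group.cancelAll()
        return result
    }
}
```

Note that this only guarantees the *caller* resumes within the deadline; a child stuck in non-cooperative blocking work keeps running after `cancelAll()`, which is why the subprocess hardening in item 3 is still needed.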

Before / After

Before

  • Permanent background thread hang if Antigravity stalled, or if network requests were blackholed.

After

  • The system gracefully recovers and logs a warning if any provider takes longer than 30 seconds.
  • Background polling loop cleanly restarts even if a fetch hangs.

Validation

  • Monitored background polling logs to confirm that hanging fetch operations correctly surface SubprocessRunnerError.timedOut.
  • Verified that zombie processes are terminated by the defer cleanup path.

Notes

Fixes #189

…kground usage refresh

This aggregates three related safety valves to address instances of permanent hangs in usage polling:
1. Adds a 60-second global timeout to the background polling loop in `UsageStore.swift`
2. Adds a 30-second per-provider timeout in `UsageStore+Refresh.swift`
3. Hardens `SubprocessRunner.swift` with improved pipe management, task cancellation, and a more aggressive SIGKILL enforcement mechanism to prevent zombie processes. Specifically, it explicitly closes stdout/stderr pipes before awaiting reading tasks so that stray inherited file handles do not block reads indefinitely.

Fixes steipete#189
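The "aggressive SIGKILL enforcement" in point 3 can be sketched as a SIGTERM-then-SIGKILL escalation; the function name and grace period below are illustrative, not the actual SubprocessRunner API:

```swift
import Foundation

// Ask the child to exit with SIGTERM, wait a short grace period, then
// force-kill it with SIGKILL (which cannot be caught or ignored).
func terminateWithEscalation(_ process: Process, gracePeriod: TimeInterval = 2.0) async {
    guard process.isRunning else { return }
    process.terminate() // delivers SIGTERM
    let deadline = Date().addingTimeInterval(gracePeriod)
    while process.isRunning, Date() < deadline {
        try? await Task.sleep(nanoseconds: 100_000_000) // poll every 100 ms
    }
    if process.isRunning {
        kill(process.processIdentifier, SIGKILL) // forced, non-catchable exit
    }
}
```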

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37eadee216


Using `try?` on `Task.sleep` swallows the CancellationError. When the task group is cancelled upon a successful provider fetch, the timeout task would continue to the warning path and falsely report a timeout. Using a do-catch block correctly returns early on cancellation, preventing unreliable hang diagnostics.
return nil
}
let first = await group.next()
group.cancelAll()
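The fix Codex suggests can be sketched as follows, with illustrative names (`raceWithTimeout` is not the actual method): catch the `CancellationError` from `Task.sleep` explicitly so that a cancelled timeout task never reaches the timeout path.

```swift
import Foundation

// Returns true only on a genuine timeout. If the work finishes first, the
// sleep is cancelled, the catch branch returns false, and no false timeout
// is reported.
func raceWithTimeout(
    seconds: UInt64,
    work: @escaping @Sendable () async -> Void
) async -> Bool {
    await withTaskGroup(of: Bool.self) { group in
        group.addTask { await work(); return false }
        group.addTask {
            do {
                try await Task.sleep(nanoseconds: seconds * 1_000_000_000)
            } catch {
                return false // sleep cancelled: the work won the race
            }
            return true // full sleep elapsed: a real timeout
        }
        let timedOut = await group.next() ?? false
        group.cancelAll()
        return timedOut
    }
}
```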
Collaborator

Could we double-check whether canceling the task group here guarantees this method returns in ~30s when a provider fetch is stuck in non-cooperative work?

return outcome
} else {
return ProviderFetchOutcome(
result: .failure(SubprocessRunnerError.timedOut("\(provider.rawValue) fetch")),
Collaborator

When we create this timeout failure, how are we distinguishing a true timeout from parent-task cancellation?
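One way to answer this question is to check `Task.isCancelled` before synthesizing the timeout failure, so parent-task cancellation surfaces as a `CancellationError` rather than a spurious hang report. A hedged sketch (`FetchFailure` stands in for SubprocessRunnerError; this is not the PR's actual code):

```swift
import Foundation

enum FetchFailure: Error {
    case timedOut(String)
}

// Decide which error to report when a deadline expires: if the enclosing
// task was cancelled, the refresh as a whole was torn down and we should
// not log a provider hang.
func classifyMissedDeadline(label: String) -> Error {
    if Task.isCancelled {
        return CancellationError()
    }
    return FetchFailure.timedOut(label)
}
```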

throw SubprocessRunnerError.timedOut("global refresh")
}
_ = try await group.next()
group.cancelAll()
Collaborator

Do we know this cancellation path always lets the timer loop move forward, even if refresh work does not respond to cancellation quickly?

// readToEnd() can block indefinitely if the underlying process is dead but the pipe is still "open"
// in a zombie state or if a child process inherited it. Closing the handle explicitly triggers EOF
// in the reading task, allowing stdoutTask.value to complete.
try? stdoutPipe.fileHandleForReading.close()
Collaborator

Is there any chance closing the read handle here could race the reader task and cause us to miss some stdout/stderr data?
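One possible mitigation for this race, sketched with illustrative names (not the PR's actual implementation): give the reader task a short grace window to drain the pipe normally, and force EOF by closing the handle only if it is still blocked afterwards.

```swift
import Foundation

// Wait up to `grace` nanoseconds for the reader task to finish on its own;
// if it is still stuck, close the read handle to trigger EOF, then collect
// whatever (possibly partial) data the reader produced.
func drainOrForceEOF(
    pipe: Pipe,
    reader: Task<Data, Never>,
    grace: UInt64 = 500_000_000 // 0.5 s
) async -> Data {
    let drained = await withTaskGroup(of: Data?.self) { group in
        group.addTask { await reader.value }
        group.addTask {
            try? await Task.sleep(nanoseconds: grace)
            return nil // deadline child: signals the reader is stuck
        }
        let first = await group.next() ?? nil
        group.cancelAll()
        return first
    }
    if let data = drained { return data } // reader finished in time: no data lost
    try? pipe.fileHandleForReading.close() // force EOF on the stuck reader
    return await reader.value
}
```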


Development

Successfully merging this pull request may close these issues.

Antigravity is not refreshing after a few hours
