Fix daemon stability and dashboard timing accuracy #13

Open
jcjc81 wants to merge 15 commits into aniketkarne:main from jcjc81:fix/daemon-graceful-shutdown-and-stability

jcjc81 commented Feb 8, 2026

Summary

This PR resolves critical daemon stability issues and fixes dashboard timing inaccuracies, making the auto-renewal system more reliable and accurate.

Changes

1. Daemon Crash Fix and Graceful Shutdown

  • Fixed daemon crash on startup caused by log messages polluting command substitution
  • Enabled graceful shutdown by replacing long sleep with incremental sleeps
  • Daemon now stops within 5 seconds instead of requiring kill -9

2. Event-Driven Shutdown (Instant Response)

  • Replaced polling loop with background sleep + wait pattern
  • Reduced shutdown response time from 0-5 seconds to <50ms
  • Eliminated CPU overhead (0 wake cycles vs 120 per 10-minute sleep)
  • Prevents zombie processes with proper cleanup

3. Dashboard Timing Accuracy

  • Fixed 3-hour timing discrepancy in dashboard display
  • Created shared library (lib/ccusage-utils.sh) for ccusage query logic
  • Dashboard now queries ccusage directly instead of using stale activity file
  • Added timing source transparency (shows "ccusage" vs "clock-based")
  • Graceful fallback to clock-based timing when ccusage is unavailable

Test Plan

  • Daemon starts successfully without crashes
  • Daemon stops gracefully in <1 second
  • Dashboard timing matches ccusage blocks output exactly
  • No more 3-hour timing discrepancies
  • Timing source indicator displays correctly
  • Graceful fallback to clock-based timing works

Impact

  • Reliability: Daemon no longer crashes on startup
  • Responsiveness: Instant shutdown response improves user experience
  • Accuracy: Dashboard shows correct timing matching ccusage output
  • Transparency: Users can see which timing source is being used

🤖 Generated with Claude Code

Jason Chin and others added 3 commits February 8, 2026 09:42
This commit fixes two critical bugs in the auto-renewal daemon:

1. **Daemon crash on startup**: Fixed log_message() function that was
   using 'tee' which output to both stdout and log file. When functions
   like get_minutes_until_reset() captured output via command
   substitution, log messages were included in variables, causing bash
   to fail integer comparisons and crash the daemon.

2. **Graceful shutdown failure**: Replaced single long sleep with
   5-second incremental sleeps to allow trap handlers to respond to
   SIGTERM signals quickly. Previously, the daemon would ignore stop
   requests until the sleep completed (up to 10 minutes), forcing
   kill -9.
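
A minimal sketch of the first fix, assuming a simplified `log_message()` and a stand-in `get_minutes_until_reset()`. The `LOG_FILE` path and the value 42 are hypothetical. Sending the log line to the file and to stderr keeps stdout clean, so command substitution captures only the number and the integer comparison works:

```shell
# Hypothetical log location for this sketch; the daemon's real path may differ.
LOG_FILE="${LOG_FILE:-/tmp/claude-auto-renew-example.log}"

# Append a timestamped line to the log file and echo it to stderr.
# Nothing reaches stdout, so $(...) callers never capture log text.
log_message() {
    printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" | tee -a "$LOG_FILE" >&2
}

# Stand-in for the real function: logs, then prints its result on stdout.
get_minutes_until_reset() {
    log_message "Checking time until reset"
    echo 42   # placeholder value, not real ccusage data
}

minutes=$(get_minutes_until_reset)
if [ "$minutes" -le 60 ]; then
    echo "Reset in $minutes minutes"
fi
```

With a plain `tee -a "$LOG_FILE"` (no `>&2`), `minutes` would hold the log line as well as the number, and the `-le` test would fail.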

Tested: Daemon now starts successfully, monitors ccusage blocks, and
stops gracefully within 5 seconds.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace 5-second polling loop with background sleep + wait pattern for
truly event-driven signal handling. Daemon now responds to SIGTERM in
< 1 second (vs 0-5 second delay) with zero polling overhead.

Changes:
- Add SLEEP_PID global variable to track background sleep process
- Update cleanup() to kill sleep process and prevent zombie processes
- Replace polling while loop with: sleep & wait pattern

Benefits:
- Instant shutdown response (< 50ms vs 0-5000ms)
- Zero polling overhead (0 wake cycles vs 120 per 10-minute sleep)
- Well-established shell idiom for interruptible waits
- Better power efficiency on battery systems

Tested:
- Immediate shutdown during short and long sleeps
- No zombie processes or orphaned PIDs
- Stress tested with 5 rapid start/stop cycles
- All shutdown messages logged correctly
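
The background sleep + wait pattern described in this commit can be sketched as follows. This is an illustration under assumptions, not the daemon's actual code: `interruptible_sleep` and `daemon_main` are hypothetical names, and the trap handler is assumed to exit after cleanup.

```shell
SLEEP_PID=""

# Kill the background sleep, if one is running, and reap it so that no
# orphaned or zombie sleep process is left behind.
cleanup() {
    if [ -n "$SLEEP_PID" ]; then
        kill "$SLEEP_PID" 2>/dev/null
        wait "$SLEEP_PID" 2>/dev/null
        SLEEP_PID=""
    fi
}

# Start sleep in the background and block in the `wait` builtin.
# A trapped signal makes `wait` return at once, so the handler runs
# without waiting for the sleep to finish.
interruptible_sleep() {
    sleep "$1" &
    SLEEP_PID=$!
    wait "$SLEEP_PID"
    SLEEP_PID=""
}

# Hypothetical main loop: the handler cleans up and exits, so the loop
# never continues with a stale sleep still running.
daemon_main() {
    trap 'cleanup; exit 0' TERM INT
    while true; do
        interruptible_sleep "$1"
    done
}
```

A foreground `sleep 600` defers the trap until the sleep returns, which is why the earlier version needed short incremental sleeps. Blocking in `wait` removes the need for any polling interval.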

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The dashboard was showing incorrect session reset times (off by ~3 hours)
because it relied on a stale .claude-last-activity file instead of querying
ccusage directly. When users manually started Claude sessions, the activity
file remained outdated, causing timing calculations based on old timestamps.

Changes:
- Created shared library (lib/ccusage-utils.sh) with ccusage query logic
- Removed duplicate ccusage functions from daemon script
- Updated manager to query ccusage directly for accurate timing
- Added timing source transparency (shows "ccusage" vs "clock-based")
- Implemented daemon config tracking via ~/.claude-auto-renew-daemon-config

The dashboard now shows accurate timing matching `ccusage blocks` output,
with graceful fallback to clock-based calculation when ccusage is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jcjc81 changed the title from "fix: resolve daemon crash and enable graceful shutdown" to "Fix daemon stability and dashboard timing accuracy" on Feb 16, 2026
Jason Chin and others added 12 commits February 16, 2026 20:58
The daemon was failing to start Claude sessions when run from within
an existing Claude Code session due to nested session protection.
This fix unsets the CLAUDECODE environment variable before launching
claude commands, allowing renewals to work properly even when the
daemon is managed from an active Claude session.

Fixes the "Claude Code cannot be launched inside another Claude Code
session" error that was causing continuous renewal failures.
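
One way to unset the variable only for the child process is a subshell wrapper. This is a sketch: `run_without_claudecode` is a hypothetical helper name, and the `sh -c` child stands in for the real `claude` invocation.

```shell
# Run a command with CLAUDECODE removed from its environment.
# The unset happens inside a subshell, so the calling shell keeps the variable.
run_without_claudecode() {
    (
        unset CLAUDECODE
        "$@"
    )
}

# Demonstration with a stand-in child process instead of claude.
export CLAUDECODE=1
run_without_claudecode sh -c 'echo "child sees: ${CLAUDECODE:-unset}"'
```

The demonstration prints `child sees: unset`, while `CLAUDECODE` is still `1` in the calling shell.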

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add verify_session_active() to validate sessions via the JSON output of ccusage blocks
- Switch get_minutes_until_reset() from text parsing to JSON parsing using jq
- Create persistent sessions with sleep 18000 to keep stdin open for 5 hours
- Implement retry loop with exponential backoff (30s->60s->120s->300s, max 20 attempts)
- Create session once, then verify multiple times (prevents duplicate sessions)
- Accept any active session with >60 min remaining (not just fresh 5-hour sessions)
- Add detailed logging for verification attempts and failure reasons
- Detect existing active sessions and skip renewal when not needed

This fixes the issue where renewals created ephemeral sessions that closed
immediately instead of maintaining persistent 5-hour windows. The daemon now
uses the JSON output of `ccusage blocks` to verify sessions are active
and only creates new sessions when truly needed.
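
The retry schedule above could look roughly like this. It is a sketch under assumptions: the helper names and the `SLEEP_CMD` override (used only to make the loop testable) are hypothetical, and the schedule is read literally as 30s, 60s, 120s, then 300s for every later attempt.

```shell
# Delay in seconds to wait after the given failed attempt (1-based).
# Attempts past the third all use the 300-second ceiling.
backoff_delay() {
    case "$1" in
        1) echo 30 ;;
        2) echo 60 ;;
        3) echo 120 ;;
        *) echo 300 ;;
    esac
}

# Run the verification command up to max_attempts times, sleeping between
# failures. The session itself is created once, before this loop, so
# retrying here cannot produce duplicate sessions.
verify_with_retries() {
    local max_attempts="$1" attempt=1
    shift
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -lt "$max_attempts" ]; then
            "${SLEEP_CMD:-sleep}" "$(backoff_delay "$attempt")"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

A call such as `verify_with_retries 20 verify_session_active` would match the limits listed in this commit.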

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When a new billing block starts, ccusage needs burn rate data before
it can compute projection.remainingMinutes. Fall back to calculating
remaining time from the known endTime field to avoid incorrectly
dropping to clock-based estimation.
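
The fallback can be sketched with plain shell arithmetic. `minutes_remaining` is a hypothetical helper, and it assumes the caller has already extracted `projection.remainingMinutes` and converted `endTime` to a Unix epoch.

```shell
# Print the minutes remaining in the current block.
#   $1: projection.remainingMinutes as extracted by the caller
#       (empty or "null" when ccusage has no burn rate data yet)
#   $2: block endTime as a Unix epoch
#   $3: current time as a Unix epoch
minutes_remaining() {
    local projected="$1" end_epoch="$2" now_epoch="$3"
    case "$projected" in
        ''|null)
            # No projection yet: derive the value from the known end time.
            local secs=$((end_epoch - now_epoch))
            if [ "$secs" -lt 0 ]; then secs=0; fi
            echo $((secs / 60))
            ;;
        *)
            echo "$projected"
            ;;
    esac
}
```

For a block that ends 18000 seconds from now with no projection, this yields 300 minutes instead of falling through to clock-based estimation.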

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the daemon triggered a Claude session in the last 2 minutes
of the dying block, causing verification to always fail:
  - Old block had <60 min left → "session not fresh enough"
  - At the hour boundary → gap where no block is active → "no data"

New approach: when reset is imminent, wait until the old block fully
expires plus 60 seconds into the new block, then create the session.
The session tokens land in a fresh 5-hour window and verification
succeeds on the first attempt (~299 min remaining).
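
The wait described above reduces to a small calculation. `seconds_until_renewal` is a hypothetical helper; the 60-second default buffer comes from this commit message, and both times are assumed to be Unix epochs.

```shell
# Seconds to wait before creating the session: until the old block has
# expired plus a buffer (default 60 s) into the new block. Never negative.
#   $1: old block end time (epoch)   $2: current time (epoch)
#   $3: optional buffer in seconds
seconds_until_renewal() {
    local end_epoch="$1" now_epoch="$2" buffer="${3:-60}"
    local wait_secs=$((end_epoch + buffer - now_epoch))
    if [ "$wait_secs" -lt 0 ]; then wait_secs=0; fi
    echo "$wait_secs"
}
```

With 100 seconds left in the old block, the function returns 160, so the renewal message lands one minute into the fresh window.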

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The sleep 18000 approach kept stdin open, causing Claude to buffer
its session data and delay writing the JSONL to disk by ~13 minutes.
Since ccusage reads JSONL files for block detection, this meant all
verification attempts during that window returned "no timing data".

Ephemeral sessions (echo | claude) close cleanly on EOF, triggering
an immediate JSONL write. ccusage detects the new block right away
and verification succeeds on the first attempt.

The 5-hour window is determined by API call timestamps, not by
whether a session connection remains open.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Daemon now queries ccusage once for block endTime, sleeps precisely
  until endTime + 5 min (top of hour + buffer), then renews
- Removes all polling logic (10min/2min/30sec intervals)
- Removes clock-based fallback; when ccusage is unavailable, the daemon retries every 5 min
- Fresh start with no active block renews immediately
- Daemon writes block endTime to state file (~/.claude-auto-renew-state)
- Dashboard reads state file instead of calling ccusage on each refresh
- Removes --disableccusage flag (no longer meaningful without fallback)
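
A sketch of the state-file handoff between daemon and dashboard. The file format is an assumption (a single line holding the block end time as a Unix epoch), and both function names are hypothetical.

```shell
STATE_FILE="${STATE_FILE:-$HOME/.claude-auto-renew-state}"

# Daemon side: record the block end time. Writing to a temporary file and
# renaming it means the dashboard never reads a half-written value.
write_block_end() {
    printf '%s\n' "$1" > "$STATE_FILE.tmp" && mv "$STATE_FILE.tmp" "$STATE_FILE"
}

# Dashboard side: print the minutes left until the recorded end time, or
# "unknown" when the file is missing or does not contain a number.
#   $1: current time as a Unix epoch
read_minutes_left() {
    local now_epoch="$1" end_epoch=""
    if [ -r "$STATE_FILE" ]; then
        read -r end_epoch < "$STATE_FILE" || true
    fi
    case "$end_epoch" in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac
    local secs=$((end_epoch - now_epoch))
    if [ "$secs" -lt 0 ]; then secs=0; fi
    echo $((secs / 60))
}
```

Reading one small file on each refresh is cheaper than invoking ccusage every time, which is the motivation given in this commit.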

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s to 5

- Pin last renewal model above recent activity in dashboard
- Expand recent activity tail from 5 to 10 lines
- Reduce max verification attempts from 20 to 5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Check for any active billing block before sending renewal message.
Uses get_block_end_epoch (isActive == true, no time threshold) so any
active session — regardless of remaining time — prevents unnecessary renewal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix 1 - Renew on start/restart:
- Manager touches ~/.claude-auto-renew-renew-on-start before launching daemon
- Daemon skips scheduling and active-block pre-check on first active iteration
- cleanup() removes marker file on graceful shutdown
- Covers both start and restart commands
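
The marker-file handshake might be sketched like this. The function names are hypothetical, and consuming the marker on first use is an assumption; the commit only states that `cleanup()` removes it on graceful shutdown.

```shell
MARKER_FILE="${MARKER_FILE:-$HOME/.claude-auto-renew-renew-on-start}"

# Manager side: create the marker before launching the daemon.
request_renew_on_start() {
    : > "$MARKER_FILE"
}

# Daemon side: succeed once per marker. Removing the marker here means only
# the first active iteration skips scheduling and the active-block check.
consume_renew_on_start() {
    [ -e "$MARKER_FILE" ] || return 1
    rm -f "$MARKER_FILE"
}
```

In the daemon loop, `if consume_renew_on_start; then ... fi` would guard the immediate renewal path.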

Fix 2 - Weekly limit detection and smart sleep:
- start_claude_session() captures claude output instead of raw pipe to log
- Each output line logged with timestamp prefix for clean audit trail
- New parse_limit_reset_epoch() detects "hit your limit" message and parses
  reset time in detected timezone; handles "12pm", "Monday 12pm", "Mar 10 12pm"
  and other date/time formats that GNU date can parse, tried as fallback candidates
- LIMIT_RESET_EPOCH persisted to ~/.claude-auto-renew-limit-reset so it
  survives daemon crash/SIGKILL; cleared on graceful stop
- Daemon sleeps (interruptibly) until 5 min past reset time
- Falls back to 1-hour retry if reset time cannot be parsed
- Successful renewal clears the limit reset file
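
The persistence and sleep calculation for limit resets can be sketched as follows. The helper names and the one-line epoch file format are assumptions; the 5-minute margin and 1-hour fallback come from the list above.

```shell
LIMIT_FILE="${LIMIT_FILE:-$HOME/.claude-auto-renew-limit-reset}"

# Persist the parsed reset time (Unix epoch) so that it survives a crash.
save_limit_reset() {
    printf '%s\n' "$1" > "$LIMIT_FILE"
}

# Remove the file after a successful renewal or a graceful stop.
clear_limit_reset() {
    rm -f "$LIMIT_FILE"
}

# Seconds to sleep after hitting a usage limit: until 5 minutes past the
# stored reset time, or 3600 when no usable reset time is available.
#   $1: current time as a Unix epoch
limit_sleep_seconds() {
    local now_epoch="$1" reset_epoch=""
    if [ -r "$LIMIT_FILE" ]; then
        read -r reset_epoch < "$LIMIT_FILE" || true
    fi
    case "$reset_epoch" in
        ''|*[!0-9]*) echo 3600; return 0 ;;
    esac
    local secs=$((reset_epoch + 300 - now_epoch))
    if [ "$secs" -lt 0 ]; then secs=0; fi
    echo "$secs"
}
```

Because the value lives in a file rather than only in `LIMIT_RESET_EPOCH`, a daemon restarted after SIGKILL can resume the same wait.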

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude uses the same message format for both daily and weekly limits.
Remove the "weekly" assumption — just report it as a usage limit and
let the reset time speak for itself.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>