## Overview
Add a persistent status bar to the Hermes CLI that displays the current model, token usage, context window fullness, session duration, and estimated cost. This is the single highest-impact, lowest-effort CLI improvement available.
Every modern AI coding CLI (Aider, Claude Code, Toad) shows token/cost information. Hermes shows nothing — users have no visibility into how full their context window is, what model they're using, or how much a session costs until they hit a wall (context overflow, unexpected bill).
Split from #504 for atomicity. The companion feature (rich rendering + navigation) is tracked in #684.
## Research Findings

### What Competitors Show
- Aider: Real-time token/cost display per interaction. Shows tokens used per file in context.
- Claude Code: Token counter in the prompt bar. Context window percentage visible at all times. `/usage` command for detailed breakdown.
- Toad CLI: Model indicator, streaming status, connection health in the status line.
### The Pattern
A thin status bar (1-2 lines) that updates after each interaction. Always visible. Color-coded thresholds for context fullness.
## Current State in Hermes Agent
No status bar exists. The current prompt_toolkit layout (cli.py ~line 3289) is:

```python
HSplit([
    Window(height=0),   # spacer
    sudo_widget,        # conditional sudo password
    approval_widget,    # conditional approval
    clarify_widget,     # conditional clarify
    spacer,             # flexible space
    input_rule_top,     # bronze ─ rule
    image_bar,          # clipboard image badges
    input_area,         # TextArea
    input_rule_bot,     # bronze ─ rule
    CompletionsMenu,    # autocomplete
])
```

Token data IS available: the OpenAI API response includes a `usage` field with `prompt_tokens`, `completion_tokens`, and `total_tokens`. It just isn't surfaced to the user.

Model info IS available: it is stored in session state and displayed in the banner at startup, but not persistently visible.
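As a sketch of the wiring (the `render_status` function and `status.*` style classes are assumptions, not existing Hermes code), the status line can be produced as the `(style, text)` fragment list that prompt_toolkit's `FormattedTextControl` accepts, then wrapped in a one-line `Window` inserted above `input_rule_top` in the HSplit:

```python
# Hypothetical renderer returning prompt_toolkit-style fragments.
# The real change would add, above input_rule_top:
#   Window(FormattedTextControl(render_status), height=1)
def render_status(model: str, used: int, max_ctx: int,
                  cost: float, minutes: int) -> list[tuple[str, str]]:
    return [
        ("class:status.model", f"⚕ {model}"),
        ("", " │ "),
        ("class:status.tokens", f"Tokens: {used:,} / {max_ctx // 1000}K"),
        ("", " │ "),
        ("class:status.cost", f"Cost: ${cost:.2f}"),
        ("", " │ "),
        ("class:status.time", f"{minutes}m"),
    ]
```

Because the control holds a callable, repainting the bar only requires mutating the shared state and calling `app.invalidate()`.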
## Implementation Plan

### Skill vs. Tool Classification
This is a core codebase change to cli.py. It requires modifications to the prompt_toolkit layout and a data pipeline from API responses to the status bar display.
### Specification
Status bar layout (1 line, above the input area):

```text
⚕ claude-sonnet-4-20250514 │ Tokens: 12,450 / 200K │ [█░░░░░░░░░] 6% │ Cost: $0.23 │ 14m
```
Components:
- Model name: current model (truncated if needed)
- Token count: prompt + completion tokens / max context
- Context bar: visual `[██████░░░░]` with color coding:
  - Green: < 50%
  - Yellow: 50-80%
  - Red: > 80%
  - Blinking red: > 95% (imminent overflow)
- Cost estimate: cumulative session cost based on per-token pricing
- Duration: session elapsed time
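A minimal sketch of the color-coded gauge applying the thresholds above (the `class:ctx.*` style names are hypothetical; the app's style sheet would map `ctx.critical` to blinking red):

```python
def context_bar(used: int, max_tokens: int, width: int = 10) -> tuple[str, str]:
    """Return (style_class, bar_text) for the context-fullness gauge."""
    pct = used / max_tokens if max_tokens else 0.0
    filled = min(width, round(pct * width))
    bar = "█" * filled + "░" * (width - filled)
    if pct > 0.95:
        style = "class:ctx.critical"   # blinking red: imminent overflow
    elif pct > 0.80:
        style = "class:ctx.high"       # red
    elif pct >= 0.50:
        style = "class:ctx.mid"        # yellow
    else:
        style = "class:ctx.ok"         # green
    return style, f"[{bar}] {pct:.0%}"
```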
Data pipeline:

- After each API call, extract `usage` from the response
- Update a shared state dict (`_status_state`)
- Trigger `app.invalidate()` to repaint the status bar
- Cost calculation: `prompt_tokens * input_price + completion_tokens * output_price`
Model pricing (configurable in config.yaml):

```yaml
pricing:
  claude-sonnet-4-20250514:
    input: 3.0    # per million tokens
    output: 15.0
  gpt-4o:
    input: 2.5
    output: 10.0
  # fallback for unknown models
  default:
    input: 1.0
    output: 3.0
```

The `/usage` command prints a detailed breakdown:
```text
Session Usage Report
━━━━━━━━━━━━━━━━━━━
Model:    claude-sonnet-4-20250514
Duration: 14m 32s
Turns:    7
Tokens:
  Prompt:     10,230 (input)
  Completion:  2,220 (output)
  Total:      12,450
Context: 6% of 200,000
Cost:
  Input:  $0.031
  Output: $0.033
  Total:  $0.064
```
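A hypothetical formatter for a condensed version of this report, built from the same session-state fields sketched in the pipeline (all names are illustrative):

```python
def usage_report(state: dict, model: str, max_ctx: int, elapsed_s: int) -> str:
    """Assemble a condensed /usage breakdown from session state (sketch)."""
    p, c = state["prompt_tokens"], state["completion_tokens"]
    total = p + c
    m, s = divmod(elapsed_s, 60)
    lines = [
        "Session Usage Report",
        "━" * 19,
        f"Model:    {model}",
        f"Duration: {m}m {s:02d}s",
        f"Turns:    {state['turns']}",
        f"Tokens:   {total:,} ({p:,} in / {c:,} out)",
        f"Context:  {total / max_ctx:.0%} of {max_ctx:,}",
        f"Cost:     ${state['cost']:.3f}",
    ]
    return "\n".join(lines)
```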
## Deliverables

- Status bar `Window` in prompt_toolkit layout (above `input_rule_top`)
- `FormattedTextControl` with dynamic content from `_status_state`
- Token extraction from API response `usage` field
- Context window max lookup per model (from model_metadata.py)
- Color-coded context fullness bar
- Cost estimation with configurable per-model pricing
- Session timer (start time → elapsed)
- `/usage` command with detailed breakdown
- Compact mode handling (abbreviate or hide status bar in narrow terminals)
- Tests for token tracking, cost calculation, display formatting
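The compact-mode deliverable could work roughly like this (illustrative names): abbreviate when the terminal is narrow, and hide the bar entirely when it is very narrow.

```python
def status_line(model: str, used: int, max_ctx: int, cost: float, columns: int) -> str:
    """Pick full, compact, or empty rendering based on terminal width (sketch)."""
    full = f"⚕ {model} │ Tokens: {used:,} / {max_ctx // 1000}K │ Cost: ${cost:.2f}"
    if len(full) <= columns:
        return full
    compact = f"{used // 1000}K/{max_ctx // 1000}K ${cost:.2f}"
    return compact if len(compact) <= columns else ""  # hide if still too wide
```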
## Pros & Cons

### Pros
- Highest impact-to-effort ratio of any CLI improvement
- Prevents surprise context overflow (users see it coming)
- Cost visibility saves money (users learn which prompts are expensive)
- Small, self-contained change (one new Window in the HSplit)
- No refactoring required — purely additive
- Can ship in 1-2 days
### Cons / Risks
- Token counts from server are delayed (reported after completion, not during streaming)
- Cost estimation requires maintaining a pricing table (could go stale)
- Status bar takes 1 line of vertical space (mitigated by compact mode)
- Some terminals may render the color bar poorly (test mosh, tmux, SSH)
## Open Questions
- Should the status bar be above the input (like vim's statusline) or between output and input?
- Should cost tracking be opt-in (some users might find it anxiety-inducing)?
- Should we use tiktoken for real-time token estimation DURING streaming, or only report after completion?
- Should token counts include tool call tokens (they can be substantial)?
## References
- Aider cost tracking — per-interaction token/cost display
- Claude Code context management — token counter in prompt
- Existing code: cli.py layout (~line 3289) — insertion point for status bar
- Existing code: model_metadata.py — context window sizes per model
- Existing code: FormattedTextControl pattern used by clarify_widget — reusable for status bar
- Split from closed Feature: Enhanced CLI TUI — Rich Markdown Rendering, Diff Previews, Token Tracking & Navigable Output #504
- Related: Feature: LLM-Based Context Condensation for Long Sessions #480 (Context Condensation — status bar shows when condensation is imminent)
- Related: [Bug]: Blinking cursor or switching emojis causing prompt lines to flash repeatedly in terminal #464 (blinking cursor — status bar changes should avoid exacerbating this)
- Subsumes: [UX]: Async slash commands (/skills search, etc.) need a loading indicator #636 (async loading indicator — status bar can show "processing..." state)