
Feature: CLI Status Bar & Token/Cost Tracking — Persistent Context Window Visibility #683

@teknium1

Description


Overview

Add a persistent status bar to the Hermes CLI that displays the current model, token usage, context window fullness, session duration, and estimated cost. This is the single highest-impact, lowest-effort CLI improvement available.

Every modern AI coding CLI (Aider, Claude Code, Toad) shows token/cost information. Hermes shows nothing — users have no visibility into how full their context window is, what model they're using, or how much a session costs until they hit a wall (context overflow, unexpected bill).

Split from #504 for atomicity. The companion feature (rich rendering + navigation) is tracked in #684.


Research Findings

What Competitors Show

Aider: Real-time token/cost display per interaction. Shows tokens used per file in context.

Claude Code: Token counter in the prompt bar. Context window percentage visible at all times. /usage command for detailed breakdown.

Toad CLI: Model indicator, streaming status, connection health in the status line.

The Pattern

A thin status bar (1-2 lines) that updates after each interaction. Always visible. Color-coded thresholds for context fullness.


Current State in Hermes Agent

No status bar exists. The current prompt_toolkit layout (cli.py ~line 3289) is:

HSplit([
    Window(height=0),        # spacer
    sudo_widget,             # conditional sudo password
    approval_widget,         # conditional approval
    clarify_widget,          # conditional clarify
    spacer,                  # flexible space
    input_rule_top,          # bronze ─ rule
    image_bar,               # clipboard image badges
    input_area,              # TextArea
    input_rule_bot,          # bronze ─ rule
    CompletionsMenu,         # autocomplete
])

Token data IS available — the OpenAI API response includes a usage field with prompt_tokens, completion_tokens, and total_tokens. This data just isn't surfaced to the user.
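Surfacing that data could start with a small extraction helper. This is a sketch against an OpenAI-style response shape; the field names (`prompt_tokens`, `completion_tokens`, `total_tokens`) are the API's, but `extract_usage` itself is an assumed helper, not existing Hermes code:

```python
# Sketch: pull the usage block out of an OpenAI-style chat completion
# response, defaulting to zeros when the field is absent (e.g. some
# streaming responses). The dict shape is illustrative.

def extract_usage(response: dict) -> dict:
    usage = response.get("usage") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```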

Model info IS available — stored in session state, displayed in the banner at startup, but not persistently visible.


Implementation Plan

Skill vs. Tool Classification

This is a core codebase change to cli.py. It requires modifications to the prompt_toolkit layout and a data pipeline from API responses to the status bar display.

Specification

Status bar layout (1 line, above the input area):

 ⚕ claude-sonnet-4-20250514 │ Tokens: 12,450 / 200K │ [█░░░░░░░░░] 6% │ Cost: $0.23 │ 14m

Components:

  • Model name: current model (truncated if needed)
  • Token count: prompt + completion tokens / max context
  • Context bar: visual [██████░░░░] with color coding:
    • Green: < 50%
    • Yellow: 50-80%
    • Red: > 80%
    • Blinking red: > 95% (imminent overflow)
  • Cost estimate: cumulative session cost based on per-token pricing
  • Duration: session elapsed time
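
The context gauge with the thresholds above can be sketched as a pure function (the `context_bar` name and return shape are assumptions for illustration):

```python
def context_bar(used: int, max_ctx: int, width: int = 10) -> tuple[str, str]:
    """Return (bar, color) for the context-fullness gauge.

    Color thresholds follow the spec: green < 50%, yellow 50-80%,
    red > 80%, blinking red > 95%.
    """
    pct = used / max_ctx
    filled = round(pct * width)
    bar = "█" * filled + "░" * (width - filled)
    if pct > 0.95:
        color = "blinking-red"   # imminent overflow
    elif pct > 0.80:
        color = "red"
    elif pct >= 0.50:
        color = "yellow"
    else:
        color = "green"
    return f"[{bar}]", color
```

In prompt_toolkit, the color name would map to a style string on the text fragment rather than raw ANSI codes.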

Data pipeline:

  1. After each API call, extract usage from response
  2. Update a shared state dict (_status_state)
  3. Trigger app.invalidate() to repaint the status bar
  4. Cost calculation: (prompt_tokens * input_price + completion_tokens * output_price) / 1,000,000, since prices are quoted per million tokens

Model pricing (configurable in config.yaml):

pricing:
  claude-sonnet-4-20250514:
    input: 3.0    # per million tokens
    output: 15.0
  gpt-4o:
    input: 2.5
    output: 10.0
  # fallback for unknown models
  default:
    input: 1.0
    output: 3.0
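
Resolving a model against that table, with the `default` fallback, might look like this (the `PRICING` dict mirrors the YAML above; `session_cost` is an assumed helper):

```python
# Prices are USD per million tokens, matching the config.yaml structure.
PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
    "gpt-4o": {"input": 2.5, "output": 10.0},
    "default": {"input": 1.0, "output": 3.0},
}

def session_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD, falling back to default pricing for unknown models."""
    rates = PRICING.get(model, PRICING["default"])
    return (prompt_tokens / 1e6 * rates["input"]
            + completion_tokens / 1e6 * rates["output"])
```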

/usage command (detailed breakdown):

Session Usage Report
━━━━━━━━━━━━━━━━━━━
Model:       claude-sonnet-4-20250514
Duration:    14m 32s
Turns:       7

Tokens:
  Prompt:     10,230 (input)
  Completion:  2,220 (output)
  Total:      12,450
  Context:    6% of 200,000

Cost:
  Input:      $0.031
  Output:     $0.033
  Total:      $0.064
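
Assembling that report is plain string formatting; a sketch, with `usage_report` and its parameter list as assumptions for illustration:

```python
def usage_report(model: str, seconds: int, turns: int,
                 prompt: int, completion: int, max_ctx: int,
                 in_cost: float, out_cost: float) -> str:
    """Render the /usage breakdown, mirroring the sample layout above."""
    total = prompt + completion
    pct = round(total / max_ctx * 100)
    m, s = divmod(seconds, 60)
    return "\n".join([
        "Session Usage Report",
        "━" * 19,
        f"Model:       {model}",
        f"Duration:    {m}m {s}s",
        f"Turns:       {turns}",
        "",
        "Tokens:",
        f"  Prompt:     {prompt:,} (input)",
        f"  Completion:  {completion:,} (output)",
        f"  Total:      {total:,}",
        f"  Context:    {pct}% of {max_ctx:,}",
        "",
        "Cost:",
        f"  Input:      ${in_cost:.3f}",
        f"  Output:     ${out_cost:.3f}",
        f"  Total:      ${in_cost + out_cost:.3f}",
    ])
```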

Deliverables

  • Status bar Window in prompt_toolkit layout (above input_rule_top)
  • FormattedTextControl with dynamic content from _status_state
  • Token extraction from API response usage field
  • Context window max lookup per model (from model_metadata.py)
  • Color-coded context fullness bar
  • Cost estimation with configurable per-model pricing
  • Session timer (start time → elapsed)
  • /usage command with detailed breakdown
  • Compact mode handling (abbreviate or hide status bar in narrow terminals)
  • Tests for token tracking, cost calculation, display formatting
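
The compact-mode deliverable could be a width-based fallback chosen when the bar is rendered. A minimal sketch; the thresholds, the `status_line` name, and the abbreviated layout are all illustrative assumptions:

```python
def status_line(width: int, model: str, tokens: str, pct: int,
                cost: float, mins: int) -> str:
    """Pick a status-bar layout for the current terminal width."""
    full = f" ⚕ {model} │ Tokens: {tokens} │ {pct}% │ Cost: ${cost:.2f} │ {mins}m"
    if width >= len(full):
        return full                                 # full layout fits
    if width >= 30:
        return f" {pct}% │ ${cost:.2f} │ {mins}m"   # abbreviated
    return ""                                       # hide in very narrow terminals
```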

Pros & Cons

Pros

  • Highest impact-to-effort ratio of any CLI improvement
  • Prevents surprise context overflow (users see it coming)
  • Cost visibility saves money (users learn which prompts are expensive)
  • Small, self-contained change (one new Window in the HSplit)
  • No refactoring required — purely additive
  • Can ship in 1-2 days

Cons / Risks

  • Token counts from server are delayed (reported after completion, not during streaming)
  • Cost estimation requires maintaining a pricing table (could go stale)
  • Status bar takes 1 line of vertical space (mitigated by compact mode)
  • Some terminals may render the color bar poorly (test mosh, tmux, SSH)

Open Questions

  • Should the status bar be above the input (like vim's statusline) or between output and input?
  • Should cost tracking be opt-in (some users might find it anxiety-inducing)?
  • Should we use tiktoken for real-time token estimation DURING streaming, or only report after completion?
  • Should token counts include tool call tokens (they can be substantial)?
