
Feature: CLI Status Bar & Token/Cost Tracking — Persistent Context Window Visibility #683

@teknium1

Description


Overview

Add a persistent status bar to the Hermes CLI that displays the current model, token usage, context window fullness, session duration, and estimated cost. This is the single highest-impact, lowest-effort CLI improvement available.

Every modern AI coding CLI (Aider, Claude Code, Toad) shows token/cost information. Hermes shows nothing — users have no visibility into how full their context window is, what model they're using, or how much a session costs until they hit a wall (context overflow, unexpected bill).

Split from #504 for atomicity. The companion feature (rich rendering + navigation) is tracked in #684.


Research Findings

What Competitors Show

Aider: Real-time token/cost display per interaction. Shows tokens used per file in context.

Claude Code: Token counter in the prompt bar. Context window percentage visible at all times. /usage command for detailed breakdown.

Toad CLI: Model indicator, streaming status, connection health in the status line.

The Pattern

A thin status bar (1-2 lines) that updates after each interaction. Always visible. Color-coded thresholds for context fullness.


Current State in Hermes Agent

No status bar exists. The current prompt_toolkit layout (cli.py ~line 3289) is:

HSplit([
    Window(height=0),        # spacer
    sudo_widget,             # conditional sudo password
    approval_widget,         # conditional approval
    clarify_widget,          # conditional clarify
    spacer,                  # flexible space
    input_rule_top,          # bronze ─ rule
    image_bar,               # clipboard image badges
    input_area,              # TextArea
    input_rule_bot,          # bronze ─ rule
    CompletionsMenu,         # autocomplete
])

Token data IS available — the OpenAI API response includes a usage field with prompt_tokens, completion_tokens, and total_tokens. This data just isn't surfaced to the user.
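Surfacing that data could start with a small extraction helper. This is a sketch against an OpenAI-style response shape; the field names (`prompt_tokens`, `completion_tokens`, `total_tokens`) are the API's, but `extract_usage` itself is an assumed helper, not existing Hermes code:

```python
# Sketch: pull the usage block out of an OpenAI-style chat completion
# response, defaulting to zeros when the field is absent (e.g. some
# streaming responses). The dict shape is illustrative.

def extract_usage(response: dict) -> dict:
    usage = response.get("usage") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```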

Model info IS available — stored in session state, displayed in the banner at startup, but not persistently visible.


Implementation Plan

Skill vs. Tool Classification

This is a core codebase change to cli.py. It requires modifications to the prompt_toolkit layout and a data pipeline from API responses to the status bar display.

Specification

Status bar layout (1 line, above the input area):

 ⚕ claude-sonnet-4-20250514 │ Tokens: 12,450 / 200K │ [█░░░░░░░░░] 6% │ Cost: $0.23 │ 14m

Components:

  • Model name: current model (truncated if needed)
  • Token count: prompt + completion tokens / max context
  • Context bar: visual [██████░░░░] with color coding:
    • Green: < 50%
    • Yellow: 50-80%
    • Red: > 80%
    • Blinking red: > 95% (imminent overflow)
  • Cost estimate: cumulative session cost based on per-token pricing
  • Duration: session elapsed time
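
The context gauge with the thresholds above can be sketched as a pure function (the `context_bar` name and return shape are assumptions for illustration):

```python
def context_bar(used: int, max_ctx: int, width: int = 10) -> tuple[str, str]:
    """Return (bar, color) for the context-fullness gauge.

    Color thresholds follow the spec: green < 50%, yellow 50-80%,
    red > 80%, blinking red > 95%.
    """
    pct = used / max_ctx
    filled = round(pct * width)
    bar = "█" * filled + "░" * (width - filled)
    if pct > 0.95:
        color = "blinking-red"   # imminent overflow
    elif pct > 0.80:
        color = "red"
    elif pct >= 0.50:
        color = "yellow"
    else:
        color = "green"
    return f"[{bar}]", color
```

In prompt_toolkit, the color name would map to a style string on the text fragment rather than raw ANSI codes.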

Data pipeline:

  1. After each API call, extract usage from response
  2. Update a shared state dict (_status_state)
  3. Trigger app.invalidate() to repaint the status bar
  4. Cost calculation: (prompt_tokens * input_price + completion_tokens * output_price) / 1,000,000, since prices are quoted per million tokens

Model pricing (configurable in config.yaml):

pricing:
  claude-sonnet-4-20250514:
    input: 3.0    # per million tokens
    output: 15.0
  gpt-4o:
    input: 2.5
    output: 10.0
  # fallback for unknown models
  default:
    input: 1.0
    output: 3.0
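
Resolving a model against that table, with the `default` fallback, might look like this (the `PRICING` dict mirrors the YAML above; `session_cost` is an assumed helper):

```python
# Prices are USD per million tokens, matching the config.yaml structure.
PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
    "gpt-4o": {"input": 2.5, "output": 10.0},
    "default": {"input": 1.0, "output": 3.0},
}

def session_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD, falling back to default pricing for unknown models."""
    rates = PRICING.get(model, PRICING["default"])
    return (prompt_tokens / 1e6 * rates["input"]
            + completion_tokens / 1e6 * rates["output"])
```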

/usage command (detailed breakdown):

Session Usage Report
━━━━━━━━━━━━━━━━━━━
Model:       claude-sonnet-4-20250514
Duration:    14m 32s
Turns:       7

Tokens:
  Prompt:     10,230 (input)
  Completion:  2,220 (output)
  Total:      12,450
  Context:    6% of 200,000

Cost:
  Input:      $0.031
  Output:     $0.033
  Total:      $0.064
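
Assembling that report is plain string formatting; a sketch, with `usage_report` and its parameter list as assumptions for illustration:

```python
def usage_report(model: str, seconds: int, turns: int,
                 prompt: int, completion: int, max_ctx: int,
                 in_cost: float, out_cost: float) -> str:
    """Render the /usage breakdown, mirroring the sample layout above."""
    total = prompt + completion
    pct = round(total / max_ctx * 100)
    m, s = divmod(seconds, 60)
    return "\n".join([
        "Session Usage Report",
        "━" * 19,
        f"Model:       {model}",
        f"Duration:    {m}m {s}s",
        f"Turns:       {turns}",
        "",
        "Tokens:",
        f"  Prompt:     {prompt:,} (input)",
        f"  Completion:  {completion:,} (output)",
        f"  Total:      {total:,}",
        f"  Context:    {pct}% of {max_ctx:,}",
        "",
        "Cost:",
        f"  Input:      ${in_cost:.3f}",
        f"  Output:     ${out_cost:.3f}",
        f"  Total:      ${in_cost + out_cost:.3f}",
    ])
```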

Deliverables

  • Status bar Window in prompt_toolkit layout (above input_rule_top)
  • FormattedTextControl with dynamic content from _status_state
  • Token extraction from API response usage field
  • Context window max lookup per model (from model_metadata.py)
  • Color-coded context fullness bar
  • Cost estimation with configurable per-model pricing
  • Session timer (start time → elapsed)
  • /usage command with detailed breakdown
  • Compact mode handling (abbreviate or hide status bar in narrow terminals)
  • Tests for token tracking, cost calculation, display formatting
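
The compact-mode deliverable could be a width-based fallback chosen when the bar is rendered. A minimal sketch; the thresholds, the `status_line` name, and the abbreviated layout are all illustrative assumptions:

```python
def status_line(width: int, model: str, tokens: str, pct: int,
                cost: float, mins: int) -> str:
    """Pick a status-bar layout for the current terminal width."""
    full = f" ⚕ {model} │ Tokens: {tokens} │ {pct}% │ Cost: ${cost:.2f} │ {mins}m"
    if width >= len(full):
        return full                                 # full layout fits
    if width >= 30:
        return f" {pct}% │ ${cost:.2f} │ {mins}m"   # abbreviated
    return ""                                       # hide in very narrow terminals
```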

Pros & Cons

Pros

  • Highest impact-to-effort ratio of any CLI improvement
  • Prevents surprise context overflow (users see it coming)
  • Cost visibility saves money (users learn which prompts are expensive)
  • Small, self-contained change (one new Window in the HSplit)
  • No refactoring required — purely additive
  • Can ship in 1-2 days

Cons / Risks

  • Token counts from server are delayed (reported after completion, not during streaming)
  • Cost estimation requires maintaining a pricing table (could go stale)
  • Status bar takes 1 line of vertical space (mitigated by compact mode)
  • Some terminals may render the color bar poorly (test mosh, tmux, SSH)

Open Questions

  • Should the status bar be above the input (like vim's statusline) or between output and input?
  • Should cost tracking be opt-in (some users might find it anxiety-inducing)?
  • Should we use tiktoken for real-time token estimation DURING streaming, or only report after completion?
  • Should token counts include tool call tokens (they can be substantial)?
