Skip to content

Releases: aden-hive/hive

v0.10.5

25 Apr 03:23

Choose a tag to compare

🐝 Hive Agent v0.10.5: Cache-Aware Cost + New Frontier Models

A patch release with two big practical wins: real prompt-cache hits across OpenRouter routes (and the cost numbers to prove it), plus first-class entries for GPT-5.5, DeepSeek V4 Pro/Flash, and GLM-5.1.


✨ Highlights

💸 Huge cost cut from prompt caching

v0.10.4 made the system prompt static so providers could cache it. v0.10.5 actually collects on that work.

  • cache_control now propagates through OpenRouter for the sub-providers whose upstream APIs honor it: openrouter/anthropic/*, openrouter/google/gemini-*, openrouter/z-ai/glm*, and openrouter/minimax/*. Direct Anthropic / Bedrock / Vertex routes already worked; OpenRouter routes were silently no-op'ing the cache marker before.
  • Cache-token accounting is unified across providers. A single _extract_cache_tokens helper now reads OpenAI-shape prompt_tokens_details.cached_tokens, Anthropic-raw cache_read_input_tokens, and OpenRouter's normalized cache_write_tokens / cache_creation_input_tokens — and surfaces both cache-read and cache-creation counts (subsets of the input total, never double-counted).
  • Streaming cache tokens no longer get dropped. LiteLLM's calculate_total_usage aggregates token totals but discards prompt_tokens_details; the stream path now reaches back into the most recent chunk to recover cached/cache-creation counts so the FinishEvent is accurate.
  • Cost is reported in USD, not just tokens. Every LLMResponse and FinishEvent now carries cost_usd. The extractor consults four sources in priority order: native usage.cost → LiteLLM _hidden_params.response_costlitellm.completion_cost → curated catalog pricing — so models LiteLLM doesn't price (GLM, Kimi, MiniMax, DeepSeek V4) still get accurate numbers via the catalog fallback.
  • Persistent cost tracking — the cost number now flows through the event bus to the chat panel and queen DM, and is persisted across sessions instead of resetting on reload.

The combined effect: on a long Claude Sonnet / Opus session routed through OpenRouter, the static system prefix is now a cache hit on every turn after the first, and the panel shows you the dollar savings turn-by-turn.

🧠 New frontier models

  • GPT-5.5 is now the OpenAI default — frontier coding + reasoning, 128k output / 1.05M context, vision-capable.
  • DeepSeek V4 Pro and DeepSeek V4 Flash replace deepseek-chat. Both ship with 1M context, 384k max output, and full cache-read pricing (Pro: $1.74 / $3.48 / $0.145 per Mtok; Flash: $0.14 / $0.28 / $0.028). deepseek-reasoner is marked legacy.
  • GLM-5.1 replaces GLM-5 with cache-read pricing wired in.
  • Catalog pricing schema — every model can now declare pricing_usd_per_mtok with optional cache_read and cache_creation rates; validated on load.
  • supports_vision flag added to every model in the catalog and consulted by the new vision-fallback path so non-vision models can still receive image inputs via captioning.

🆕 What's New

Cost & Cache

  • cache_control for OpenRouter sub-providers — Anthropic, Gemini, GLM, MiniMax routes now mark the static system prefix as ephemeral cache. (@RichardTang-Aden)
  • _extract_cache_tokens helper — single reader for OpenAI / Anthropic / OpenRouter cache-token shapes; returns (cache_read, cache_creation). (@RichardTang-Aden)
  • Catalog pricing fallback_cost_from_catalog_pricing and _cost_from_tokens compute USD from pricing_usd_per_mtok when LiteLLM's catalog has no entry. (@RichardTang-Aden)
  • Streaming usage recovery — pull cache-token details from the last usage-bearing chunk after calculate_total_usage strips them. (@RichardTang-Aden)
  • cost_usd, cached_tokens, cache_creation_tokens added to LLMResponse, FinishEvent, and the stream-event bus. (@RichardTang-Aden)
  • Persistent cost tracking — costs survive session reload and surface in ChatPanel and queen-dm. (@RichardTang-Aden)

Models & Catalog

  • GPT-5.5 as the new OpenAI default with 1.05M context + native pricing. (@RichardTang-Aden)
  • DeepSeek V4 Pro / Flash with 1M context, 384k output, and cache-read pricing. (@RichardTang-Aden)
  • GLM-5.1 replaces GLM-5; cache-read pricing wired. (@RichardTang-Aden)
  • pricing_usd_per_mtok schema — validated input / output / cache_read / cache_creation per model. (@RichardTang-Aden)
  • supports_vision flag populated for every catalog entry; queried by the new vision-fallback path. (@RichardTang-Aden)
  • get_model_pricing / model_supports_vision helpers exposed from framework.llm.model_catalog. (@RichardTang-Aden)

Vision & Agent Loop

  • Image vision fallbackframework.agent_loop.internals.vision_fallback captions images for non-vision models so the same conversation works regardless of provider capability. (@TimothyZhang7)
  • Hybrid compaction buffer — context compaction now combines a fixed token reserve with a ratio-of-context buffer instead of one or the other. (@RichardTang-Aden)

Frontend

  • Configuration UI redesign — refreshed sidebar, prompt library, skills library, and tools editor. (@vincentjiang777)
  • Cost + token usage in chatChatPanel and queen-dm show running token consumption and USD cost per session. (@RichardTang-Aden)

Tests

  • test_litellm_provider.py (+448 lines) covering cache-token extraction, cost-extraction priority order, OpenRouter compat-mode cache wiring, and streaming usage recovery.
  • test_model_catalog.py extended for the new pricing schema and supports_vision flag.
  • test_event_bus.py / test_stream_events.py extended for the new cost + cache fields.

🐛 Bug Fixes

  • Vision caption — fix incorrect caption attachment in the vision-fallback path. (@TimothyZhang7)
  • Colony-fork test flake — drain background fork tasks before asserting on colony-spawn artifacts. (@RichardTang-Aden)

🚀 Upgrading from v0.10.4

No migration. Pull main at v0.10.5 and restart Hive — existing ~/.hive/ profiles, queens, colonies, and sessions keep working.

Two things to know:

  1. Default DeepSeek model changed from deepseek-chat to deepseek-v4-pro. If a queen is pinned to deepseek-chat, that id is gone from the catalog — pick deepseek-v4-pro or deepseek-v4-flash.
  2. Default OpenAI model changed from gpt-5.4 to gpt-5.5. gpt-5.4 stays in the catalog as the previous-flagship option.

Cache the prompts. 🐝

v0.10.4

23 Apr 04:45

Choose a tag to compare

🐝 Hive Agent v0.10.4: Skill & Tool Library

Skills and tools move from something the framework hands down into something you curate. Every queen and every colony now has a dedicated allowlist and a UI to manage it, and the system prompt gets smaller and cache-friendlier along the way.

image

While we’ve been seeing the queen take on more capable tasks, we also want to give you better visibility into how the queen and the colony achieve it.

Skill Library + Tool Library are here. Every queen and every colony now gets its own tool allowlist and skill set. Browse them, toggle them, upload your own, and author new ones right in the UI. Colonies inherit from their founding queen and then evolve on their own (starting from the skill created by the queen)

Also: the system prompt is now fully static across a session (meaning caching will be used and save you tokens 💰). Date/time has been moved to turn-time injection, so the prompt prefix stops changing and prompt caching actually works. Small change, big win.


🆕 What's New

Skill Library

  • Skill Library page — browse every skill by scope (queen / colony / framework preset), view SKILL.md inline, toggle per-scope enablement, upload skills as .md or .zip, and author new skills from the UI.
  • Per-scope overrides — skill enablement is recorded in ~/.hive/agents/queens/{queen_id}/skills_overrides.json and ~/.hive/colonies/{colony_name}/skills_overrides.json; framework presets stay read-only, user-authored skills live under each scope's own skills directory.
  • Skill provenance — the API and UI now distinguish framework-preset skills, queen-authored skills, and colony-authored skills, so you can tell at a glance who owns a given skill.
  • Skill authoring primitives — a shared framework.skills.authoring module validates names, parses frontmatter, and materializes skill folders for the UI upload path, the create_colony tool's inline skills, and future runtime-learned skills.
  • Preset rename — built-in skills moved from _default_skills/ to _preset_skills/ to match the new "preset vs. user" split. Existing browser/linkedin/x automation skills carry over untouched.

Tool Library

  • Tool Library page with a shared ToolsEditor component used by the queen profile and colony settings panels.
  • Per-queen tool allowlist at ~/.hive/agents/queens/{queen_id}/tools.json: null = allow all, [] = disable all, ["foo", "bar"] = only these MCP tools pass the filter.
  • Per-colony tool allowlist at ~/.hive/colonies/{colony_name}/tools.json, with the same schema, atomic writes, and independent lifecycle.
  • Configurable defaults — queens now carry a default tool/skill bundle that seeds each new colony, and the bundle itself is editable.
  • Colony inheritance — when a queen spawns a colony, the colony starts from the queen's tool and skill configuration. After spawn the two diverge freely.
  • Colony sidecartools.json lives next to metadata.json so identity/provenance (queen, created_at, workers) and tool gating evolve independently.

MCP Server Management

  • MCP Servers panel — dedicated settings UI for browsing, configuring, and enabling bundled and user MCP servers.
  • /api/mcp routes for listing built-in servers, inspecting state, and reporting errors with structured MCP error responses.
  • Tool catalog wiring — live queen sessions now surface their MCP tool catalog to the queen-tools and colony-tools endpoints, so the UI shows exactly what the running session can see.

Prompt & Runtime

  • Static system prompt — the agent loop, conversation, and provider adapters (LiteLLM, Antigravity, Codex, Mock) now build and freeze the system prompt once per session. Per-turn values that used to churn the prompt are gone.
  • Date/time injected at turn time — today's date and current time move from the system prompt into a turn-level injection path that updates cursor persistence and queen-lifecycle tooling.
  • Queen orchestrator — refreshed to pair with the static prompt model and the new tool/skill configuration layers.
  • Session manager — tightened session-creation input validation and reflection/skill edge handling; "create new session and switch branch" is now reliable.

🐛 Bug Fixes

  • No-cache middleware on /api/* — every API response now carries Cache-Control: no-store. Without this, a one-off bad response (e.g. the SPA catch-all leaking index.html for an /api/* URL before a route was registered) could get pinned in the browser's disk cache and replayed forever, since our JSON handlers don't emit ETag/Last-Modified. Hard-refresh no longer required to recover.
  • Tools & skills registration — queens and colonies no longer end up with stale or duplicated entries after reloads.
  • Session creation — invalid inputs are rejected up front with clear errors instead of surfacing later as runtime failures.
  • Skill / reflection edges — tightened handling so reflection runs no longer see half-built skill state during scope reloads.
  • Create new session + switch branch flow works end-to-end without orphaning sessions.
  • CI — broken workflow repaired.

🧪 Tests

  • test_routes_skills.py, test_skill_overrides.py, test_colony_tools.py, test_queen_tools.py, test_mcp_routes.py — coverage added for every new route group and the override store.

🚀 Upgrading from v0.10.3

No migration. Pull main at v0.10.4 and restart Hive — existing ~/.hive/ profiles, queens, colonies, and sessions keep working.

Two things to know:

  1. Preset skills directory renamed from _default_skills/ to _preset_skills/ inside the framework. If you had external scripts pointing at that path, update them. User-authored skills under ~/.hive/ are unaffected.
  2. First open of a queen or colony writes a tools.json sidecar the first time you edit its allowlist. If you don't touch the Tool Library, nothing is written and behavior matches v0.10.3 (allow all MCP tools).

Curate your queens. 🐝

v0.10.3

21 Apr 02:49

Choose a tag to compare

🐝 Hive Agent v0.10.3

Colonies grow up, and Queen DMs learn to listen.

v0.10.0 introduced colonies. v0.10.3 is the release where they stop feeling like a new concept bolted on and start feeling like the place you actually work. Alongside that, Queen DMs got the single biggest fix to single-agent chat since we shipped it: you can keep typing while the queen is thinking, and she'll hear you.


The Colony, grown up

When you spawn a colony now, a few things happen that didn't before.

The queen who spawned it hands off cleanly — her session is compacted first, so the new colony doesn't inherit a bloated context and spend its first ten turns figuring out what it already knows. There's a short incubating phase between "spawn requested" and "colony live" where skills, storage, and scheduler tools get set up quietly in the background. By the time the colony is ready, it has its own scoped skill bundle and SQLite — no more cross-colony skill leakage, no more workers belonging to the wrong group.

The UI finally matches the model. The sidebar groups everything by colony with a DataGrid view, shows the active queen on a dedicated bar inside the colony, and lets you click a worker to open it as its own tab. Tables and workers are scoped to the colony you're looking at, which sounds obvious in hindsight and was a long-standing source of confusion. Queen identity — name, title, avatar — now travels with the queen into message bubbles, the profile pane, and the org chart, so it's consistent no matter where you see her.

If you were using colonies in v0.10.1 or v0.10.2, this release is the one where the experience stops fighting you.

Queen DMs stop eating your keystrokes

The most common complaint about Queen DMs was simple: if the queen was mid-turn and you thought of something to add, your message either got lost or arrived at a weird moment. That's gone.

Messages you send while the queen is working now land in a pending queue, visible in the chat panel with a Steer or Cancel control. Steer folds your message into the turn in progress; Cancel drops it. When the queue auto-flushes, the "typing…" indicator no longer flickers, and the old bootstrap race that sometimes rendered your own message twice is fixed.

The queen also got a proper ask_user tool this release, so when she genuinely needs something from you, it shows up as a question — not as a regular chat message you have to parse as one. Tool calls in chat are grouped by session now, so a chatty worker doesn't drown out the queen's own thinking, and her avatar is on every bubble so you can tell who's talking at a glance.

Smaller things worth knowing

  • Prometheus tool for querying metrics from agents (#7047).
  • Scheduler + triggers got a UI pass, better reliability on trigger message delivery, and scheduler tools are now available during the incubating phase.
  • VSCode extension bumped to 1.0.1 with refreshed icons and a fix for frame-resize jank.
  • Model catalog updates for Xiaomi and OpenRouter selections.
  • Runtime reliability: cancelled executions now fully terminate before a session can restart (#7001), Codex store=False is honored correctly (#7089), and the UI handles a broken Aden API key gracefully instead of hanging.

Upgrading from v0.10.2

No migration. Pull main at v0.10.3 and restart Hive — your existing ~/.hive/ profiles, queens, colonies, and sessions keep working.

One thing to be aware of: worker and table tabs are now scoped per colony. If you expected them to be global, switch colonies in the sidebar to see each colony's own.

v0.10.2

17 Apr 06:44

Choose a tag to compare

🐝 Hive Agent v0.10.2

A browser-automation-focused follow-up to v0.10.1. Coordinates that flow between the vision model and Chrome are now fractions of the viewport instead of screenshot pixels — so the same (x, y) works across Claude, GPT-4o, Gemini, and any other VLM regardless of how each one resizes or tiles the image. Plus reliability fixes for queen switching, tab-group isolation, and CI.


✨ Highlights

  • Model-invariant visual clicks. Every coordinate-taking browser tool (browser_click_coordinate, browser_hover_coordinate, browser_press_at) and every rect-returning tool (browser_get_rect, browser_shadow_query, the rect inside focused_element) now speaks in 0..1 fractions of the viewport. Vision-model pixel resizing no longer silently breaks clicks when you swap backends.
  • Queens survive profile/queen switches. Switching queens no longer tears down the active queen's runtime.
  • Tab-group isolation. Browser tab groups are now namespaced per profile, so stale highlight / attach state can't bleed across profiles when Chrome reuses a tab id.
  • Remote browser debugger. New scripts/browser_remote.py + HTML UI give a visual debugging surface for the Chrome extension bridge — live screenshots, coord inspector, and one-click test harness for the GCU tools.
  • Greener CI. All framework/tools test failures resolved and Windows CI is unbroken; full ruff lint + format pass across the codebase.
  • Gemini reliability tuning. gemini-3-flash-customtools and gemini-3.1-pro-preview-customtools now run with max_context_tokens: 240000 (down from 900k) — long-context quality on Gemini degrades well before the advertised window, and clamping lower trades headroom for more predictable tool use.

🆕 What's New

Browser automation

  • Fraction-based coordinates — all click / hover / press / rect tools now use (0..1, 0..1) fractions of the viewport. Internally each tool multiplies by the cached cssWidth / cssHeight before dispatching to CDP. Four-decimal precision (0.0001 ≈ 0.17 CSS px on a 1717-wide viewport) is sufficient for the tightest targets. (@timothyadenhq)
  • browser_type_focused — dedicated focused-element typing tool split out from browser_type. Use after browser_click_coordinate focuses the target; faster than browser_press for multi-character input. (@RichardTang-Aden)
  • Multi-mode screenshot toolbrowser_screenshot gained viewport / full-page / selector-clip modes and returns cssWidth / cssHeight in metadata so callers can reason about viewport size if they need to. (@RichardTang-Aden)
  • Dashed highlighter for type-focus events — visual differentiation between click (solid) and type-focus (dashed) highlights on post-interaction screenshots. (@RichardTang-Aden)
  • Default 1 ms key delay + prompt tuningbrowser_type now uses a 1 ms delay by default (was higher), matching what real rich-text editors expect; related orchestrator prompt improvements. (@RichardTang-Aden)
  • Remote browser debugger UIscripts/browser_remote.py + scripts/browser_remote_ui.html provide a live visual surface to exercise the GCU browser bridge (screenshots, click targeting, coord readout). (@RichardTang-Aden)
  • Iframe-aware focused_element — same-origin iframe descent (capped at 5 levels), so focused_element reports the real inner element instead of {tag: "iframe"}. Adds an inFrame: [...] breadcrumb when traversed. (@timothyadenhq)

Skills & prompts

  • Browser / LinkedIn automation skills rewritten around the new fraction convention — "read proportion off the image" workflow, updated rect examples, updated troubleshooting entries. (@timothyadenhq, @RichardTang-Aden)
  • GCP skills and prompt improvements — polish on the browser-edge-cases skill and the queen GCU reference guide. (@RichardTang-Aden)
  • Canonical workflow simplified — slimmer, less prescriptive guidance in the default browser/linkedin skills. (@timothyadenhq)

Core / server

  • Namespaced browser tab groups — per-profile tab_group tracking in queen_orchestrator / session_manager, with a clear_tab_highlights(tab_ids) cleanup hook called on context destruction so stale highlight / attach state can't leak onto reused tab ids. (@timothyadenhq)
  • Don't kill the queen on switch — queen switching no longer invokes the "stop runtime" path, keeping active sessions alive across UI navigation. (@timothyadenhq)

LLM & model catalog

  • Gemini context window clamped to 240kcore/framework/llm/model_catalog.json drops max_context_tokens from 900000 → 240000 on both gemini-3-flash-customtools (Fast) and gemini-3.1-pro-preview-customtools (Best quality). Reduces the chance of context-window-edge failures on long sessions. (@RichardTang-Aden)

Developer experience

  • Codebase-wide ruff clean — 155 lint errors (70 auto-fixed + 85 manual) resolved across framework and tools; 343 files reformatted. Long-line, missing-import, duplicate-method, and W291 whitespace issues all cleared. (#7058)
  • Framework + tools test suite green — 52 → 0 failures across framework tests (mock LLM model attribute, updated skill/prompt assertions, compaction formatting, model catalog) and tools tests (csv_tool paths, browser_evaluate toast wrapper). (#7059)
  • Windows CI unbroken — background-job test uses sys.executable + double quotes, CLI entry-point guards against None stdout, safe-eval timeout bumped for slower Windows runners. (#7061)

🐛 Bug Fixes

  • Fraction-click tab-state leak (_screenshot_css_scales NameError)clear_tab_state raised NameError on every tab close and profile teardown because a removed cache was still referenced. Fixed in tools/src/gcu/browser/tools/inspection.py.
  • Missing highlight cleanup on profile destroy — introduced clear_tab_highlights so orphaned highlight state doesn't reappear when Chrome reuses a tab id on a later profile.
  • Queen session shutdown on switch — switching between queens no longer terminates the active queen's runtime.
  • Pruned tool-result sentinel mismatch — compaction / conversation now accept both Pruned tool result ... and [Pruned tool result ...] sentinel shapes.
  • Mock LLM infinite loop on exhausted scenariosMockStreamingLLM and _ByTaskMockLLM now emit a clean text-stop when scenarios are consumed, unblocking test_worker_report.

⬆️ Upgrading from v0.10.1

No migration steps for stored state — existing ~/.hive/ profiles, queens, and sessions continue to work.

Behavior change for direct callers of browser coord tools: browser_click_coordinate, browser_hover_coordinate, browser_press_at, and rect-returning tools now expect and return fractions of the viewport (0..1 on each axis) instead of screenshot pixels. Agents using the default browser-automation skill get this automatically — the skill was updated alongside the tool change. Only custom code that hardcoded pixel coordinates against the prior 800 px-wide screenshot space needs adjustment: divide by cssWidth / cssHeight (now exposed in browser_screenshot metadata) to convert.

Pull main at the v0.10.2 tag and restart Hive.

v0.10.1

16 Apr 01:23

Choose a tag to compare

🐝 Hive Agent v0.10.1

A small follow-up to v0.10.0 The Colony — polish on the queen experience, a more reliable agent loop under long contexts, and sharper browser automation skills. No breaking changes; v0.10.0 sessions continue to work.


✨ Highlights

  • Queen DMs feel alive. Chat now shows message timestamps and day-divider rows, with a stable createdAt across stream updates so messages don't jitter as they arrive.
  • Queen profile, one click away. The queen profile is now its own panel, opened directly from the app header — available on every page, not just the org chart.
  • Smarter sidebar. Queens are sorted by the most recent DM activity, so whoever you're actually working with floats to the top. The "Head of" prefix is trimmed for a cleaner look.
  • Calmer, leaner queen prompt. The independent / PM-mode prompt has been significantly slimmed down and reworked for better reasoning.
  • Context health you can trust. A set of fixes to the agent loop's context tracking, compaction, and tool-result handling — long sessions stay healthy instead of drifting into eviction loops.
  • Browser automation, upgraded. Browser, LinkedIn, and X automation skills gained new guidance, and the underlying CDP bridge is more robust across click, snapshot, and inspection paths.

🆕 What's New

Queens & Chat UX

  • Message timestamps and day dividers in DMsChatPanel now shows per-message time, groups by day, and preserves a stable createdAt across streaming updates so messages don't reshuffle. (@bryanadenhq)
  • QueenProfilePanel extracted from org-chart — the profile panel is now a standalone component opened from AppHeader, available globally through the app layout. (@bryanadenhq)
  • Sort queens by last DM activityColonyContext orders queens by most recent interaction, and SidebarQueenItem trims the "Head of" title prefix. (@bryanadenhq)
  • last_active_at derived from latest messagesession_manager now derives queen activity from the actual message stream and sorts history newest-first, keeping the sidebar in sync with reality. (@bryanadenhq)
  • Queen independent-prompt refactorqueen/nodes/__init__.py shrinks from ~215 lines to ~95, with cleaner prompt construction for independent / PM mode and an updated debug_queen_prompt.py script. (@RichardTang-Aden)
  • Finance queen title polish — Charlotte's title updated in queen_profiles.py. (@RichardTang-Aden)

Agent Loop & Context Health

  • Context health and eviction fixes — substantial rework across agent_loop.py, conversation.py, compaction.py, tool_result_handler.py, and internals/types.py to keep long sessions stable. Compaction, tool-result accounting, and eviction decisions are now driven by a more accurate view of conversation state. (@timothyadenhq)

Skills & Tools

  • Browser automation skill guidancebrowser-automation/SKILL.md updated with sharper instructions for agents working inside Chrome. (@RichardTang-Aden, @timothyadenhq)
  • New LinkedIn and X automation skills — dedicated linkedin-automation and x-automation SKILL files with site-specific playbooks. (@timothyadenhq)
  • GCU browser bridge hardeningtools/src/gcu/browser/bridge.py and the advanced, inspection, and interactions tool modules gained reliability fixes around CDP calls and snapshot flow. (@timothyadenhq)

LLM & Model Catalog

  • Gemini customtools modeltool_result_handler and model_catalog.json updated so Gemini routes through the customtools model variant; covered by test_model_catalog.py. (@RichardTang-Aden)

🐛 Bug Fixes

  • Context eviction loops — long conversations no longer drift into repeated compaction/eviction due to stale context accounting.
  • Message reshuffling in DMs — stable createdAt prevents messages from jumping as their streams complete.
  • Queen sidebar staleness — activity-based sort keeps the most recently used queens at the top instead of a static order.
  • Queen profile access — profile is reachable from anywhere via the header, not gated behind the org chart.

⬆️ Upgrading from v0.10.0

No migration steps required. Pull main at the v0.10.1 tag and restart Hive — existing ~/.hive/ state, queen profiles, and sessions continue to work.

v0.10.0 — The Colony

15 Apr 03:01

Choose a tag to compare

🐝 Hive Agent v0.10.0: The Colony

⚠️ Breaking change. This is a large architectural refactor of how agents work in Hive. Old agents are no longer compatible. Existing workspaces, custom agents, and saved sessions from pre-v0.10.0 builds will need to be recreated.


✨ Highlights

The Colony introduces a new way of working: a group of specialized workers operating together to run and scale your business.

The role of the Queen has evolved. Instead of only orchestrating, the Queen now executes work first to deliver immediate value, then builds systems around that work to create stable, repeatable business processes.

You now have a full leadership team of eight Queens, each with their own identity, expertise, and voice:

Queen Role
Sophia Head of Brand & Design
Charlotte Head of Finance & Fundraising
Victoria Head of Growth
Eleanor Head of Legal
Rachel Head of Operations
Isabella Head of Product Strategy
Amelia Head of Talent
Alexandra Head of Technology

Start automating your business processes with your Queens today.


🏛️ The Colony Architecture

Queens as Identities, Not Just Orchestrators

  • Queen profiles — each queen is a YAML-backed persona (~/.hive/agents/queens/{queen_id}/profile.yaml) with core traits, hidden background, psychological profile, behavior triggers, and skill sets. Profiles are injected into the system prompt at session start.
  • CEO-style queen selection — an LLM classifier routes every new user request to the best-matching queen based on the task at hand, with structured routing diagnostics (QueenSelection).
  • Queen DMs — direct-message pages for each queen with a dedicated session flow, session switcher, and prompt library integration.
  • Independent / PM mode — queens run in an independent mode for planning-phase work, with a "think out loud" internal monologue surfaced through internal tags.
  • Queen memory v2 — simplified memory implementation with reflection agent, cooldown-gated reflections, user identity, doppelganger wiring, and recall-selector for targeted retrieval.
  • Queen lifecycle tools — first-class tools for escalation, queen reply, and session handoff.

Colony Runtime

  • Grand architecture revamp — the framework, agent loop, runtime, graph, pipeline, executor, and node worker layers have been rewritten from the ground up. Deprecated shims and legacy orchestration paths have been removed.
  • Colony creation flow — colonies are created via skill, with reliable event bus subscription, worker spawning, and post-creation list refresh.
  • Scheduled triggers — colonies can now be woken on a cron schedule, with triggers firing directly into the owning queen's session.
  • Simple fork for agents, stable credential states, and improved worker execution reliability.

🆕 What's New

Colony & Queens

  • 8 default queen personas (Alexandra, Victoria, Isabella, Charlotte, Eleanor, Sophia, Amelia, Rachel) with profile YAML, examples, and behavior triggers
  • LLM-based queen selector with reasoning output
  • Queen DM page, queen session switcher, and sidebar queen item
  • Queen scope memory, role examples, and identity loading
  • Reflection agent with cooldown and improved reflection runner
  • Queen orchestrator + routes_queens API
  • Natural chat replies and cleaner home-prompt bootstrap
  • Queen identity for new sessions
  • ask_user / ask_user_multiple tools available in queen prompt
  • Escalation and queen-reply tools

Skills & Tools

  • Learned default skills — skills the queen has learned become part of her baseline
  • Tool-gated skill activation — skills only activate when their required tools are present
  • Skills for colonies — per-colony skill registration and loading
  • Text-only model filter — image-producing tools and vision-only prompt blocks are hidden from text-only models
  • Browser skills upgrade — improved click reliability, screenshot capture, and credential filtering
  • Deprecated-tool removal and alignment of Hive tool names across the codebase
  • Ask-user widget with fallback rendering and preserved tool pill mapping across turn boundaries for deferred completions
  • Improved tool-call reliability across the board (tool limit removed, tool blacklist, tool credential filter)
  • MCP — efficient MCP loading at initialization, default MCP bootstrapping, registered available MCP tools, fixed MCP tool initialization and registry pipeline stage

LLM & Credentials

  • Key pool for credential management with stable credential states
  • Aden credentials storage adapter and subscription-based LLM config activation endpoint
  • Consolidated model config with unified model catalog
  • New providers — Kimi, Hive, and Aden added to the model catalog
  • Model switcher UI with runtime model switching API
  • LLM key validation endpoint with agent errors surfaced via SSE
  • BYOK modal import fixes for subscription token detection

Frontend

  • Home redesign — new home, credentials, and org chart pages
  • Colony chat and queen DM pages
  • Sidebar + header components and global app layout/routing
  • Model switcher, settings modal, template card
  • Prompt library with search, category filtering, and UI polish
  • Side panel fixes and sub-agent pane light-mode support
  • Flowchart light-mode support and normalized settings modal sizing
  • User profile settings and UI enhancements
  • Sync user profile to global memory as user-profile.md; queen profile API transformation
  • Removed the old workspace GUI and its dependencies

Framework & Runtime

  • Architecture revamp: new runtime config, simplified agent loading, new infra for queen
  • Home hive directory structure refactor
  • Agent loading pipeline fixes, MCP registry pipeline stage fix
  • Session resume improvements: separate resume vs new-session flow for queen sessions, edge-case fix for message injection in resumed sessions
  • Strip internal tags from user-visible output
  • Colony event bus subscription fixes and shared event bus for parent visibility
  • Worker spawn and stop-worker fixes
  • Default log level and extra logging hooks

🐛 Bug Fixes

  • Ask-user widget — fallback when widget fails to mount
  • Skill loading for colonies and proper skill resolution across queen sessions
  • Model switching and new-chat flow no longer carry stale state
  • Tool pill mapping preserved across turn boundary for deferred ask_user completions
  • Tool limit removed (was capping legitimate long tool lists)
  • Queen loading stability fixes
  • Side panel rendering issues
  • Deprecated graphs removed from UI
  • Home-page prompts now reach the queen directly without waiting for the greeting to finish
  • Colony creation link, reframing, and post-creation refresh
  • Build error in colony creation path
  • GCU system prompt tuning
  • Tool credential filter correctness
  • Screenshot capture and browser click reliability
  • Queen message injection when resuming a session
  • Internal-tag diction fixes in surfaced output
  • MCP tool initialization on cold start
  • Frontend DM edge cases
  • Prompt library new-session handling for new chat
  • Config validation and unavailable Minimax model handling
  • Queen identity loading on cold boot
  • Extra text in queen selector JSON response parsed safely
  • Outdated queen communication prompt removed

🧹 Refactor & Cleanup

  • Shatter the Eld*n ring — top-to-bottom refactor of the runtime core
  • Grand clean-up of deprecated code paths
  • Remove deprecated shims and old session-status tools
  • Big test cleanup — integration tests and component tests rewritten around the new architecture
  • Update references for orchestrator / host / loader renames
  • Consolidate tests for queen state machine and verified outcomes
  • Remove old workspace GUI and its dependencies
  • Remove old "new agent" button and deprecated entry points
  • Home hive directory structure refactor

⚠️ Breaking Changes

  • Old agents are not compatible. Custom agents authored against the pre-v0.10.0 framework will need to be re-authored against the new Queen/Colony runtime.
  • Session format — pre-v0.10.0 sessions cannot be resumed.
  • Deprecated tools removed and Hive tool names have been realigned; any external scripts referencing old tool names must be updated.
  • Old session-status tools removed in favor of the new queen lifecycle tools.
  • Workspace GUI removed — the legacy workspace UI is gone; use the new home, colony chat, and queen DM pages.
  • MCP registry pipeline — MCP configurations now load through the new registry; custom MCP setups may need to be re-registered.

🚀 Upgrading

Because this release rewrites the agent runtime, the recommended upgrade path is:

  1. Back up ~/.hive/ if you have sessions or custom agents you want to reference.
  2. Pull main at the v0.10.0 tag.
  3. Let Hive initialize the new queen profiles under ~/.hive/agents/queens/.
  4. Re-create any custom agents as colonies/queens against the new framework.
  5. Re-register any custom MCP servers through the new MCP registry.

Welcome to the Colony. 🐝

v0.9.0 — Browser Extension, Queen Memory v2 & Graph Executor Refactor

04 Apr 04:18
e8d56c8

Choose a tag to compare

Chrome extension replaces Playwright, episodic queen memory with intelligent recall, rewritten graph executor with parallel fan-out, and a real-LLM integration test suite. 232 files changed, +33.2k / −12.9k lines across 116 commits.

Highlights: Browser Extension & Bridge Architecture

The Playwright-based browser stack is gone. Agents now control the user's existing Chrome browser through a WebSocket bridge and an unpacked MV3 Chrome extension. No more spawning separate Chrome processes per agent — they inherit the user's real login state, cookies, and extensions out of the box.

The Chrome extension (tools/browser-extension/) runs a background service worker that dispatches commands (context.create, tab.create, cdp.attach, etc.) through an offscreen document for persistent WebSocket connectivity. Each agent gets its own color-coded chrome.tabGroup visible directly in the browser UI.

On the Python side, BeelineBridge (bridge.py) hosts a WebSocket server on port 9229 and an HTTP status endpoint on 9230. It handles CDP passthrough for Page/DOM/Input/Runtime domains, manages tab groups, supports configurable navigation wait conditions, and renders visual highlights (blue rectangles, red crosshairs) via CDP Overlay.

All browser tools (click, type, navigate, screenshot, etc.) now talk to the bridge instead of Playwright. The old session.py went from 996 lines down to 67 — just a stub for the active profile context variable.

Setup

1. Open chrome://extensions/
2. Enable "Developer mode"
3. Click "Load unpacked" → select tools/browser-extension/

Highlights: Queen Memory v2

The monolithic cross-session memory model has been replaced with a fine-grained episodic system. Individual .md files live in ~/.hive/queen/memories/, each with YAML frontmatter and one of five types: goal, environment, technique, reference, diary. Hard limits of 4 KB per file and 200 files keep the store lean.

Recall Selector

Before each turn, a lightweight selector scans memory headers in a single LLM call and picks ~5 most-relevant memories to inject. It only sees the current user query (not full history) to keep cost low, prepends staleness warnings for entries older than 1 day, and gracefully degrades to empty selection on error.

Reflection Agent

Runs asynchronously after each queen turn with cursor-based incremental processing. Short reflections happen every turn (batch read then write). Long reflections trigger every 5 short reflections, on CONTEXT_COMPACTED, and at session end — performing holistic reorganization, deduplication, and consolidation. Daily diary narratives are written to MEMORY-YYYY-MM-DD.md after each cycle.

The v1 memory seed template also gained explicit communication tracking — technical depth, pace, tone preferences — evolving from surface-level (day 1) to nuanced (day 5+).


Highlights: Thinking Tag Rendering

Gemini-family models emit structured thinking tags (<situation>, <monologue>, <execution_plan>) that were previously hidden or dumped as raw text. The frontend now parses these into collapsible UI blocks — thinking-tags.ts extracts and merges adjacent segments, and ThinkingBlock.tsx renders them as muted, togglable boxes with markdown inside. All parsing lives on the frontend, keeping the backend model-agnostic.


Features

Graph Executor Refactor

The executor moved from imperative while-loop control flow to event-driven execution. Multi-edge exits are detected automatically and targets run concurrently with per-branch timeouts and conflict strategies (last_wins / first_wins / error). Completed nodes can reset and re-activate on incoming edges for feedback loops, guarded by max_node_visits. At node transitions, phase boundary compaction recursively summarizes conversation history using binary-search splitting for context-window overflow. New execution quality metrics classify outcomes as clean, degraded, or failed.

Three new supporting modules: context.py centralizes graph-run shared state with scoped buffer permissions; prompting.py is a pure prompt rendering layer with NodePromptSpec / TransitionSpec dataclasses; gcu.py defines a declarative browser-automation node type with auto-included tools and best-practices prompting.

Queen Lifecycle: 5-Phase Model

New editing phase (planning → building → staging → running → editing) lets the queen tweak configurations and re-run without full rebuilding. Backward transitions to planning/building are blocked with warnings. QueenPhaseState now stores persona prefix, style directives, and cached recall blocks at session level, persisting across phase transitions.

Compaction Improvements

A new zero-LLM-cost microcompaction step clears old tool results by count (keeps the 8 most recent), saving ~50% of compaction time. A circuit breaker stops auto-compacting after 3 consecutive failures. Image content is replaced with [image] markers before LLM summarization. The compaction prompt now uses an 8-section structured format for better coherence.

Gemini & Kimi LLM Support

Gemini tool-call fallback parses hallucinated tool invocations from <tool_code> text blocks and synthesizes proper ToolCallEvents. Non-standard finish reasons (e.g., Kimi's pause_turn) no longer crash — accumulated content is yielded instead.

Frontend

BrowserStatusBadge polls /api/browser/status every 3s showing green/yellow/grey connection state. The workspace was simplified by removing worker-specific input handling and using graph_name instead of worker_name.

Runtime

Idempotency key support for trigger() and trigger_and_wait() — duplicate keys within TTL return cached execution_id (#6710). Contextvars now propagate to tool executor threads (#6854). New event bus events WORKER_COMPLETED and WORKER_FAILED replace legacy escalation events.

Cloudflare DNS/Zone Tool (#6658@saurabhiiitm062) — full DNS record and zone management with health check endpoint.

Freshdesk Helpdesk Integration (#6099) — helpdesk ticket management tool.


Testing

Real-LLM Integration Test Suite (core/tests/dummy_agents/)

A new E2E test framework hitting real LLM providers (Claude, GPT, Gemini, Kimi) — no mocks. Run via python run_all.py with interactive provider selection, smoke testing, 90s timeout, and artifact capture to /tmp/hive_test_artifacts/.

13 component test files cover individual system layers: LLM streaming and tool calling, event loop iteration limits, MCP tool discovery, queen phase transitions and state machine, outcome validation (strict and verified), graph execution paths, conversation threading, and edge evaluation.

Additional suites: browser tools (878 lines), Cloudflare tool (1112 lines), Freshdesk tool (1781 lines), memory function unit tests. Tool registry coverage went from 47% to 69% (#6818).


Documentation

Five new LLM providers documented (#6865). Browser use patterns guide (BROWSER_USE_PATTERNS.md) and browser extension README added. Outdated docs for worker health monitoring, runtime logging, and resumable sessions removed.


Thank You to Our Contributors

This release wouldn't be possible without the community. A huge thank you to everyone who contributed since v0.8.0:

Saurabh Kumar (@saurabhiiitm062) — built the entire Cloudflare DNS/Zone tool integration from scratch, including DDoS protection, pagination, validation, and a comprehensive test suite. A massive contribution.

Hundao (@Hundao) — fixed contextvars propagation to tool executor threads (#6854) and the Windows executable permission check in the skill validator (#6894).

Rohit Singh (@Rohit23SR) — added idempotency key support for trigger() in the runtime (#6710).

Harsh Gajjar (@harshhh28) — built the Freshdesk helpdesk integration tool (#6099).

Gaurav Rai (@raiigauravv) — increased tool_registry test coverage from 47% to 69% (#6818).

Bhuvaneswari N (@BHUVANAN8) — documented 5 new natively supported LLM providers (#6865).


Upgrading

git pull origin main
uv sync

# For the web frontend
cd core/frontend
npm install
npm run build

or, simply run:

./quickstart.sh   # (linux/mac)
.\quickstart.ps1  # (windows)

Previous release: v0.8.0 — Skills CLI, MCP Registry & Security Hardening

v0.8.0 — Skills CLI, MCP Registry & Security Hardening

01 Apr 02:23
cf32969

Choose a tag to compare

Skills CLI with install/remove/validate, MCP Registry with agent-level server selection, config override application for default skills, and a round of security fixes. 147 files changed, +16.8k / −15.2k lines across 120 commits.

Highlights: Agent Skills

v0.8.0 completes the agent skill system introduced in v0.7.4. Skills are SKILL.md packages that inject operational protocols (note-taking, batch tracking, error recovery, etc.) into an agent's system prompt at runtime.

Installing Skills

There is no public skill registry yet. To install a skill today, clone it from a git URL:

hive skill install --from https://github.com/someone/my-skill.git

# Pin a branch or tag
hive skill install --from https://github.com/someone/my-skill.git --version v2.0

# Override the local directory name
hive skill install --from https://github.com/someone/my-skill.git --name custom-name

Skills are installed to ~/.hive/skills/<name>/ and automatically discovered at agent startup.

Managing Skills

hive skill list                # list installed skills (project + user + framework)
hive skill info <name>         # show details, scripts, and references
hive skill validate <path>     # strict validation for CI / contributors
hive skill remove <name>       # uninstall

Writing a SKILL.md

A minimal skill looks like:

---
name: my-skill
description: What this skill does
license: MIT
compatibility:
  - hive
allowed-tools:
  - Bash(curl:*)
---

## Protocol

Your operational instructions here.

Place this at ~/.hive/skills/my-skill/SKILL.md and it will be picked up automatically.


Features

Skills CLI (#6782@levxn)

  • hive skill install — install from git URL, with --version and --name overrides
  • hive skill install --pack <name> — install a starter pack (bundle of skills)
  • hive skill remove, hive skill list, hive skill info, hive skill validate
  • hive skill doctor — diagnose common skill issues
  • Strict validation (check 1–12) covering frontmatter, body, scripts, allowed-tools, naming conventions

Skill Config Overrides & Runtime Heuristics (#6610@levxn)

  • {{placeholder}} substitution in SKILL.md bodies — overrides from default_skills agent config now take effect at runtime (previously parsed but silently discarded)
  • DS-12: hive.batch-ledger auto-detects batch scenarios from goal/input text and prepends a ledger-init nudge to the system prompt
  • DS-13: hive.context-preservation monitors token usage and injects a one-time preservation warning when usage crosses warn_at_usage_ratio (default 0.45), before the framework's 0.6 prune threshold

MCP Registry & Agent Selection (#6574@Antiarin, #6792@fermano)

  • MCPRegistry core module — install, list, health-check, and manage MCP servers from a central registry
  • mcp_registry.json per agent directory — declarative server selection with include, tags, exclude, profile, max_tools, and versions fields
  • First-wins tool collision logic (tools.py > static MCP > registry MCP)
  • Credential resync detection — mid-session API key changes automatically respawn MCP subprocesses
  • hive mcp install/remove/list/info/config CLI commands (#6787)
  • Structured MCP error codes and failure diagnostics (#6529@vakrahul)

Mattermost Integration (#6747@wakqasahmed)

  • New Mattermost messaging tool for sending messages and channel management

Job Hunter: PDF Resume Support (#6857@Ttian18)

  • Agents can now accept PDF resume input via file path

Open-Meteo Weather Tool (#5892@nikhilvarmakandula)

  • Free real-time weather data tool — no API key required

Local LLM Support (#6028)

  • Ollama added as a first-class LLM provider option in quickstart

Security

  • SSRF protection on web_scrape tool — block requests to internal/private IP ranges (#6879@Hundao)
  • Path traversal fix in session_store — resolve symlinks before prefix check (#6876@Hundao)
  • SQL injection fix in csv_sql tool — use DuckDB parameter binding instead of string interpolation (#1408@Jbheemeswar)
  • Supply chain pin — lock litellm==1.81.7 to block compromised upstream release (#6784)

Bug Fixes

  • Fix queen agent unable to read custom user-installed skills — skill dirs now threaded to queen executor (#6803)
  • Fix lazy import crash in email_tool when resend not installed (#6811@Hundao)
  • Fix Windows date formatting in queen memory (#6822@sundaram2021)
  • Fix deprecated ast.Index visitor in safe_eval.py for Python 3.12+ (#6796@Hundao)
  • Fix missing __init__.py in file_system_toolkits package (#6056@kurtjallo)
  • Fix 404 fallback route for unknown frontend paths (#6373@shiva9198)
  • Fix SAP tool credential store adapter (#6319@KartikPawade)
  • Fix Git Bash shell config handling and add missing Antigravity subscription option
  • Fix Mattermost message formatting
  • Fix Windows quickstart credential handling (#6844@sundaram2021)
  • Fix Windows PowerShell worker model setup (#6772@sundaram2021)

Refactoring

  • Event loop modularization — extract event loop helpers into separate modules, slimming the main event_loop_node.py (#6633@sundaram2021)
  • Remove deprecated storage/backend.py (267 lines) (#6849)

Documentation

  • MCP Integration Guide updated with Unix socket and SSE transport types (#6855@Ttian18)
  • Fix agent.json examples to match current schema (#6878@Hundao)
  • Add Windows quickstart.ps1 to Quick Start section (#6781)
  • Add HoneyComb and harness documentation to README

Known Issues

  • Windows compatibility with POSIX skills — Skills designed for Unix-like systems (e.g., skills with shell scripts in scripts/, or allowed-tools entries referencing Unix-only commands) may not work correctly on Windows. The skill validator's executable-permission check (chmod +x) is a no-op on Windows, so scripts that pass validation on Linux/macOS may fail at runtime on Windows. If you're authoring cross-platform skills, use Python scripts instead of shell scripts and test on both platforms.

Community Contributors

  • Levin (@levxn) — Skills CLI commands, DS-12/DS-13 config overrides and runtime heuristics
  • Fernando (@fermano) — Agent selection, tool resolution & framework integration
  • Antiarin (@Antiarin) — MCP Registry core module
  • Rahul V (@vakrahul) — Structured MCP error codes and diagnostics
  • Tina (@Ttian18) — MCP transport docs, job-hunter PDF resume support
  • Sundaram (@sundaram2021) — Event loop modularization, Windows fixes, date formatting
  • Hundao (@Hundao) — SSRF protection, path traversal fix, lazy import fix, deprecated AST cleanup
  • Wakqas Ahmed (@wakqasahmed) — Mattermost integration
  • Kartik Pawade (@KartikPawade) — SAP tool credential store fix
  • Juttiga Bheemeswar (@Jbheemeswar) — SQL injection fix in csv_sql
  • Nikhil Varma (@nikhilvarmakandula) — Open-Meteo weather tool
  • Kurt (@kurtjallo) — file_system_toolkits package fix
  • Shiva (@shiva9198) — Frontend 404 fallback route

Upgrading

git pull origin main
uv sync

# For the web frontend
cd core/frontend
npm install
npm run build

or, simply run

./quickstart.sh   # (linux/mac)
.\quickstart.ps1  # (windows)

v0.7.6 — Image Capabilities, Antigravity OAuth & Security Hardening

21 Mar 04:27
3e1282b

Choose a tag to compare

Image capabilities end-to-end (upload, screenshot passthrough, vision detection, model fallback), native Google OAuth for Antigravity, structured skill error codes, symlink sandbox security fix, and PDF URL support. 28 files changed in the image capabilities alone; 8 new files across the skills error system.

Features

Image Capabilities (PR #6682)

  • User image upload — attach images directly to chat messages in the UI; forwarded as image content blocks to the LLM (closes #6579)
  • GCU screenshot passthrough — browser tool screenshots are forwarded to vision-capable LLMs as image content alongside the text result, enabling visual reasoning during web automation (closes #6678)
  • Vision capability detection — new capabilities.py module in core/framework/llm/ determines whether a model supports images via layered allow/deny rules per provider and model name; covers ZAI, MiniMax, DeepSeek, Cerebras, Groq, and local runners (Ollama, LM Studio, vLLM, llama.cpp) (closes #6679)
  • Automatic image stripping — image content blocks are stripped before calls to non-vision models, preventing API errors (closes #6679)
  • Vision fallback — when the primary model lacks vision, a fallback vision-capable model describes the images as text and injects that description so the text-only model still receives visual context (closes #6680)
  • Aria ref systemrefs.py annotates aria_snapshot() output with [ref=eN] markers for every interactive element; LLMs can now target browser elements by stable ref ID (selector="e5") instead of fragile CSS selectors (closes #6681)

Antigravity / Subscription

  • Native Google OAuth flow extracted into core/antigravity_auth.py — cleaner separation from runner startup, proper credential caching, and validation of existing credentials before re-auth
  • Fixed Antigravity model tool-call bug: thought_signature field now included in function call parts to satisfy the model's expected format (was causing 400 errors)
  • OAuth client ID and secret now resolved correctly through all fallback paths

Structured Skill Error Codes (closes #6366)

  • New skill_errors.py with typed SkillError hierarchy (load, parse, validation, execution categories)
  • Skill parser, catalog, and manager now surface structured error codes and diagnostic messages instead of bare exceptions
  • Full test coverage in test_skill_errors.py

Bug Fixes

Security: symlink sandbox escape (closes #1167)

  • get_secure_path in the file-system toolkit now resolves symlinks before the path-prefix check, preventing a symlink-based escape from the sandbox boundary

Graph cleanup

  • Removed dead check_constraint placeholder from graph that was never implemented and added noise to execution logs

Tools

  • pdf_read tool now accepts HTTP/HTTPS URLs in addition to local paths — downloads to a temp file, validates Content-Type: application/pdf, and cleans up automatically; completes the web_scrape → pdf_read fallback workflow
  • web_scrape gracefully handles non-HTML content types instead of crashing

Windows quickstart

  • Fixed missing MiniMax option in PowerShell quickstart script

Upgrading

git pull origin main
uv sync

# For the web frontend
cd core/frontend
npm install
npm run build

or, simply run

./quickstart.sh   # (linux/mac)
.\quickstart.ps1  # (windows)

v0.7.5 — Parallel Subagent Display, Session Resume & Stability Fixes

20 Mar 03:23
764012c

Choose a tag to compare

v0.7.5 — Parallel Subagent Display, Session Resume & Stability Fixes

Fixes six interconnected bugs affecting parallel subagent execution, session resume integrity, and GCU browser subagent termination. Introduces a tmux-style parallel agent display. 9 files changed, +636 / −25 lines.

Features

Parallel Subagent Tmux Display (#6652)

  • New ParallelSubagentBubble component renders concurrent subagents as a tmux-style terminal multiplexer instead of individual scattered message bubbles
  • Each subagent gets its own pane with title bar (agent name, message count, context usage bar), scrollable body with markdown rendering, and blinking cursor
  • Pane focus tracking — last-active pane gets a colored border, others dim
  • Click any pane title bar to zoom it full-width; click again to restore
  • Finished subagents show a green checkmark, dimmed pane, and no cursor
  • Header badge shows live count: "3 running" → "2 running" → "5 done"
  • Smart grouping: subagent messages are grouped across interleaved queen/system messages, only breaking on hard boundaries (user messages, run dividers, or next-stage worker messages)

Bug Fixes

Nodes calling tools they don't own (continuous mode) (#6653)

  • In continuous conversation mode, all nodes share one conversation. When the graph cycles, the LLM sees delegate_to_sub_agent calls from a previous node's history and replays them on a node that doesn't have sub_agents
  • Added a guard in the tool dispatch that verifies delegate_to_sub_agent was actually offered to the current node before accepting the call
  • Evidence: event logs showed start node (tools: google_sheets_get_values only) calling delegate_to_sub_agent 3 times

Indistinguishable subagent instances (#6653)

  • When delegating to the same subagent type N times (e.g. 3× browser-researcher), all instances shared the same node_id and stream_id, making them indistinguishable in the event stream
  • Embedded the existing instance counter (previously only used for filesystem paths) into node_id: instance 1 keeps the original format, instance 2+ gets a :N suffix
  • Event logs now show distinct identifiers: batch-orchestrator:subagent:browser-researcher, :2, :3, etc.

Preamble prompt lost on session resume (#6653)

  • EXECUTION_SCOPE_PREAMBLE and GCU_BROWSER_SYSTEM_PROMPT were only injected in the fresh session code path — the resume path called compose_system_prompt() which had no slot for these preambles
  • Added execution_preamble and node_type_preamble parameters to compose_system_prompt()
  • Resume path now applies the same preamble conditions as the fresh path

Blocking memory consolidation (#6653)

  • The CONTEXT_COMPACTED event handler was awaiting consolidate_queen_memory(), blocking the event bus until LLM calls completed
  • Changed to fire-and-forget asyncio.create_task(), matching the teardown path
  • Filtered to queen-only compactions — previously fired on all compactions (17 in one session, including every subagent and worker node)

GCU browser subagent not terminating (#6653)

  • browser_researcher_node had output_keys=[] and vague success_criteria, causing infinite judge RETRY loops until the 10-iteration timeout killed the subagent
  • Added concrete output_keys=["research_results"] so the judge has a completion signal
  • Cleared ambiguous success_criteria that caused subjective LLM evaluation loops
  • Updated system prompt with explicit completion instructions (set_output + report_to_parent fallback)
  • Fixed the judge to not auto-RETRY when all output keys are already satisfied in the same turn as tool calls
  • Fixed goal_context to show report_to_parent(mark_complete=true) as the exit path when output_keys is empty

Google Sheets tool JSON input handling

  • google_sheets_update_values and google_sheets_append_values now accept stringified JSON arrays (LLMs sometimes stringify list arguments)
  • Moved credentials check before input validation so missing-credentials errors aren't masked by parse failures

Cleanup

  • Configure pytest to ignore DeprecationWarning (#1727)

Upgrading

git pull origin main
uv sync

# For the web frontend
cd core/frontend
npm install
npm run build

or, simply run

./quickstart.sh   # (linux/mac)
.\quickstart.ps1  # (windows)