
Orchestrators rename + dashboard, KG recall stabilization, and plan resume#385

Merged
Weegy merged 5 commits into main from worktree-rename-agents-orchestrators
Jun 29, 2026
@Weegy Weegy commented Jun 29, 2026

Contributor

Summary

Bundles the operator-UI rename/dashboard work with a round of knowledge-graph recall stabilization and a plan-resume feature. All four commits are deployed and verified on the local docker test stack, and each was reviewed with an independent (GPT-5.4) pass.

What's in here

Knowledge-graph recall stabilization (fix(kg))

  • Self-heal a dual-active KG state (both inmemory and neon installed) by removing the non-selected sibling, keeping the backend that actually holds a DSN (vault or env). Previously both stayed active and reads/writes routed non-deterministically between them — the "KG doesn't work between sessions" symptom.
  • Reconcile the embeddings plugin's ollama_base_url from env on every boot, not just first install. Enabling the Ollama overlay on an existing deployment used to be a silent no-op that left semantic recall, the durable tier and process-reuse inactive (FTS-only).
  • Make the recall relevance judge fail-deterministic with an internal verdict cache, so an identical query replays the same verdict instead of oscillating between filtered and unfiltered.
  • Route the durable ("always-surface") tier through the relevance judge by default, so off-topic curated facts stop surfacing for unrelated queries; falls back to always-surface when no judge is wired.
  • Add a KG capability snapshot to /health (backend, durable, embeddings, semantic recall, durable tier, process-reuse, warnings) so silent degradation is observable instead of requiring boot-log archaeology.
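The fail-deterministic judge described above can be pictured as a thin caching wrapper. This is a minimal sketch, not the code in this PR: `Verdict`, `JudgeFn` and `CachingJudge` are hypothetical names, the cache key is an assumption, and the judge is kept synchronous for brevity (a real judge would presumably be async). It only illustrates the stated behavior: an identical query replays the cached verdict, while errors and abstains keep all hits and are not cached.

```typescript
// Illustrative only: Verdict, JudgeFn and CachingJudge are hypothetical names.
type Verdict = { keep: string[] }; // ids of the hits judged relevant

// Returns null to abstain; may throw on failure.
type JudgeFn = (query: string, hitIds: string[]) => Verdict | null;

class CachingJudge {
  private readonly cache = new Map<string, Verdict>();
  private readonly judge: JudgeFn;

  constructor(judge: JudgeFn) {
    this.judge = judge;
  }

  // Assumed key: normalized query plus sorted candidate ids, so the same
  // question over the same candidates always maps to the same entry.
  private key(query: string, hitIds: string[]): string {
    return JSON.stringify([query.trim().toLowerCase(), [...hitIds].sort()]);
  }

  // Ask the underlying judge; cache only real verdicts.
  private ask(k: string, query: string, hitIds: string[]): Verdict | null {
    try {
      const fresh = this.judge(query, hitIds);
      if (fresh !== null) this.cache.set(k, fresh);
      return fresh; // null means the judge abstained
    } catch {
      return null; // errors are treated like abstains and never cached
    }
  }

  filter(query: string, hitIds: string[]): string[] {
    const k = this.key(query, hitIds);
    const verdict = this.cache.get(k) ?? this.ask(k, query, hitIds);
    if (verdict === null) return hitIds; // keep-all fallback
    const keep = new Set(verdict.keep);
    return hitIds.filter((id) => keep.has(id));
  }
}
```

Not caching the keep-all fallback means a transient judge failure does not pin an unfiltered result for later identical queries.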

Plan resume (feat(kg))

  • Interrupted multi-step plans resume instead of restarting. Realized in the context-assembler's plan-recall (the only channel that reaches the prompt — turn hooks are observer-only): an interrupted plan renders completed steps as "do not redo" plus an explicit resume point, with a side-effect caveat when the resume-from step was mid-execution.
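To make the resume framing concrete, here is a rough sketch of how an interrupted plan could be rendered into prompt text. The `PlanStep` shape, the `renderPlanRecall` name and the exact wording are assumptions for illustration; the PR's actual `RecalledPlan` type and output text are not shown here. The sketch applies resume framing only when some steps are done and some are still open.

```typescript
// Hypothetical shapes; not the PR's RecalledPlan type.
type StepStatus = "done" | "in_progress" | "pending";

interface PlanStep {
  goal: string;
  status: StepStatus;
  sideEffecting?: boolean;
}

function renderPlanRecall(title: string, steps: PlanStep[]): string {
  const done = steps.filter((s) => s.status === "done");
  const open = steps.filter((s) => s.status !== "done");

  // Fully-done and fully-pending plans get a plain summary, no resume framing.
  if (done.length === 0 || open.length === 0) {
    const summary = steps.map((s) => `${s.goal} [${s.status}]`).join("; ");
    return `Plan "${title}": ${summary}`;
  }

  // The first open step is the resume point; the rest are still to do.
  const [resume, ...rest] = open;
  const lines = [
    `Plan "${title}" was interrupted. Continue it instead of restarting.`,
    "Completed steps (do not redo):",
    ...done.map((s) => `- ${s.goal}`),
    `Resume from: ${resume.goal}`,
  ];

  // A step that was in progress may have partly run, so add a caveat,
  // worded more strongly when the step has side effects.
  if (resume.status === "in_progress") {
    lines.push(
      resume.sideEffecting
        ? "Caveat: this step has side effects and may have partly run. Verify its state before re-running."
        : "Caveat: this step may have been mid-execution. Verify before re-running.",
    );
  }

  if (rest.length > 0) {
    lines.push("Remaining steps:", ...rest.map((s) => `- ${s.goal}`));
  }
  return lines.join("\n");
}
```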

Operator UI: Orchestrators rename + dashboard (feat(web-ui))

  • Rename "Agents" to "Orchestrators" across the operator UI (en + de).
  • Dashboard landing page (system-health, quick-access, role onboarding); chat moves to /chat.
  • Business-case-first onboarding replacing the technical bootstrap-profile flow.
  • Admin providers / subscription-CLI pages refactored into reusable panels with inline LLM key entry and in-page tab switching.
  • Standard ("fallback") orchestrator protection: the operator agents API refuses to disable or delete it (409 fallback_protected); seeded default name becomes "Standard Orchestrator". Backed by regression tests (test(operator)).
  • Store/nav cleanup; chat turn-usage telemetry badge removed.
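The fallback protection boils down to a guard evaluated before a mutation is applied. The sketch below is an assumption-laden illustration, not the route code: `Agent`, `Mutation`, `Outcome` and `guardFallback` are hypothetical names, and only the 409 `fallback_protected` response and the id-based match against the platform `fallbackAgentId` come from this PR.

```typescript
// Hypothetical types and helper; only the 409 code and the id-based match
// are taken from the PR description.
interface Agent {
  id: string;
  slug: string;
  name: string;
  status: "enabled" | "disabled";
}

type Mutation =
  | { kind: "patch"; patch: Partial<Pick<Agent, "name" | "status">> }
  | { kind: "delete" };

type Outcome = { status: 200 } | { status: 409; error: "fallback_protected" };

function guardFallback(agent: Agent, fallbackAgentId: string, m: Mutation): Outcome {
  // Match on id so the protection follows whatever the platform points at,
  // even when the slug is not "fallback".
  const isFallback = agent.id === fallbackAgentId;
  const disables = m.kind === "patch" && m.patch.status === "disabled";

  if (isFallback && (m.kind === "delete" || disables)) {
    return { status: 409, error: "fallback_protected" };
  }
  return { status: 200 }; // renaming or enabling the fallback stays allowed
}
```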

Testing

  • Full middleware gate green (typecheck + lint + ~3.6k tests).
  • web-ui next build clean.
  • Deployed and verified on the local docker stack: KG recall pipeline fully active (/health reports backend neon, embeddings on, all capabilities true), dual-active self-heal fired on a real pre-existing conflict, web-ui serving.
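A post-deploy check against the capability snapshot could look roughly like this. The field names below are guesses derived from the capability list in this PR (backend, durable, embeddings, semantic recall, durable tier, process-reuse, warnings); the real /health JSON shape is not shown here, so treat `KgCapabilities` and `kgDegradations` as hypothetical.

```typescript
// Assumed shape of the KG section of /health; field names are illustrative.
interface KgCapabilities {
  backend: string;
  durable: boolean;
  embeddings: boolean;
  semanticRecall: boolean;
  durableTier: boolean;
  processReuse: boolean;
  warnings: string[];
}

// Returns a human-readable list of degradations; an empty list means every
// capability is on and the snapshot carries no warnings.
function kgDegradations(kg: KgCapabilities): string[] {
  const flags: Array<[string, boolean]> = [
    ["durable", kg.durable],
    ["embeddings", kg.embeddings],
    ["semanticRecall", kg.semanticRecall],
    ["durableTier", kg.durableTier],
    ["processReuse", kg.processReuse],
  ];
  const off = flags.filter(([, on]) => !on).map(([name]) => `${name} is off`);
  return [...off, ...kg.warnings];
}
```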

Reviewer note — action required before publishing the public image

This branch sets INSTALL_SUBSCRIPTION_CLIS=true for the public ghcr.io/byte5ai/omadia-middleware image in publish-images.yml, which bundles the proprietary Claude CLI into that image. The Dockerfile defaults this flag to off for public images pending legal review. The change is branch-only (this PR does not trigger a publish), but redistributing the vendor CLI in the public image needs legal sign-off before the next release.

Follow-ups (not in this PR)

  • Enforce the fallback-orchestrator invariant at the config-apply/store layer too, not only the REST route.
  • Optionally expose buildResumePlan as a plan-runner service the context-assembler consumes, instead of the current parallel realization.

Weegy added 5 commits June 28, 2026 17:08
… predictable

Addresses three reported failures: the KG "not working between sessions",
recalling knowledge unpredictably, and not using plans.

- bootstrap: self-heal a dual-active knowledge-graph state (both inmemory and
  neon installed) by removing the non-selected sibling, keeping the backend
  that actually holds a DSN (vault or env) rather than choosing by env alone.
  Previously left both providers active so reads/writes routed
  non-deterministically between backends.
- bootstrap: reconcile the embeddings plugin's ollama_base_url/ollama_model/
  max_concurrent from env on every boot, not just first install. Enabling the
  Ollama overlay on an existing deployment was a silent no-op, leaving semantic
  recall, the durable tier and process-reuse inactive (FTS-only).
- recall judge: make the relevance judge fail-deterministic with an internal
  verdict cache, so an identical query replays the same verdict instead of
  oscillating between filtered and unfiltered. On an error or abstain the
  judge keeps all hits, and that outcome is not cached.
- durable tier: route durable ("always-surface") hits through the relevance
  judge by default (kg_durable_relevance_judge_enabled) so off-topic curated
  facts stop surfacing for unrelated queries; runs even on a term-less turn.
  Falls back to always-surface when no judge is wired.
- health: expose a KG capability snapshot on /health (backend, durable,
  embeddings, semantic recall, durable tier, process-reuse, warnings) so silent
  degradation is observable instead of requiring boot-log archaeology.

Tests cover dual-active self-heal (incl. vault-DSN guard), embeddings reconcile,
verdict caching, durable opt-in/out and term-less judging, and the health
snapshot. Full middleware gate green.
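The dual-active self-heal decision can be sketched as a small pure function. This is an illustration under stated assumptions, not the bootstrap code: `KgBootState`, `HealDecision` and `resolveDualActive` are hypothetical names, and it assumes the DSN belongs to the neon backend and that inmemory is kept when no DSN exists anywhere.

```typescript
// Hypothetical types; the real bootstrap state is not shown in this PR text.
type Backend = "inmemory" | "neon";

interface KgBootState {
  installed: Backend[];
  vaultDsn?: string; // DSN stored in the vault, if any
  envDsn?: string; // DSN supplied via environment, if any
}

interface HealDecision {
  keep: Backend;
  remove: Backend;
}

// Returns null when there is nothing to heal (zero or one backend installed).
function resolveDualActive(state: KgBootState): HealDecision | null {
  const dualActive =
    state.installed.includes("inmemory") && state.installed.includes("neon");
  if (!dualActive) return null;

  // Look at both DSN sources rather than env alone, so a vault-only DSN
  // still keeps the backend that actually holds the data.
  const hasDsn = Boolean(state.vaultDsn || state.envDsn);
  return hasDsn
    ? { keep: "neon", remove: "inmemory" }
    : { keep: "inmemory", remove: "neon" };
}
```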
…hannel

Wires plan-resume into the only path that reaches the model prompt. Turn hooks
are observer-only (they cannot inject system context), so resume is realized in
the cross-session plan-recall the context-assembler already injects.

- RecalledPlan gains completedStepGoals + resumeFromInProgress/Sideeffecting.
- loadPlanHits collects the done-step goals and the resume-from step's status.
- An interrupted plan (some steps done AND some open) renders as an explicit
  resume hint: completed steps marked "do not redo", an explicit resume point,
  and the remaining steps — so a follow-up continues the plan instead of
  restarting it.
- Side-effect safety: when the resume-from step was in_progress (possibly
  mid-execution) it carries a caveat to verify before re-running, stronger when
  the step is side-effecting, mirroring buildResumePlan's ambiguous-side-effect
  guard that the recall channel would otherwise drop.

Tests assert the framing reaches the assembled text, the side-effect caveat, and
that fully-done / fully-pending plans do NOT render resume framing.
…access consolidation

Operator-UI overhaul developed across the rename effort:

- Rename "Agents" -> "Orchestrators" throughout the operator UI (en + de),
  including nav, agent picker, and the operator agents page.
- Dashboard as the landing page: the root route now renders a dashboard
  (system-health strip, quick-access, role onboarding); chat moves to /chat.
- Business-case-first onboarding (businessCases.ts + dashboard component)
  replacing the technical bootstrap-profile flow.
- Admin consolidation: providers and subscription-CLI pages refactored into
  reusable panels (ProvidersPanel, SubscriptionClisPanel, LlmAccessTabs) with
  inline LLM key entry and in-page tab switching.
- Standard ("fallback") orchestrator protection: the operator agents API
  refuses to disable or delete it (409 fallback_protected); its seeded default
  name becomes "Standard Orchestrator".
- Store page copy + nav cleanup; chat turn-usage telemetry badge removed;
  header subtitle removed.
- CI: bundle the official Claude CLI into the published middleware image
  (INSTALL_SUBSCRIPTION_CLIS=true) so subscription-CLI agents work on self-host.

web-ui `next build` passes; middleware typecheck/lint/tests green.
…delete)

The fallback orchestrator is now protected from disable/delete in the operator
agents API, but that load-bearing invariant had no CI coverage. Add regression
tests: PATCH {status:'disabled'} and DELETE on the fallback return 409
fallback_protected; enabling/renaming it still works; and the protection also
covers whatever the platform fallbackAgentId points at (id-based, non-"fallback"
slug).
@Weegy Weegy enabled auto-merge (squash) June 29, 2026 07:40
@Weegy Weegy merged commit 1550268 into main Jun 29, 2026
7 checks passed