This document explains what kars is made of and why each part exists. For diagrams, see Architecture diagrams. For a faster on-ramp, see Getting started.
- The agent must not see Azure credentials in production. In AKS mode the agent runs under a different UID than the router; only the router holds the Foundry credential. Compromise of any agent must not grant access to the Azure subscription. (
kars devco-locates agent + router in one container for inner-loop work — see Two modes for the dev-vs-prod boundary.) - Every external call must pass a policy decision point. No invisible side effects, no silent network egress.
- Inter-agent communication must be confidential, authenticated, and forward-secret. No plaintext fallback.
- The same data-path code runs in dev and in prod. A small number of code paths branch on
KARS_DEV_MODEto swap auth source (static key vs. Workload Identity) and spawn driver (Docker vs. Kubernetes). The router's policy decision points, content-safety parsing, audit format, and governance hooks do not change between modes — local mode is easier, not different. - Operability over cleverness. Standard Kubernetes primitives (CRDs, NetworkPolicies, RBAC, Helm) so platform teams can operate kars the way they operate the rest of the cluster.
Everything below follows from those five.
kars has four code components, two languages, and one rule that ties them together.
| Component | Language | Crate / package | Responsibility |
|---|---|---|---|
| Controller | Rust (kube-rs) | kars-controller |
Watches KarsSandbox and its eight peer workload CRDs, plus the infrastructure CRDs KarsAuthConfig and (controller-internal) KarsPairing; reconciles them into namespaces, pods, services, NetworkPolicies, ConfigMaps, federated identities. |
| Inference router | Rust (axum) | kars-inference-router |
Sits in the data path of every external call — identity, content safety, governance, audit, A2A. Mesh: WebSocket-bridges opaque Signal-Protocol ciphertext for the agent (the Signal session is plugin-owned; see The mesh). |
| A2A gateway | Rust (axum) | kars-a2a-gateway + kars-a2a-core |
Public-ingress entry point for A2A 1.0.0 peer traffic. Verifies signed AgentCards, routes to the correct sandbox, emits audit. |
| kars OpenClaw plugin | TypeScript | runtimes/openclaw/ (id kars) |
The agent-side surface for OpenClaw-runtime sandboxes: registers 24 governance-aware tools with OpenClaw (kars_spawn, kars_mesh_send, kars_mesh_transfer_file, foundry_web_search, foundry_code_execute, foundry_image_generation, …) and owns the Signal Protocol session via @microsoft/agent-governance-sdk. Catalogued in OpenClaw plugin. |
| kars Hermes plugin | Python | runtimes/hermes/ (id kars) |
The agent-side surface for Hermes-runtime sandboxes: registers 9 governance-aware tools, the pre_tool_call AGT governance hook, and the post_tool_call telemetry hook with the Hermes Agent gateway. Owns the Signal Protocol session via the Python AGT MeshClient at runtimes/agt-mesh-python/ (kars-agt-mesh), at byte-for-byte wire-format parity with the TS SDK. Catalogued in Hermes plugin. |
| CLI | TypeScript | @kars/cli |
Lifecycle of clusters, sandboxes, policies. 30+ commands. The CLI is convenience; everything it does is achievable with az + helm + kubectl. |
The rule that ties them together: the agent has no network of its own. The router is the only process in the sandbox pod that can talk to the outside. Every other property of kars is a downstream consequence of holding that line.
kars dev and kars up produce sandboxes that are observably the same to the agent code, but architecturally very different to a security reviewer. There is also a local-Kubernetes middle ground (kars dev --target local-k8s) for when you want production-shaped infrastructure on a laptop. All three are first-class.
- Pod shape: one Docker container.
- Inside: the agent runtime and the inference router are co-located in the same image. They communicate over
127.0.0.1. - Isolation: Docker network. Mounted secret file holds the resource-level Azure OpenAI key. No iptables, no NetworkPolicy, no UID separation.
- Identity: key-based (the key you provided on first run).
- What it is for: plugin authoring, policy iteration, smoke tests, demos. Inner loop only.
The router still runs the same data-path code it does in prod — the request handlers, policy decision points, content-safety parsing, audit format, and governance hooks are all real. What is different is the auth source (a static API key or PAT instead of an exchanged Workload-Identity token) and the spawn driver (Docker instead of Kubernetes). A small number of code paths branch on KARS_DEV_MODE to enable those substitutions. What is not real is the network and identity isolation — there is no separate router process to break out to. Treat dev mode as a development surface, not a security surface.
Between Docker-only dev and full AKS there is a third option: a real Kubernetes cluster running locally via kind.
- Pod shape: multi-container Kubernetes pod — the same shape as prod. Agent (UID 1000), router (UID 1001), egress-guard init container.
- Inside: the controller, router, and AgentMesh relay + registry run inside the kind cluster. The same Helm chart that installs to AKS is installed locally with dev-friendly defaults.
- Isolation: real Kubernetes
NetworkPolicy, real UID separation, real iptables egress-guard. The only differences from AKS are the auth source (a static provider credential instead of Workload Identity) and the infrastructure (kind nodes instead of cloud node pools). - Identity: key-based (same as Docker dev mode).
- What it is for: testing CRD changes, validating
NetworkPolicygeneration, running the chaos test suite locally, validating Helm changes, exercising the operator TUI against a real cluster — anything where Docker-only dev is too simple but AKS is too heavy.
kars dev --target local-k8sThe kind cluster is created if it does not already exist, the locally-built images are loaded into kind, and the Helm chart is installed. See Blueprint 02 — Local Kubernetes dev loop for the full walkthrough (including the Headlamp dashboard).
This mode runs the same controller reconciliation loop and the same router data path as prod. What is different is the auth source and the infrastructure. What is not different is the pod shape, the NetworkPolicy, the UID separation, the egress-guard, the CRD schemas, or the audit chain.
- Pod shape: multi-container Kubernetes pod.
init: egress-guard— installs iptables rules so the agent UID can only reach the router on localhost. Safety net, not the policy point — the router is.agent— the runtime, UID 1000, no direct egress.inference-router— the Rust router, UID 1001. The HTTP listener binds0.0.0.0:8443and the forward proxy binds0.0.0.0:8444; the egress-guard iptables rules + the NetworkPolicy together pin the only reachable peer for UID 1000 to those two ports on loopback. This is where egress, governance, content-safety, token-budget, and audit are actually enforced.
- Isolation: The router enforces the egress allowlist on every outbound request. A Kubernetes NetworkPolicy on the namespace is generated as a safety net that limits blast radius if the router process is bypassed or compromised; it pins egress to DNS, Foundry, the AgentMesh relay, and the A2A gateway. Optionally Kata + AMD SEV-SNP for hardware-enforced isolation.
- Identity: Two modes, gated by
kars up --mesh-trust=anonymous|entra(defaultanonymous).- Anonymous mode — the cluster's federated AKS Workload Identity is the calling principal for every sandbox's Foundry traffic. The router exchanges the projected service-account token for an AAD token; no keys on disk. AGT mesh registration is anonymous tier (score 0). Suitable for single-tenant clusters and demos.
- Entra mode (
--mesh-trust=entra) — eachKarsSandboxgets a typed Microsoft Entra Agent ID (amicrosoft.graph.agentIdentitySP derived from a tenant-wide blueprint), with Foundry RBAC scoped to that SP, a federated credential trusting the sandbox's K8s service account, and per-sandbox Entra-signed JWTs for AGT mesh verified-tier registration. One identity across the data plane (Foundry) and the mesh plane (AGT relay/registry). Seedocs/architecture/entra-agent-id/.
- What it is for: real workloads, multi-tenant fleets, anything that touches customer data.
Whichever mode you run in, the CRDs are the same, the audit chain is the same, and the policy profiles are the same. The dev → prod jump is one CLI command, not a re-architecture.
Walk through what happens when an agent in prod mode says "call the model":
sequenceDiagram
autonumber
participant Agent as Agent (UID 1000)
participant Router as Router (UID 1001)
participant Gov as Governance<br/>(InferencePolicy + AGT)
participant Budget as Token budget
participant WI as Identity layer<br/>(Workload Identity or<br/>Entra Agent ID sidecar)
participant Model as Inference backend<br/>(Copilot · Foundry · GitHub Models · ...)
Agent->>Router: POST /v1/chat (prompt)
Router->>Gov: allow? (model + sandbox + action)
alt deny
Gov-->>Router: deny + reason
Router-->>Agent: 403 policy_violation
else allow
Gov-->>Router: allow
Router->>Budget: check tenant budget
alt over budget
Budget-->>Router: exceeded
Router-->>Agent: 429 budget_exceeded
else within budget
alt provider = GitHub Copilot
Router->>Router: refresh exchanged Copilot JWT<br/>(cached ≈ 25 min, proactive)
alt model family = Claude (Anthropic shape)
Router->>Model: POST /v1/messages + Bearer JWT<br/>+ Editor-Version, Copilot-Integration-Id
Model-->>Router: native Anthropic response (no shape rewrite)
else
Router->>Model: POST /chat/completions + Bearer JWT<br/>+ Editor-Version, Copilot-Integration-Id
Model-->>Router: completion
end
else provider = Foundry / Azure OpenAI
Router->>WI: exchange SA token → AAD bearer
WI-->>Router: bearer (cached, ~5 min)
Router->>Model: POST /openai/... + bearer
Model-->>Router: completion + prompt_filter_results
Router->>Router: parse content flags<br/>report to AGT BehaviorMonitor
else provider = GitHub Models (dev)
Router->>Model: POST /chat/completions + Bearer PAT
Model-->>Router: completion (no inline filter results)
end
Router->>Router: append audit record (hash-chained)
Router-->>Agent: completion
end
end
In prose:
- The agent SDK is configured (by the runtime adapter) to point at
http://127.0.0.1:8443. The router is the policy point — it decides whether an outbound call is allowed;egress-guardiptables and the K8s NetworkPolicy are safety nets that drop the packet only if the router is bypassed. - The router receives the request. It asks the governance layer (
InferencePolicy+ AGTPolicyDecisionProvider) whether this call is allowed. Deny → 403. - It checks the token budget for the tenant. Over → 429.
- It branches by provider (read from
~/.kars/config.json→provider):- GitHub Copilot (default for
kars dev, full Copilot model catalogue: Claude Opus / Sonnet, GPT-5 / 4.1, Gemini, o-series): the router exchanges the GitHub OAuth token for a short-lived Copilot JWT athttps://api.github.com/copilot_internal/v2/tokenand proactively refreshes it. Outbound requests carry the static headers Copilot's ingress requires (Editor-Version,Editor-Plugin-Version,Copilot-Integration-Id). For Claude models the router routes natively to/v1/messages(Anthropic shape passthrough — no lossy OpenAI-to-Anthropic rewrite, full tool-calling fidelity). For all other models the standard/chat/completionspath is used. Copilot doesn't returnprompt_filter_resultseither, so inline Content Safety isn't enforced — same caveat as GitHub Models. - Foundry / Azure OpenAI (full feature set, default for AKS): mints an AAD token — either the cluster's Workload Identity (anonymous mode) or a per-sandbox Entra Agent ID token via the shared
entra-auth-sidecar(Entra mode,kars up --mesh-trust=entra). Same fail-closed token-acquisition contract either way: whenAUTH_SIDECAR_URLis set the router refuses to fall back to WI/IMDS/API-key. Content Safety is enforced by Foundry's DefaultV2 guardrails inline — the router does not make a separate Content Safety call. The Foundry response carriesprompt_filter_resultsannotations; the router parses them and reports flags to AGT'sBehaviorMonitor. - GitHub Models (dev mode, free tier): forwards
Authorization: Bearer <PAT>directly tohttps://models.github.ai/inference. GitHub Models doesn't returnprompt_filter_results, so inline Content Safety isn't enforced — see security.md → What we do not defend against. Foundry-only routes (Memory Store, agents, evaluations, indexes) return clean 501.
- GitHub Copilot (default for
- The router appends an audit record — prompt-fingerprint, model, tokens-in / tokens-out, decision, latency — hash-chained to the previous record so tampering is detectable.
- The router returns to the agent.
Sub-agent inheritance. When a parent spawns a sub-agent (
/sandbox/spawn), the router propagatesOPENCLAW_MODEL,KARS_PROVIDER, the upstream endpoint, and the auth credential (Copilot OAuth token or PAT) into the new container's environment. The child uses the same provider + model + credentials as its parent without per-spawn wiring.
More providers later. Copilot, Foundry, and GitHub Models are the three backends wired in today. Adding more (direct Anthropic, Bedrock, AWS Q, third-party OpenAI-compatible gateways) is mostly a matter of an endpoint+auth recipe in
inference-router/src/proxy.rs::build_upstream_urlplus a CLI prompt branch. We're tracking provider-expansion through GitHub issues — please open a feature request describing the provider, auth model, and which Foundry-only features (if any) you'd want preserved.
Every other external call (web fetch, MCP tool, sub-agent spawn, A2A peer message) goes through the same shape with a different policy module. The handler is chat_completions_handler in inference-router/src/routes/chat_completions.rs.
Inter-agent communication is end-to-end encrypted. Two agents that want to talk:
- Each agent registers identity (Ed25519 sign, X25519 KEX) and uploads signed prekeys to the AgentMesh registry.
- The initiator looks up the peer in the registry, fetches its prekeys, runs X3DH to derive a shared secret.
- The initiator sends a KNOCK to the peer. The peer's agent (via
@microsoft/agent-governance-sdkrunning plugin-side, not the router) evaluates trust score againstAGT_TRUST_THRESHOLD. If the peer is below threshold, the KNOCK is denied (audited). - On accept, both sides advance the Double Ratchet. Every subsequent message is encrypted with a fresh key (forward secrecy) and authenticated.
- The relay sees only opaque ciphertext blobs and addressing metadata. It cannot read messages and cannot impersonate either party.
The relay and registry are operated by kars (agentmesh namespace, two small services). They are not trusted with content. The cryptographic primitives are libsodium / Signal Protocol. The Signal session is owned by the agent process — each runtime ships its own mesh client (OpenClaw + the six SDK adapters use @microsoft/agent-governance-sdk from npm; Hermes uses runtimes/agt-mesh-python/, the Python AGT MeshClient at byte-for-byte wire-format parity with the TS SDK) and runs X3DH / Double Ratchet / KNOCK inside the sandbox container (UID 1000). The inference router links the agentmesh crate from crates.io only for shared governance primitives (AuditLogger, PolicyEngine, TrustManager, MCP rate-limit / redactor) — it holds no Signal session keys, performs no encryption or decryption, and merely WebSocket-bridges opaque ciphertext between the agent and the relay. There is no in-tree fork of either SDK. The cross-runtime symmetry (Hermes ↔ OpenClaw) is proven end-to-end by tests/e2e/interop/hermes_openclaw_bidi.sh (local kind) and tests/e2e/interop/aks_full_suite.sh (AKS production sandboxes).
Multi-agent peer roster. When an agent spawns more than one sub-agent (each with a
role— e.g.data analyst,visualization engineer,technical writer), the OpenClaw runtime maintains a peer roster of canonical names + roles and automatically prepends aPeer roster:block to every outboundmesh_send/mesh_transfer_fileonce two or more siblings exist. Sub-agents that need to hand work to each other ("send the chart to viz", "deliver the brief to the writer") resolve role references against this roster instead of guessing names — eliminating misroute bugs in pipelines likeanalyst → viz → writer. The roster is built deterministically from spawn metadata;kars_spawnrejects sub-agents without aroleparameter when more than one sibling will exist. Implementation:runtimes/openclaw/src/core/agt-tools/agt.ts(roster maintenance + auto-prepend),runtimes/openclaw/skills/kars-spawn/SKILL.md(agent-facing contract).
See docs/architecture/agt-boundary.md for what AGT enforces vs what kars enforces.
A2A (Agent-to-Agent, 1.0.0) is the public-ingress sibling of the mesh. Where the mesh handles intra-fleet traffic with strong cryptographic guarantees, the A2A gateway handles cross-organisation traffic where the peer is not part of your AgentMesh.
- Public ingress (Azure-managed Kubernetes ingress / Application Gateway).
- Every inbound request carries a signed
AgentCardand is rejected if the caller's identity is not trusted. Today identity is established via theX-A2A-Agent-Subjectheader set by the upstream mTLS layer; the in-axumAgentCardsignature verifier ships as a library (kars_a2a_core::verify_inbound_card, unit-tested) and is tracked in the roadmap to be wired into the gateway request path. - The gateway routes to the right
A2AAgentCRD (which binds the public name to a sandbox + policy). - Audit, rate limiting, content safety run on the gateway.
Two separate channels for two separate trust models. See docs/architecture/a2a-gateway.md.
You operate kars by writing CRDs, not by clicking through a dashboard or
editing a private config store. The full schema lives in
docs/api/crd-reference.md. This section answers
the question we get every time: why nine CRDs and not one?
A CRD makes sense when it represents a thing that
- has its own lifecycle (created / updated / deleted at different times than the agent),
- is referenced by name from somewhere else (so giving it an identity is cheaper than copy-pasting its body), or
- is owned by a different team than the agent author (platform team, security team, ops team).
Anything that fails all three tests is a KarsSandbox field, not its own CRD.
| CRD | Owner / changes-with | What you'd give up if it were a KarsSandbox field |
|---|---|---|
KarsSandbox |
Agent author. Changes per agent. | (this is the agent — every other CRD is a reference into it) |
InferencePolicy |
Platform / FinOps. Changes per quarter. | Every team would copy model / region / budget into every agent; rotating an OpenAI deployment or shifting traffic to a cheaper model would mean editing every KarsSandbox. |
ToolPolicy |
Security team. Changes per audit. | A central "tool X is approval-only" rule could not exist. Each agent would carry its own copy and security would have no enforcement seam. |
McpServer |
Plugin owner. One per backend. | An MCP server URL + auth would be inlined into every agent that uses it. Rotating the URL means editing N sandboxes and racing the rollouts. |
A2AAgent |
The agent itself, but the public-ingress identity has its own lifecycle (TLS cert, public route, rate-limit, agent-card). | The Kubernetes Service / cert / ingress wiring would be tangled into the agent CRD. With A2AAgent separate, the agent can come and go while the public endpoint and its trust anchors stay stable. |
KarsMemory |
Whoever owns the memory store. Outlives the agent. | Memory bindings would die with each agent restart; cross-agent memory sharing would require duplicating store IDs in every sandbox. |
KarsEval |
Eval / QA. Run on demand. | Eval runs would be pods or jobs without a reproducible record; you'd lose the status history that lets kars eval show "this prompt regressed at commit X". |
TrustGraph |
Cluster admin. Cross-namespace, cross-cluster. | Sibling-trust at scale collapses: every sandbox would need a list of every peer's AMID. TrustGraph is the only cluster-scoped CRD precisely because trust topology is a cluster concern. Status — reconciler-only. The graph is projected to /etc/kars/trustgraph/graph.json in each sandbox; the router does not yet consume it for mesh-admission gating. KNOCK accept/deny is — and stays — agent-side (the router cannot decrypt the Signal session). The router-side post-decision trust-score map exists for audit/governance only. Tracked in the roadmap: router-side mesh-admission gating against the projected graph (pre-handshake, refuse to bridge a WS for an edge not in the graph) — a separate, coarser layer that complements agent-side KNOCK rather than replacing it. See api/crd-reference.md §TrustGraph. |
EgressApproval |
On-call / SRE. Ephemeral, TTL-bounded. | A single inline overlay would mix permanent allowlist drift with short-lived break-glass grants. As a separate CRD, the grant carries its own audit record, TTL, and revocation path. |
Two further resources are infrastructure CRDs you don't author per
agent. KarsAuthConfig is a cluster-scoped singleton, written by
kars mesh setup-trust, that anchors tenant-wide Entra Agent ID trust
(see Agent identity); when it's absent, sandboxes run
in the AGT anonymous tier. KarsPairing is controller-internal — it
records the binding between a KarsSandbox and its AgentMesh registry
identity. Both are exposed as CRDs so the controller can use the same
reconciliation machinery as everything else, but you never write either by
hand. That brings the registered total to eleven CRDs.
Three properties fall out of treating these as separate CRDs:
- One source of truth. Every component — controller, router, gateway,
CLI, operator TUI — reads the same etcd objects. There is no shadow config
store, no in-memory state, and no "which file did the operator edit on
which node?" When the router sees a request, it resolves
InferencePolicyfrom the cluster and trusts that to be the policy in force. The same is true for tool denials, MCP backends, and trust edges. - RBAC scopes follow concerns. A platform team can grant
inferencepolicies.kars.azure.com/editto FinOps without giving them any control over agents. Security can owntoolpolicieswithout being able to stop a workload. These splits are not theoretical — they are the reason the CRDs exist. - GitOps works without translation. Each CRD is independently committed to a repo and reconciled by Argo / Flux. There is no "render these nine things from one mega-template" step, because the nine things are already the API.
If kars only ever ran one agent per cluster, with one model, no policy,
no peers, no memory, and no eval — nine CRDs would be cargo-culting. We
believe a single CRD with nine optional sub-objects would be worse for
real-world deployments because you'd lose the rate-of-change separation
above. But this is a judgement call, and we re-evaluate it every minor
release. If your deployment never references a CRD other than KarsSandbox,
the others cost you nothing — they only exist as registered types, not
running workloads.
The CRDs are served at kars.azure.com/v1alpha1. The project is at v0.1.0; see CHANGELOG.md for what's shipped and docs/roadmap.md for what's next.
- Image tags are always
:latestin source. Pinning happens at install time via Helm values or env vars (e.g.MAF_RUNTIME_IMAGE). Earlier versions of the project drifted across version tagsv11–v25; we chose convention over per-tag pins to make the tag mismatch class of bug impossible. - The controller default-image lookup is centralised in
controller/src/reconciler/runtime.rs. Adding a new runtime is one match arm and one default-image function. The CRD enum is the source of truth for which runtimes exist. - The router does not depend on the Azure SDK. All Azure calls (IMDS, Workload Identity exchange, Foundry, Content Safety) are direct REST. This keeps the router's dependency surface small and its build deterministic.
- No OpenClaw source is patched. The OpenClaw integration uses the public plugin API and
tools.denyconfig only. Any upstream OpenClaw release is drop-in compatible. See Upstream alignment.
- Architecture diagrams — the same content, rendered.
- Security model — what each layer enforces.
- Blueprints — six reference deployment shapes.
- CRD reference — the operator's API.