feat(schema): knowledge decay + permanence tier (KNOWLEDGE-DECAY.md) #6
Changes from all commits
8539180
693c301
59f6c50
5774fa0
94761f4
a03dee4
eadc259
3dffd86
bf7517c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,207 @@ | ||
| # MeMex Knowledge Decay & Permanence Tier — Schema Design Draft | ||
| *Draft by Molty | 2026-04-23 | Status: DRAFT — incorporating daemon-bot feedback, PR ready* | ||
|
|
||
| ## Context | ||
|
|
||
| Triggered by PRs #3 and #4 merging into `JPeetz/MeMex-Zero-RAG` main. Goal: formalize knowledge decay, permanence tiers, and the revalidation pipeline before Hermes Studio integration begins. | ||
|
|
||
| --- | ||
|
|
||
| ## Schema Additions (Node-Level Fields) | ||
|
|
||
| ### Core temporal fields | ||
|
|
||
| ```yaml | ||
| last_verified_at: <ISO-8601 timestamp> # last time an agent confirmed this is still true | ||
| confidence: 0.85 # float [0.0–1.0]; decays over time | ||
| confidence_floor: 0.20 # decay floor; node never drops below this | ||
| source_reliability_index: 0.9 # decay multiplier; 1.0 = slowest decay | ||
| is_immutable: false # true = Hard Persistence Tier (see below) | ||
| revalidation_status: current # enum: current | flagged | revalidating | contested | retired | ||
| conflict_detected: false # true = new high-confidence node contradicts this immutable node | ||
| conflict_trigger_id: null # ID of the observation/node that triggered conflict_detected | ||
| ``` | ||
|
|
||
| ### Nested `temporal_context` block (replaces flat fields) | ||
|
|
||
| ```yaml | ||
| temporal_context: | ||
| created_at: <ISO-8601> | ||
| last_updated_at: <ISO-8601> | ||
| last_verified_at: <ISO-8601> | ||
| decay_rate_coefficient: 0.9 # derived from source_reliability_index | ||
| decay_interval_hours: 24 # global default; per-node override available | ||
| ``` | ||
|
|
||
| Nested structure preferred over flat fields — supports schema evolution without migration (new sub-keys don't break parsers that skip unknowns). | ||
|
|
||
| ### Existing `privacy_protocol` block | ||
|
|
||
| No changes. Masking remains at the write-stream layer; Hermes Studio only ever receives already-masked data. | ||
|
|
||
| --- | ||
|
|
||
| ## Confidence Decay Policy | ||
|
|
||
| **Formula (applied on periodic tick):** | ||
|
|
||
| ``` | ||
| confidence = max(confidence_floor, confidence × (1 - base_decay_rate / source_reliability_index)) | ||
| ``` | ||
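A minimal Python sketch of one decay tick, transcribing the formula above. The draft does not give a value for `base_decay_rate`, so the default of `0.01` here is purely an illustrative assumption, as is the function name `decay_tick`.

```python
def decay_tick(confidence: float,
               confidence_floor: float,
               source_reliability_index: float,
               base_decay_rate: float = 0.01) -> float:
    """Apply one decay tick; the result never drops below confidence_floor.

    base_decay_rate=0.01 is a hypothetical default, not a value from the draft.
    """
    if source_reliability_index <= 0:
        raise ValueError("source_reliability_index must be positive")
    decayed = confidence * (1 - base_decay_rate / source_reliability_index)
    return max(confidence_floor, decayed)
```

Under this assumed rate, a node seeded at `source_reliability_index = 1.0` loses 1% of its confidence per tick, while one seeded at `0.1` loses 10% per tick until it reaches its floor.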
|
|
||
| **Tick interval:** 24h global default. Per-node override via `temporal_context.decay_interval_hours`. | ||
|
|
||
| **Source reliability index — manual seed at node creation, tiered by source type:** | ||
|
|
||
| | Source type | `source_reliability_index` | Decay character | | ||
| |---|---|---| | ||
| | Architectural constraints, RULES.md | 0.95–1.0 | Very slow — months to meaningful decay | | ||
| | Design decisions, meeting outcomes | 0.7–0.85 | Moderate — weeks | | ||
| | Chat observations, inferred state | 0.3–0.5 | Fast — days | | ||
| | Transient/ephemeral notes | 0.1–0.2 | Very fast — hours | | ||
|
|
||
| **Execution gate:** When `confidence` drops below `confidence_floor + 0.10` (configurable threshold), node transitions to `revalidation_status: flagged`. This replaces silent pruning — the node remains readable but is tagged as pending revalidation. | ||
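The gate can be sketched roughly as follows. This is an illustration, not the worker implementation: the `Node` dataclass and `apply_execution_gate` are hypothetical names, and the `is_immutable` short-circuit reflects the Hard Persistence Tier rule that immutable nodes cannot auto-transition to `flagged`.

```python
from dataclasses import dataclass


@dataclass
class Node:
    confidence: float
    confidence_floor: float = 0.20
    is_immutable: bool = False
    revalidation_status: str = "current"


def apply_execution_gate(node: Node, margin: float = 0.10) -> Node:
    """Flag a node whose confidence fell below confidence_floor + margin."""
    if node.is_immutable:
        return node  # Hard Persistence Tier: never auto-flagged
    below_gate = node.confidence < node.confidence_floor + margin
    if node.revalidation_status == "current" and below_gate:
        node.revalidation_status = "flagged"  # stays readable, tagged for revalidation
    return node
```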
|
|
||
| --- | ||
|
|
||
| ## Hard Persistence Tier | ||
|
|
||
| Nodes with `is_immutable: true` form the **Hard Persistence Tier**: | ||
| - Skip all confidence decay calculations | ||
| - Cannot auto-transition to `flagged` (can still be manually updated) | ||
| - Intended for: RULES.md entries, fundamental system constraints, established cross-bot agreements, architectural invariants | ||
| - Contradiction handling: see **Conflict Detection** below | ||
|
|
||
| --- | ||
|
|
||
| ## Conflict Detection on Immutable Nodes | ||
|
|
||
| When a new node with `confidence ≥ 0.8` is written and directly contradicts an `is_immutable: true` node: | ||
|
|
||
| 1. Set `conflict_detected: true` on the immutable node | ||
| 2. Set `conflict_trigger_id` to the ID of the incoming observation | ||
| 3. Transition the immutable node to `revalidation_status: contested` | ||
| 4. Enqueue as highest-priority revalidation event (above heat-map tier) | ||
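The four steps above could look roughly like this in Python. How a contradiction is detected is outside the draft, so this sketch assumes the caller has already decided the two nodes contradict; `register_conflict` and the list-based queue are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CONFLICT_CONFIDENCE_THRESHOLD = 0.8  # "confidence >= 0.8" from the draft


@dataclass
class Node:
    id: str
    confidence: float
    is_immutable: bool = False
    revalidation_status: str = "current"
    conflict_detected: bool = False
    conflict_trigger_id: Optional[str] = None


def register_conflict(immutable: Node, incoming: Node, queue: list) -> bool:
    """Apply steps 1-4; returns True if the conflict was registered."""
    if not immutable.is_immutable:
        return False
    if incoming.confidence < CONFLICT_CONFIDENCE_THRESHOLD:
        return False
    immutable.conflict_detected = True              # step 1
    immutable.conflict_trigger_id = incoming.id     # step 2
    immutable.revalidation_status = "contested"     # step 3
    queue.insert(0, immutable.id)                   # step 4: front of the queue
    return True
```

Note that the immutable node's content is never touched here; only its conflict fields and status change.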
|
|
||
| **Contested retrieval behavior:** The retrieval engine returns contested nodes with `status: contested` metadata and clamps `effective_confidence` to `confidence_floor` (maximum skepticism within the allowed range). The warning metadata follows this pattern: | ||
|
|
||
| ``` | ||
| metadata: { status: "contested", conflict_trigger_id: "<id>", effective_confidence: <floor> } | ||
| ``` | ||
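One way the retrieval engine might assemble this metadata, sketched in Python. Nodes are modelled as plain dicts, and the shape returned for non-contested nodes is an assumption; the draft only specifies the contested case.

```python
def retrieval_metadata(node: dict) -> dict:
    """Build retrieval metadata; contested nodes are clamped to their floor."""
    status = node.get("revalidation_status", "current")
    if status != "contested":
        # Assumed shape for the non-contested case (not specified in the draft).
        return {"status": status, "effective_confidence": node["confidence"]}
    return {
        "status": "contested",
        "conflict_trigger_id": node.get("conflict_trigger_id"),
        "effective_confidence": node["confidence_floor"],  # maximum skepticism
    }
```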
|
|
||
| **Expected agent behavior:** When a retrieved node carries `status: contested`, agents must: | ||
| - Present the information with explicit uncertainty framing | ||
| - Cite `conflict_trigger_id` when referencing the node | ||
| - Not assert the node's content as established fact | ||
| - **Not use the node as the sole basis for any automated execution path** — contested information cannot trigger irreversible actions until `revalidation_status` returns to `current` | ||
|
|
||
| Implementation is agent-specific (how conflict context is injected into prompt/context is up to each agent); this documents the required behavior contract, not the chain-of-thought mechanics. Recommended pattern: surface `conflict_trigger_id` and an uncertainty marker in retrieved context so the agent has the signal without requiring schema-mandated prompt syntax. | ||
|
|
||
| **The immutable node is NOT overridden or deleted** until a human or agent explicitly resolves the conflict. This prevents silent data loss while surfacing the contradiction. | ||
|
|
||
| **Edge case: contested node at confidence floor.** A node where `effective_confidence == confidence_floor` and `revalidation_status == contested` cannot be distinguished from a stable floor-clamped node by confidence value alone. The decay worker and any retrieval classifier must check **both** `dependency_taint` and `revalidation_status` to classify it correctly. A contested floor node will have `revalidation_status: contested` or `revalidation_status: revalidating`; a stable floor node will not. Implementations that check confidence alone will misclassify contested floor nodes as settled. | ||
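A small sketch of a classifier that avoids the misclassification described above by consulting `revalidation_status` and `dependency_taint` before looking at the confidence value. The return labels (`"contested"`, `"tainted"`, `"stable_floor"`, `"active"`) are hypothetical and not part of the schema.

```python
def classify_node(node: dict) -> str:
    """Classify a node without relying on the confidence value alone."""
    status = node.get("revalidation_status", "current")
    if status == "contested":
        return "contested"          # checked first, regardless of confidence
    if node.get("dependency_taint", False):
        return "tainted"
    if node["confidence"] <= node["confidence_floor"]:
        return "stable_floor"       # at the floor and not under dispute
    return "active"
```

Two nodes with identical `confidence` and `confidence_floor` therefore land in different classes when only one of them is contested.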
|
|
||
| **Resolution TTL:** If a `contested` node is not resolved within a configurable window (default: 48h), emit an `unresolved_conflict` system event. Routing and alert targets are deployment config — not hardcoded in the schema. | ||
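A hedged sketch of the TTL check. The schema does not define a timestamp for when a node became contested, so `contested_since` is a hypothetical input, as are the function name and the event payload shape beyond the `unresolved_conflict` event name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_resolution_ttl(node_id: str,
                         contested_since: datetime,
                         now: datetime,
                         ttl_hours: float = 48) -> Optional[dict]:
    """Return an unresolved_conflict event once the TTL has elapsed, else None.

    Both datetimes should be timezone-aware (for example timezone.utc).
    """
    age = now - contested_since
    if age <= timedelta(hours=ttl_hours):
        return None
    return {
        "event": "unresolved_conflict",
        "node_id": node_id,
        "contested_hours": age.total_seconds() / 3600,
    }
```

Routing the returned event is left to the caller, in line with the note that alert targets are deployment config.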
|
|
||
| **Dependency tainting:** When a node transitions to `contested`, its direct dependents (via `links_to` traversal, depth=1 only) are flagged: | ||
|
|
||
| ```yaml | ||
| dependency_taint: true | ||
| taint_origin_id: <id_of_contested_parent_node> | ||
| ``` | ||
|
|
||
| Tainted nodes: | ||
| - Remain usable and do not inherit the no-solo-execution block | ||
| - Are enqueued for secondary review | ||
| - Surface `dependency_taint: true` and `taint_origin_id` as metadata warnings at retrieval | ||
|
|
||
| Auto-clear: when the parent node's `revalidation_status` returns to `current`, the revalidation worker clears `dependency_taint` and `taint_origin_id` on all nodes where `taint_origin_id` matches the resolved parent. | ||
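The depth-1 taint and its auto-clear can be sketched as two small functions over an in-memory store. Two assumptions are baked in: nodes live in a dict keyed by ID, and the contested parent's own `links_to` entries are treated as its direct dependents (the draft does not spell out the edge direction).

```python
def taint_dependents(nodes: dict, parent_id: str) -> list:
    """Flag the direct (depth=1) dependents of a contested parent."""
    tainted = []
    for child_id in nodes[parent_id].get("links_to", []):
        child = nodes.get(child_id)
        if child is None:
            continue  # link target is not in the store; nothing to flag
        child["dependency_taint"] = True
        child["taint_origin_id"] = parent_id
        tainted.append(child_id)
    return tainted


def clear_taint(nodes: dict, parent_id: str) -> list:
    """Clear taint fields on every node whose taint_origin_id matches."""
    cleared = []
    for node_id, node in nodes.items():
        if node.get("taint_origin_id") == parent_id:
            node.pop("dependency_taint", None)
            node.pop("taint_origin_id", None)
            cleared.append(node_id)
    return cleared
```

Because `taint_dependents` never recurses into a child's own `links_to`, the blast radius stays at depth 1 as the schema requires.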
|
Contributor
Author
Cascade-clear worker logic: implementation detail, not schema. How the revalidation worker executes this cascade clear is left to the implementation PR: lookup strategy (batch index scan vs per-node linked-list traversal), atomicity guarantees (is the clear-sweep transactional across all children, or eventually consistent?), and rate-limiting on nodes with large taint-child fanout. Schema only specifies the post-condition ( |
||
|
|
||
| Full cascading taint (depth > 1) is deferred to v2 — blast radius is uncontrollable without heat-map telemetry data to bound the scope. | ||
|
|
||
| --- | ||
|
|
||
| ## Revalidation Queue | ||
|
|
||
| ### Entry | ||
| - Any node transitioning to `revalidation_status: flagged` or `contested` is enqueued | ||
| - Nodes can also be manually flagged by an agent (`wiki_write` with `revalidation_status: flagged`) | ||
|
|
||
| ### Priority ordering — Heat-Map Heuristic (highest to lowest) | ||
|
|
||
| 1. **Contested nodes** — `conflict_detected: true`; highest priority, time-bounded by Resolution TTL | ||
| 2. **Heat-map tier** — node retrieval frequency by Hermes Studio (high retrieval = high priority); primary signal | ||
| 3. **High connectivity** — node has ≥3 `links_to` entries (high blast radius if stale) | ||
| 4. **Low confidence floor** — node's `confidence_floor` < 0.25 (more fragile) | ||
| 5. **FIFO** — otherwise, oldest-flagged first | ||
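One possible reading of this ordering is a lexicographic sort key, sketched below. The draft does not say how the tiers combine, so treating them as successive tie-breakers is an assumption, and `retrieval_count` and `flagged_at` are hypothetical field names for the retrieval-frequency signal and the enqueue time.

```python
def priority_key(node: dict) -> tuple:
    """Sort key for the revalidation queue; smaller tuples are served first."""
    return (
        0 if node.get("conflict_detected", False) else 1,     # 1. contested
        -node.get("retrieval_count", 0),                      # 2. heat map (hot first)
        0 if len(node.get("links_to", [])) >= 3 else 1,       # 3. high connectivity
        # 4. low floor; a missing floor is treated as not fragile (assumption)
        0 if node.get("confidence_floor", 1.0) < 0.25 else 1,
        node["flagged_at"],                                   # 5. FIFO (ISO-8601 UTC string)
    )
```

Usage would be `sorted(queue, key=priority_key)`; the FIFO tie-breaker relies on all timestamps using the same ISO-8601 UTC format so that string order matches time order.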
|
|
||
| Retrieval frequency is tracked by the Hermes context injection layer and surfaced as a node metadata property updated on each read. | ||
|
|
||
| ### Processing | ||
|
|
||
| Any agent with wiki access picks up a `flagged` or `revalidating` node, sets `revalidation_status: revalidating`, re-reads/verifies the underlying claim, then either: | ||
| - **Confirms**: updates `last_verified_at`, restores `confidence` to initial value, resets to `current` | ||
| - **Updates**: rewrites node content with corrected info, same field resets | ||
| - **Retires**: marks node with `revalidation_status: retired` (not deleted; retained indefinitely per zero-rag philosophy) | ||
| - **Resolves conflict**: if `conflict_detected: true`, sets `conflict_detected: false`, clears `conflict_trigger_id`, resolves based on evidence | ||
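The pickup step and three of the outcomes above, sketched as plain functions over a dict-shaped node. Restoring `confidence` "to initial value" requires that value to be stored somewhere; the `initial_confidence` key used here is a hypothetical field, not part of the draft schema.

```python
def claim(node: dict) -> None:
    """Pick up a node for revalidation (first come, first served)."""
    if node["revalidation_status"] not in ("flagged", "revalidating"):
        raise ValueError("node is not awaiting revalidation")
    node["revalidation_status"] = "revalidating"


def confirm(node: dict, verified_at: str) -> None:
    """Confirm outcome: refresh the timestamp, restore confidence, reset status."""
    node["last_verified_at"] = verified_at
    node["confidence"] = node["initial_confidence"]  # hypothetical stored seed value
    node["revalidation_status"] = "current"


def resolve_conflict(node: dict) -> None:
    """Clear the conflict fields once the contradiction has been settled."""
    node["conflict_detected"] = False
    node["conflict_trigger_id"] = None


def retire(node: dict) -> None:
    """Retire outcome: the node is kept, only its status changes."""
    node["revalidation_status"] = "retired"
```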
|
|
||
| First-come-first-served on the flagged queue. No designated revalidation agent required. | ||
|
|
||
| ### Telemetry hook | ||
|
|
||
| Expose `/revalidation/queue/depth` on the SSE server: | ||
|
|
||
| ```json | ||
| { | ||
| "queue_depth": 4, | ||
| "by_status": { "flagged": 2, "revalidating": 1, "contested": 1 }, | ||
| "contested_count": 1, | ||
| "oldest_flagged_at": "2026-04-22T14:00:00Z" | ||
| } | ||
| ``` | ||
|
|
||
| Alert threshold: queue depth > 10 for > 1h = backlog risk. Contested nodes always surface in telemetry regardless of queue depth. Log to stdout; integrate into any existing healthcheck. | ||
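A sketch of how the payload above might be computed from a list of nodes. It covers only payload assembly, not the SSE endpoint itself; `flagged_at` is a hypothetical per-node timestamp, and `queue_depth_payload` is an illustrative name.

```python
from collections import Counter

QUEUE_STATUSES = ("flagged", "revalidating", "contested")


def queue_depth_payload(nodes: list) -> dict:
    """Summarise the revalidation queue in the shape of the JSON example."""
    queued = [n for n in nodes if n["revalidation_status"] in QUEUE_STATUSES]
    by_status = Counter(n["revalidation_status"] for n in queued)
    flagged_times = [n["flagged_at"] for n in queued
                     if n["revalidation_status"] == "flagged"]
    return {
        "queue_depth": len(queued),
        "by_status": {s: by_status.get(s, 0) for s in QUEUE_STATUSES},
        "contested_count": by_status.get("contested", 0),
        # ISO-8601 UTC strings in one format compare correctly as strings
        "oldest_flagged_at": min(flagged_times) if flagged_times else None,
    }
```

A healthcheck could then compare `queue_depth` against the threshold of 10 and track how long it has been exceeded.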
|
|
||
| --- | ||
|
|
||
| ## Retrieval Priority (separate from decay) | ||
|
|
||
| `source_reliability_index` has two roles that must not be conflated: | ||
|
|
||
| 1. **Decay coefficient** — controls how fast confidence decays (covered above) | ||
| 2. **Retrieval ranking weight** — ground-truth sources outrank transient observations at query time | ||
|
|
||
| Retrieval ranking (highest to lowest authority): | ||
| 1. Repository / codebase (live truth) | ||
| 2. Design docs, formal meeting outcomes | ||
| 3. Bot-authored analysis with explicit sourcing | ||
| 4. Chat observations, inferred state | ||
|
|
||
| A chat observation with a recent `last_verified_at` does NOT outrank an architectural constraint just because it's fresher. Retrieval must weight source authority, not only recency. | ||
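To illustrate authority-first ranking, here is a minimal sketch that orders results by source tier and uses recency only as a tie-breaker within a tier. The `source_type` field and its four labels are hypothetical stand-ins for the tiers listed above.

```python
# Lower rank = higher authority; labels are illustrative, not schema values.
AUTHORITY_RANK = {
    "repository": 0,
    "design_doc": 1,
    "bot_analysis": 2,
    "chat_observation": 3,
}


def rank_results(nodes: list) -> list:
    """Order by source authority first, then by most recent last_verified_at."""
    # Python's sort is stable, so sorting by recency first and authority
    # second keeps the recency order inside each authority tier.
    by_recency = sorted(nodes, key=lambda n: n["last_verified_at"], reverse=True)
    return sorted(by_recency, key=lambda n: AUTHORITY_RANK[n["source_type"]])
```

With this key, a chat observation verified today still ranks below a repository-sourced node verified months ago.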
|
|
||
| --- | ||
|
|
||
| ## Hermes Studio Boundary | ||
|
|
||
| Masking happens before any Hermes context injection: | ||
|
|
||
| ``` | ||
| wiki_read result → privacy_protocol masking layer → Hermes Studio context window | ||
| ``` | ||
|
|
||
| Hermes never has visibility into raw PII or private-tier nodes. This is enforced at the write-stream on the MCP server, not at the application layer. `contested` node metadata passes through to Hermes — agents in that context are expected to handle uncertainty. | ||
|
|
||
| --- | ||
|
|
||
| ## Resolved Design Decisions | ||
|
|
||
| | Question | Decision | | ||
| |---|---| | ||
| | Decay tick interval | 24h global default; per-node override via `decay_interval_hours` | | ||
| | Revalidation executor | Any agent with wiki access; first-come-first-served | | ||
| | `source_reliability_index` assignment | Manual seed at creation, tiered by source type | | ||
| | Retired node retention | Keep forever; zero-rag philosophy | | ||
|
|
||
| --- | ||
|
|
||
| *Draft ready for PR. Post comments in daemon-bot or open against `JPeetz/MeMex-Zero-RAG`.* | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| --- | ||
| title: Molty Project Snapshot — 2026-04-23 15:10 EDT | ||
| type: synthesis | ||
| created: 2026-04-23 | ||
| author: molty | ||
| tags: | ||
| - agent:molty | ||
| - type:snapshot | ||
| - project:memex-zero-rag | ||
| links_to: | ||
| - KNOWLEDGE-DECAY.md | ||
| - wiki/synthesis/snapshot-molty-2026-04-23.md | ||
| --- | ||
|
|
||
| # Molty Project Snapshot — 2026-04-23 15:10 EDT | ||
|
|
||
| *Hourly big-review snapshot. Captures PR#6 edge-case resolution.* | ||
|
|
||
| --- | ||
|
|
||
| ## MeMex-Zero-RAG | ||
|
|
||
| **Status:** Active. PR#6 open, updated. | ||
|
|
||
| **Action taken (15:10 review):** | ||
| PR#6 review comment from titaniumshovel (Chris) at 16:20Z identified a schema edge case unaddressed in prior reviews. Actioned now: | ||
|
|
||
| - **Edge case:** A contested node at `confidence_floor` is not distinguishable from a stable floor-clamped node by confidence value alone. Documented in `KNOWLEDGE-DECAY.md` that decay workers must check `revalidation_status` in tandem. | ||
| - **Commit:** a03dee4 — `docs(schema): add contested-floor edge-case note` | ||
| - **PR comment:** https://github.com/JPeetz/MeMex-Zero-RAG/pull/6#issuecomment-4307136111 | ||
|
|
||
| **Issue #7** (cascade-clear worker / orphaned taint sweep) — opened by Chris, tracked as post-merge impl item. No schema changes needed for PR#6. | ||
|
|
||
| **PR#6 state:** OPEN, ready for Joerg review/merge. 207 additions total. | ||
|
|
||
| --- | ||
|
|
||
| ## Open blockers (unchanged from 13:10 snapshot) | ||
|
|
||
| | Item | Owner | Status | | ||
| |---|---|---| | ||
| | PR#6 merge | Joerg | Waiting on review | | ||
| | KNOWLEDGE-DECAY implementation | Any agent + Joerg | Blocked on merge | | ||
| | cross-worktree wiki_search test | Molty | Unblocked, unrun | | ||
| | context-before-claim → RULES.md | Squad | Marvin suggestion, pending | | ||
| | queue_reply.py | Coconut | Pending | | ||
| | PII gate | Chris | Awaiting green-light | | ||
| | Auto-Confluence-doc-updater share | Edward | Untracked | |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| --- | ||
| title: Molty Project Snapshot — 2026-04-23 16:10 EDT | ||
| type: synthesis | ||
| created: 2026-04-23 | ||
| author: molty | ||
| tags: | ||
| - agent:molty | ||
| - type:snapshot | ||
| - project:memex-zero-rag | ||
| links_to: | ||
| - KNOWLEDGE-DECAY.md | ||
| - wiki/synthesis/snapshot-molty-2026-04-23-1510.md | ||
| - github.com/JPeetz/MeMex-Zero-RAG/pull/6 | ||
| --- | ||
|
|
||
| # Molty Project Snapshot — 2026-04-23 16:10 EDT | ||
|
|
||
| *Hourly big-review snapshot. Minimal delta from 15:10.* | ||
|
|
||
| --- | ||
|
|
||
| ## MeMex-Zero-RAG | ||
|
|
||
| **Status:** Stable / waiting. No changes since 15:10 snapshot. | ||
|
|
||
| **PR#6:** OPEN — `feat(schema): knowledge decay + permanence tier`. 323 additions. No reviews received. Waiting on Joerg. | ||
|
|
||
| **Branch:** `molty-knowledge-decay-schema` — 7 commits ahead of upstream/main. No new commits this hour. | ||
|
|
||
| **Squad activity (8h window):** No Coconut or Marvin commits to upstream/main or their forks since PRs #3–5 merged early this morning (~02–07Z). Repo is quiet. | ||
|
|
||
| --- | ||
|
|
||
| ## Active open items (unchanged) | ||
|
|
||
| | Item | Owner | Status | | ||
| |---|---|---| | ||
| | PR#6 merge | Joerg | Waiting on review | | ||
| | KNOWLEDGE-DECAY implementation | Any agent + Joerg | Blocked on merge | | ||
| | cross-worktree wiki_search test | Molty | Unblocked (PR#4 merged), unrun | | ||
| | context-before-claim → RULES.md | Squad | Marvin suggestion, pending | | ||
| | queue_reply.py | Coconut | Pending | | ||
| | PII gate | Chris | Awaiting green-light | | ||
| | Auto-Confluence-doc-updater share | Edward | Untracked | | ||
|
|
||
| --- | ||
|
|
||
| ## Check notes (big-review) | ||
|
|
||
| - **Project health:** All blockers external to Molty. Nothing stalling on Molty's end. | ||
| - **Zoom out:** Cross-worktree `wiki_search` test is the highest-leverage Molty-owned unfinished item. | ||
| - **Real-world test gap:** `wiki_search` cross-worktree verification has not been run since PR#4 merged. | ||
| - **Squad sync:** No new signals from Coconut/Marvin. Nothing to post to daemon-bot. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| --- | ||
| title: Molty Project Snapshot — 2026-04-23 17:10 EDT | ||
| type: synthesis | ||
| created: 2026-04-23 | ||
| author: molty | ||
| tags: | ||
| - agent:molty | ||
| - type:snapshot | ||
| - project:memex-zero-rag | ||
| - infra:webhook-down | ||
| links_to: | ||
| - KNOWLEDGE-DECAY.md | ||
| - wiki/synthesis/snapshot-molty-2026-04-23-1610.md | ||
| - github.com/JPeetz/MeMex-Zero-RAG/pull/6 | ||
| --- | ||
|
|
||
| # Molty Project Snapshot — 2026-04-23 17:10 EDT | ||
|
|
||
| *Hourly big-review snapshot. Notable delta: webhook subscription infrastructure down.* | ||
|
|
||
| --- | ||
|
|
||
| ## MeMex-Zero-RAG | ||
|
|
||
| **Status:** Stable / waiting. PR#6 open, no new reviews. | ||
|
|
||
| **PR#6:** OPEN — 376 additions (increased from 323; snapshot commits added to branch). 1 review (titaniumshovel/COMMENTED — self, contested-floor edge case addressed in a03dee4). No Joerg/Coconut/Marvin reviews. Still waiting. | ||
|
|
||
| **Squad activity:** No Coconut/Marvin git commits in last 8h. Repo quiet. | ||
|
|
||
| --- | ||
|
|
||
| ## Infrastructure alert: Webhook subscriptions DOWN | ||
|
|
||
| All 7 Teams webhook subscription creates failed this review cycle. | ||
|
|
||
| **Error:** `nabu-pn7g55fc.tailbf57c9.ts.net` — DNS unresolvable. 0 active subscriptions. | ||
|
|
||
| **Impact:** Molty is currently deaf to Teams pings. Any mentions or messages in daemon-bot, coco, bot-talk since subscriptions lapsed are unread. | ||
|
|
||
| **Not actionable by Molty:** Tailscale node recovery requires Chris to diagnose. No Graph token locally available to poll directly. | ||
|
|
||
| **Workaround:** None autonomous. Chris should check Tailscale dashboard and restart notification receiver if node is offline. | ||
|
|
||
| --- | ||
|
|
||
| ## Open items (unchanged from 16:10) | ||
|
|
||
| | Item | Owner | Status | | ||
| |---|---|---| | ||
| | PR#6 merge | Joerg | Waiting on review | | ||
| | Webhook receiver (nabu-pn7g55fc) | Chris | Node unreachable — needs investigation | | ||
| | cross-worktree wiki_search test | Molty | Unblocked, unrun | | ||
| | context-before-claim → RULES.md | Squad | Pending | | ||
| | queue_reply.py | Coconut | Pending | | ||
| | PII gate | Chris | Awaiting green-light | | ||
| | Auto-Confluence-doc-updater share | Edward | Untracked | |
Orphaned taint sweep: implementation detail, not schema.
Edge case: if a `contested` parent is retired (never resolved) or its node ID changes post-merge/refactor, `taint_origin_id` on children becomes a dangling reference and the auto-clear path never fires. Implementation PR should specify a periodic orphan sweep: for each node with `dependency_taint: true`, verify `taint_origin_id` resolves to a live node whose `revalidation_status != current`; if not, clear the taint fields. Schema stays clean; this is worker hygiene.
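A rough sketch of the sweep this comment describes, over a dict of nodes keyed by ID. Reading "live" as "present in the store and not `retired`" is an interpretation of the comment, and `sweep_orphaned_taint` is a hypothetical name.

```python
def sweep_orphaned_taint(nodes: dict) -> list:
    """Clear taint whose origin is missing, retired, or already resolved."""
    cleared = []
    for node_id, node in nodes.items():
        if not node.get("dependency_taint", False):
            continue
        parent = nodes.get(node.get("taint_origin_id"))
        # Taint stays valid only while the parent exists and is still unresolved.
        parent_unresolved = (
            parent is not None
            and parent.get("revalidation_status") not in ("current", "retired")
        )
        if not parent_unresolved:
            node.pop("dependency_taint", None)
            node.pop("taint_origin_id", None)
            cleared.append(node_id)
    return cleared
```

Run periodically, this would also catch children that the regular auto-clear path missed.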