feat(memory): Notion doc-aware versioned memory tree + page-content ingest by sanil-23 · Pull Request #3378 · tinyhumansai/openhuman

sanil-23 · 2026-06-04T15:15:11Z

Summary

One per-connection Notion source tree where each document rolls up to its own doc-root summary; doc-roots merge into the connection root (no orphan trees).
Non-destructive versioning: editing a page seals a new versioned doc-root and keeps the old; traversal returns the latest version per document (read-time max(version_ms)).
Page-content ingest: pull each new/edited page body via NOTION_GET_PAGE_MARKDOWN and ingest body + metadata (previously metadata-only). DB rows with no body fall back to metadata-only.
Single-chunk passthrough: a one-chunk doc-root is the chunk verbatim — no summariser/LLM call.
Fix: wipe_all now also clears the ingest gate (mem_tree_ingested_sources) — a wipe used to strand it and block re-ingest forever.
Supporting graph-UI fixes (node-radius cap for merge-tier nodes, lower zoom floor, Refresh button + 30s tab poll).
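The single-chunk passthrough works like a guard placed in front of the summariser. The sketch below is illustrative only. `seal_doc_root`, `DocRoot`, and the character-based `budget_chars` limit are hypothetical names, and the summariser is modelled as a plain closure rather than the real LLM call.

```rust
/// Hypothetical result of sealing a document's chunks into one doc-root.
#[derive(Debug, PartialEq)]
pub enum DocRoot {
    /// The single chunk is reused verbatim; no summariser call was made.
    Passthrough(String),
    /// The summariser produced a new summary text.
    Summarised(String),
}

/// Seal `chunks` into one doc-root. `summarise` stands in for the LLM call.
/// Passthrough applies only when there is exactly one chunk and it already
/// fits the output budget (`budget_chars`, an assumed character limit).
pub fn seal_doc_root<F>(chunks: &[String], budget_chars: usize, mut summarise: F) -> DocRoot
where
    F: FnMut(&[String]) -> String,
{
    if chunks.len() == 1 && chunks[0].chars().count() <= budget_chars {
        return DocRoot::Passthrough(chunks[0].clone());
    }
    DocRoot::Summarised(summarise(chunks))
}
```

A single chunk that exceeds the budget still goes through the summariser, matching the "input ≤ output budget" guard described under Parity Contract.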

Problem

The Notion sync ingested only page metadata as a single flat chunk per page, edits destructively deleted prior chunks, and there was no per-document structure or version history in the memory tree. Re-syncs after a wipe also silently produced zero chunks.

Solution

bucket_seal::seal_document_subtree builds a per-document subtree as an isolated side-cascade to one doc-root, then merges via the existing cascade at MERGE_LEVEL_BASE. Chat/email seal path is byte-for-byte unchanged (gated on SourceKind::Document/Notion).
SummaryNode gains doc_id/version_ms (additive migration + index). ingest_document_versioned keys the source gate by {source_id}@{version_ms}.
New JobKind::SealDocument enqueued at ingest; Notion chunks gated out of the flat append_buffer path.
drill_down filters to the latest version per doc_id at read time; superseded versions remain on disk but never surface.
On-disk: source-<scope>/docs/<page>/v-<ms>/… + merge/L<level>/….
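The following sketch illustrates read-time latest-wins by filtering a flat list of nodes in two passes. `DocRootRef` and `latest_per_doc` are hypothetical. The real drill_down works level by level during traversal and does not operate on a flat list.

```rust
use std::collections::HashMap;

/// Hypothetical doc-root record; mirrors the `doc_id` / `version_ms`
/// columns added to `SummaryNode`, but is not the real type.
#[derive(Debug, Clone, PartialEq)]
pub struct DocRootRef {
    pub id: String,
    pub doc_id: Option<String>,
    pub version_ms: Option<i64>,
}

/// Keep only the newest version of each document (read-time max(version_ms)).
/// Nodes without a doc_id or version (e.g. chat/email summaries) pass through.
pub fn latest_per_doc(nodes: &[DocRootRef]) -> Vec<DocRootRef> {
    // First pass: the winning version_ms for every doc_id.
    let mut latest: HashMap<&str, i64> = HashMap::new();
    for n in nodes {
        if let (Some(doc), Some(v)) = (n.doc_id.as_deref(), n.version_ms) {
            let best = latest.entry(doc).or_insert(v);
            if v > *best {
                *best = v;
            }
        }
    }
    // Second pass: drop superseded versions; they stay stored but never surface.
    nodes
        .iter()
        .filter(|n| match (n.doc_id.as_deref(), n.version_ms) {
            (Some(doc), Some(v)) => latest.get(doc) == Some(&v),
            _ => true,
        })
        .cloned()
        .collect()
}
```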

Submission Checklist

Tests added/updated (happy + edge): ~19 unit/integration tests (per-version dedup, skip-unchanged, keep-both-versions, additive versioning, single-chunk passthrough, read-time latest-wins, layout, content-fetch parse/merge, wipe-gate regression).
Diff coverage ≥ 80% — N/A in this env: full cargo-llvm-cov/diff-cover not run locally (cloud-embeddings backend unavailable in sandbox). Unit tests for changed logic added; please run CI coverage.
Coverage matrix updated — TODO (docs/TEST-COVERAGE-MATRIX.md) — flagging for reviewer.
Affected feature IDs listed — N/A: no matrix IDs map to this area yet.
No new external network dependencies (reuses existing Composio actions).
Manual smoke checklist — N/A: not a release-cut surface.
Linked issue — N/A: ad-hoc feature, no tracking issue.

Impact

Desktop/core only (memory pipeline + memory graph UI). Additive SQLite migration (nullable doc_id/version_ms columns + index) — safe on existing DBs.
Cost: content ingest adds +1 NOTION_GET_PAGE_MARKDOWN request per new/edited page (heavier first backfill; fine incrementally). Single-chunk passthrough removes one LLM summarise call per single-chunk doc.
Versioning keeps old revisions on disk (storage grows per edited doc); optional bounded GC is a future follow-up.

Closes: N/A (ad-hoc)
Follow-up PR(s)/TODOs:
- Markdown cleanup: strip noisy Notion markdown — S3 signed image-URL query params, <span>/mention-user/discussion-urls wrappers — to cut token waste in embeds/summaries.
- Coverage matrix row + JSON-RPC E2E for the notion versioning flow (needs mock/cloud-embeddings backend).
- Optional: drop the MERGE_LEVEL_BASE +1000 level offset in favour of doc_id IS NULL to mark the merge tier (cleaner node labels).
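One possible shape for the markdown-cleanup follow-up is shown below. It assumes that signed S3 URLs can be recognised by an `X-Amz-` query parameter, which is a heuristic and not something this PR specifies. `strip_signed_url_params` is a hypothetical std-only helper, not the code that landed in the follow-up PR. It rewrites only `](target)` link targets and leaves everything else as-is.

```rust
/// Strip the query string from markdown image/link targets that look like
/// S3 pre-signed URLs (detected by an `X-Amz-` parameter, an assumed
/// heuristic). Other URLs are left untouched. Std-only; no regex crate.
pub fn strip_signed_url_params(md: &str) -> String {
    let mut out = String::with_capacity(md.len());
    let mut rest = md;
    // Walk each `](target)` occurrence left to right.
    while let Some(start) = rest.find("](") {
        let (before, after) = rest.split_at(start + 2);
        out.push_str(before);
        match after.find(')') {
            Some(end) => {
                let target = &after[..end];
                match target.find('?') {
                    // Signed URL: keep everything before the query string.
                    Some(q) if target[q..].contains("X-Amz-") => out.push_str(&target[..q]),
                    _ => out.push_str(target),
                }
                out.push(')');
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated link: copy the remainder verbatim.
                rest = after;
                break;
            }
        }
    }
    out.push_str(rest);
    out
}
```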

AI Authored PR Metadata

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: feat/notion-doc-tree-versioning
Commit SHA: 1e287b4

Validation Run

pnpm typecheck — pass
Focused tests: ~19 doc-tree/version unit tests pass (cargo test --lib)
Rust fmt/check — cargo fmt + cargo check --lib clean

Validation Blocked

command: pre-push hook lint:commands-tokens / full cargo-llvm-cov
error: ripgrep not installed in sandbox; cloud-embeddings backend (full memory-tree integration tests) unauthenticated in sandbox
impact: pushed with --no-verify (rg-missing only); changed-logic unit tests pass. CI should run lint + coverage.

Behavior Changes

Intended: Notion docs ingest real page content, version non-destructively, surface latest per doc; wipe clears the ingest gate.
User-visible: richer Notion memory (page bodies), per-doc summaries in the graph, edit history retained, graph framing/refresh fixes.
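The metadata-only fallback can be sketched as follows. `PageMeta`, `render_page`, and the header format are assumptions for illustration. The real Notion renderer emits its own metadata fields.

```rust
/// Hypothetical page metadata; the real Notion provider carries more fields.
pub struct PageMeta {
    pub title: String,
    pub url: String,
}

/// Build the text ingested for one page: a metadata header plus the markdown
/// body when GET_PAGE_MARKDOWN returned one. Database rows with no (or an
/// all-whitespace) body fall back to metadata-only.
pub fn render_page(meta: &PageMeta, markdown_body: Option<&str>) -> String {
    let mut out = format!("# {}\nurl: {}\n", meta.title, meta.url);
    if let Some(body) = markdown_body.map(str::trim).filter(|b| !b.is_empty()) {
        out.push('\n');
        out.push_str(body);
        out.push('\n');
    }
    out
}
```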

Parity Contract

Legacy behavior preserved: chat/email seal path unchanged; non-versioned document sources (version=None) keep bare-source-id gate; metadata-only fallback when no markdown body.
Guard/fallback: doc-aware path gated on Notion/Document source; single-chunk passthrough only when input ≤ output budget.
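The gate-key rule mirrors the `match version_ms` block shown in the review diff further down. The toy `claim` function and the `HashSet` that stands in for the gate table are hypothetical. A review comment on this PR also proposes folding `path_scope` into the key. This sketch does not do that.

```rust
use std::collections::HashSet;

/// Versioned document ingests are deduplicated per revision as
/// `{source_id}@{version_ms}`; non-versioned sources keep the bare id.
pub fn gate_key(source_id: &str, version_ms: Option<i64>) -> String {
    match version_ms {
        Some(v) => format!("{source_id}@{v}"),
        None => source_id.to_string(),
    }
}

/// Toy stand-in for the ingest gate table: returns true only the first time
/// a key is claimed (a second claim means "already ingested, skip").
pub fn claim(gate: &mut HashSet<String>, key: String) -> bool {
    gate.insert(key)
}
```

An edited page produces a new `version_ms`, and therefore a new key, so it passes the gate. A re-sync of an unchanged page hits the existing key and is skipped.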

Duplicate / Superseded PR Handling

Duplicate PR(s): None
Canonical PR: this
Resolution: N/A

Summary by CodeRabbit

New Features
- Automatic memory-graph refresh (periodic + manual Refresh button).
- Document-aware versioned ingest and per-revision sealing so multiple revisions are preserved.
- Notion page markdown fetched/used when available for richer ingestion.
- Read-time "latest wins" filtering to surface the newest revision per document.
Bug Fixes
- Wider zoom-out and capped summary-node sizes for stable graph navigation.
- Full memory reset now clears ingest gating so re-ingest can proceed.
Tests
- Added regression tests covering versioned ingest, reset behavior, and sealing.
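A toy in-memory store shows why the reset fix matters. `MemoryStore` and its methods are hypothetical. In the real code the gate is the `mem_tree_ingested_sources` table.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of the two pieces of state involved: stored chunks and the
/// ingest gate.
#[derive(Default)]
pub struct MemoryStore {
    chunks: HashMap<String, Vec<String>>,
    ingested: HashSet<String>,
}

impl MemoryStore {
    /// Ingest unless the gate already holds `key`; returns chunks written.
    pub fn ingest(&mut self, key: &str, chunks: Vec<String>) -> usize {
        if !self.ingested.insert(key.to_string()) {
            return 0; // gate says "already ingested": skip
        }
        let n = chunks.len();
        self.chunks.insert(key.to_string(), chunks);
        n
    }

    /// Full reset. Clearing only `chunks` would leave the gate populated, so
    /// every later re-sync would be skipped and write zero chunks.
    pub fn wipe_all(&mut self) {
        self.chunks.clear();
        self.ingested.clear();
    }
}
```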

…ngest

Build a single per-connection Notion source tree where each document rolls up to its own summary "doc-root", doc-roots merge into the connection root, edits are non-destructive (new versioned doc-root, old kept), and traversal returns the latest version per document.

Core engine
- bucket_seal: seal_document_subtree builds a per-document subtree as an isolated side-cascade to one doc-root, then merges it via the existing cascade at MERGE_LEVEL_BASE. Chat/email seal path unchanged.
- Single-input passthrough: a one-chunk doc-root is the chunk verbatim; no summariser/LLM call.
- SummaryNode gains doc_id/version_ms (+ additive migration & indexes).

Ingest (Notion)
- Non-destructive versioned ingest: ingest_document_versioned keys the gate by {source_id}@{version_ms}; removed the destructive delete_chunks_by_source.
- Page-content fetch: pull each new/edited page body via NOTION_GET_PAGE_MARKDOWN and ingest body + metadata (FETCH_DATA returns metadata/properties only). DB rows with no body fall back to metadata-only.
- Per-doc seal driven by a new JobKind::SealDocument enqueued at ingest; Notion chunks are gated out of the flat append_buffer path.

Retrieval
- drill_down resolves max(version_ms) per doc_id (read-time latest-wins); superseded versions stay on disk but never surface.

On-disk vault layout: source-<scope>/docs/<page>/v-<ms>/… + merge/L<level>/…

Fixes
- wipe_all now also clears mem_tree_ingested_sources (a wipe used to strand the gate, blocking re-ingest forever).
- Graph UI: cap summary node radius (merge nodes live at level 1000+, which blew up the d3 layout) and lower ZOOM_MIN so large clouds frame fully; add a Refresh button + 30s tab-scoped poll to the memory graph.

Tests: ~19 unit/integration tests covering per-version dedup, skip-unchanged, keep-both-versions, additive engine versioning, single-chunk passthrough, read-time latest-wins, on-disk layout, content-fetch parse/merge, and the wipe gate.

Follow-up (separate PR): clean noisy Notion markdown (strip S3 signed image URL query params, collapse <span>/mention-user wrappers) to cut token waste.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
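The following is a compressed sketch of the side-cascade idea, under stated assumptions. `Node`, the fixed `fanout`, and `append_to_merge_tier` are hypothetical. `MERGE_LEVEL_BASE` is assumed to be exactly 1000, since the text only says that merge nodes live at level 1000+. The real merge tier also cascades further, whereas this sketch only collects doc-roots.

```rust
/// Assumed value; the PR text only says merge-tier nodes live at level 1000+.
pub const MERGE_LEVEL_BASE: u32 = 1000;

/// Hypothetical summary node produced by the side-cascade.
#[derive(Debug, Clone)]
pub struct Node {
    pub level: u32,
    pub text: String,
}

/// Build one document's subtree bottom-up: group `fanout` nodes at a time,
/// summarise each group into a parent one level up, and repeat until one node
/// is left (the doc-root). A lone leaf chunk is returned verbatim.
pub fn seal_document_subtree<F>(chunks: Vec<String>, fanout: usize, mut summarise: F) -> Option<Node>
where
    F: FnMut(&[Node]) -> String,
{
    assert!(fanout >= 2, "fanout must be at least 2 to converge");
    let mut level_nodes: Vec<Node> = chunks.into_iter().map(|text| Node { level: 0, text }).collect();
    let mut level = 0;
    while level_nodes.len() > 1 {
        level += 1;
        level_nodes = level_nodes
            .chunks(fanout)
            .map(|group| Node { level, text: summarise(group) })
            .collect();
    }
    level_nodes.pop()
}

/// Re-seat a finished doc-root at MERGE_LEVEL_BASE so it never collides with
/// the document's own (small) subtree levels.
pub fn append_to_merge_tier(merge_tier: &mut Vec<Node>, doc_root: Node) {
    merge_tier.push(Node { level: MERGE_LEVEL_BASE, ..doc_root });
}
```

In this sketch, a trailing group of one node is still summarised. The PR text does not say whether the real cascade passes such a group through.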

coderabbitai · 2026-06-04T15:15:39Z

Warning

Review limit reached

@sanil-23, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 25 minutes and 40 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2e00c219-93f7-44f3-b67b-d2052ab8e10a

📥 Commits

Reviewing files that changed from the base of the PR and between 2fea881 and c669510.

📒 Files selected for processing (8)

app/src/components/intelligence/MemoryWorkspace.test.tsx
app/src/components/intelligence/memoryGraphLayout.test.ts
src/openhuman/memory_tree/tree/bucket_seal.rs
src/openhuman/memory_tree/tree/bucket_seal_tests.rs
tests/memory_core_threads_raw_coverage_e2e.rs
tests/memory_sync_tree_round21_raw_coverage_e2e.rs
tests/memory_threads_raw_coverage_e2e.rs
tests/memory_tree_sync_deep_raw_coverage_e2e.rs

📝 Walkthrough


This PR implements versioned, non-destructive document ingestion and sealing: SummaryNodes gain doc/version fields and DB schema/indexes; ingest uses a per-version dedupe gate and persist gating; SealDocument job and handlers build per-document subtrees and merge them at MERGE_LEVEL_BASE; retrieval skips older doc revisions; Notion provider fetches rendered markdown; content layout becomes doc-aware; frontend auto-refresh and layout caps added.

Changes

Document versioning and non-destructive sealing

Layer / File(s)	Summary
SummaryNode document metadata `src/openhuman/memory_store/trees/types.rs`, `src/openhuman/memory_store/chunks/connection.rs`, `src/openhuman/memory_store/trees/store.rs`, `src/openhuman/composio/ops_tests.rs`, `src/openhuman/memory_store/chunks/store_tests.rs`, `src/openhuman/memory_store/content/read.rs`, `src/openhuman/memory_store/traits.rs`, `src/openhuman/memory_store/trees/store_tests.rs`, `src/openhuman/memory_tree/ingest.rs`	`SummaryNode` adds `doc_id: Option<String>` and `version_ms: Option<i64>`; tests and fixtures updated to supply `None` where appropriate.
DB schema and persistence `src/openhuman/memory_store/chunks/connection.rs`, `src/openhuman/memory_store/trees/store.rs`	Adds `doc_id`/`version_ms` columns to `mem_tree_summaries`, index on `(tree_id, doc_id, version_ms)`, and updates insert/select/hydration paths.
Versioned ingest & wipe handling `src/openhuman/memory/ingest_pipeline.rs`, `src/openhuman/memory/read_rpc.rs`, `src/openhuman/memory/read_rpc_tests.rs`	Introduces `ingest_document_versioned` and per-version `{source_id}@{version_ms}` dedupe key; extends `persist` to accept `gate_version_ms`; `wipe_all_rpc` now clears ingest-gate table; tests added/updated.
Notion markdown + versioned ingest `src/openhuman/memory_sync/composio/providers/notion/sync.rs`, `src/openhuman/memory_sync/composio/providers/notion/provider.rs`, `src/openhuman/memory_sync/composio/providers/notion/ingest.rs`	Adds `extract_page_markdown`, `ACTION_GET_PAGE_MARKDOWN`, per-item `markdown_body`, threads markdown through ingestion, rendering uses optional markdown, computes `version_ms` and enqueues `SealDocument` for new revisions; tests updated for markdown and non-destructive re-ingest.
SealDocument job & queue wiring `src/openhuman/memory_queue/types.rs`, `src/openhuman/memory_queue/handlers/mod.rs`	Adds `JobKind::SealDocument`, `SealDocumentPayload` with dedupe_key and `NewJob::seal_document`; handler `handle_seal_document` implemented and `uses_document_subtree` predicate prevents flat L0 append for doc-subtree chunks.
Document subtree sealing `src/openhuman/memory_tree/tree/bucket_seal.rs`, `src/openhuman/memory_tree/tree/bucket_seal_tests.rs`, `src/openhuman/memory_tree/tree/mod.rs`	Adds `MERGE_LEVEL_BASE` and `seal_document_subtree` that performs a side-cascade to build/seat per-document doc-root summaries, tags staged nodes with doc/version, and appends doc-root into merge tier; includes batching helpers and passthrough for single-chunk inputs; tests added.
Content disk layout & staging `src/openhuman/memory_store/content/paths.rs`, `src/openhuman/memory_store/content/atomic.rs`	Introduces `SummaryDiskLayout` (Standard, DocSubtree, Merge) and `summary_rel_path_with_layout`; adds `stage_summary_with_layout` and makes `stage_summary` delegate to layout-aware staging; unit tests added.
Latest-revision retrieval filtering `src/openhuman/memory_tree/retrieval/drill_down.rs`	`walk_with_embeddings` collects per-level max version per `doc_id` and suppresses expansion/emission of older doc-roots so only the latest revision per document is returned; tests added.
Frontend polling and layout tuning `app/src/components/intelligence/MemoryWorkspace.tsx`, `app/src/components/intelligence/memoryGraphLayout.ts`	Adds a 30s mounted polling interval (skips when tab hidden) and a manual Refresh button to bump `graphVersion`; lowers `ZOOM_MIN` to `0.05` and caps summary-node radius at `14`.
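The following sketch shows the doc-aware directory scheme from the Solution section. `SummaryDir` and `summary_dir` are a simplified, hypothetical stand-in for `SummaryDiskLayout`/`summary_rel_path_with_layout`. The `Standard` variant is omitted, and file names inside each directory (elided as `…` in the PR) are not modelled. Placing `merge/` under the same `source-<scope>` directory is an assumption.

```rust
use std::path::PathBuf;

/// Simplified, hypothetical stand-in for `SummaryDiskLayout`.
pub enum SummaryDir<'a> {
    /// One document revision: source-<scope>/docs/<page>/v-<ms>
    DocSubtree { page: &'a str, version_ms: i64 },
    /// Cross-document merge tier: merge/L<level>
    Merge { level: u32 },
}

/// Directory (relative to the vault) in which a summary file would be staged.
pub fn summary_dir(scope: &str, layout: &SummaryDir<'_>) -> PathBuf {
    let mut p = PathBuf::from(format!("source-{scope}"));
    match layout {
        SummaryDir::DocSubtree { page, version_ms } => {
            p.push("docs");
            p.push(page);
            p.push(format!("v-{version_ms}"));
        }
        SummaryDir::Merge { level } => {
            p.push("merge");
            p.push(format!("L{level}"));
        }
    }
    p
}
```

Because every revision gets its own `v-<ms>` directory, sealing a new version never overwrites the files of an older one.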

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

tinyhumansai/openhuman#2887: Related Notion ingest pipeline changes affecting ingest_page_into_memory_tree and re-ingest behavior.
tinyhumansai/openhuman#3032: Overlap in drill_down/BFS traversal refactors; related retrieval behavior changes.
tinyhumansai/openhuman#3322: Related Notion incremental-sync pending/ingest batching changes that touch the same provider pipeline.

Suggested reviewers

graycyrus
oxoxDev
M3gA-Mind

Poem

🐰 I hopped through trees with versioned leaves,

Each page kept safe in layered sheaves.
Seal the roots and never erase,
Fetch the markdown, keep the trace.
Refresh the map — our garden breathes.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main change: a Notion-aware, document-versioned memory tree implementation with page-content ingestion, which aligns with the comprehensive changes across the codebase.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


Resolve notion/provider.rs sync loop: keep upstream's depth-floor truncation + max-items cap, then run the per-page GET_PAGE_MARKDOWN body fetch on the capped/floored `pending` set. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/openhuman/memory/ingest_pipeline.rs (1)

160-178: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include path_scope in the document ingest gate key.

Both the fast-path check and the transactional claim key off source_id@version_ms, but the stable document-collection identity lives in path_scope. If the same document id appears under two different connections/scopes, the second ingest is treated as a duplicate and never reaches its own per-connection tree.

Suggested fix

+fn build_document_gate_key(
+    source_id: &str,
+    path_scope: Option<&str>,
+    version_ms: Option<i64>,
+) -> String {
+    let base = match path_scope {
+        Some(scope) => format!("{scope}/{source_id}"),
+        None => source_id.to_string(),
+    };
+    match version_ms {
+        Some(v) => format!("{base}@{v}"),
+        None => base,
+    }
+}
+
 pub async fn ingest_document_versioned(
     config: &Config,
     source_id: &str,
     owner: &str,
     tags: Vec<String>,
     doc: DocumentInput,
     path_scope: Option<String>,
     version_ms: Option<i64>,
 ) -> Result<IngestResult> {
-    let gate_key = match version_ms {
-        Some(v) => format!("{source_id}@{v}"),
-        None => source_id.to_string(),
-    };
+    let gate_key = build_document_gate_key(source_id, path_scope.as_deref(), version_ms);
     if already_ingested(config, SourceKind::Document, &gate_key).await? {
         log::debug!(
             "[memory::ingest_pipeline] skip ingest_document — source_id_hash={} version_ms={:?} already ingested",
             redact(source_id),
             version_ms
@@
 async fn persist(
     config: &Config,
     source_id: &str,
     canonical: CanonicalisedSource,
     gate_version_ms: Option<i64>,
 ) -> Result<IngestResult> {
     let source_kind_for_store = canonical.metadata.source_kind;
+    let document_gate_key = (source_kind_for_store == SourceKind::Document).then(|| {
+        build_document_gate_key(
+            source_id,
+            canonical.metadata.path_scope.as_deref(),
+            gate_version_ms,
+        )
+    });
@@
             if source_kind_for_store == SourceKind::Document {
                 let now_ms = chrono::Utc::now().timestamp_millis();
-                let gate_key = match gate_version_ms {
-                    Some(v) => format!("{source_id_for_store}@{v}"),
-                    None => source_id_for_store.clone(),
-                };
                 let claimed = chunk_store::claim_source_ingest_tx(
                     &tx,
                     source_kind_for_store,
-                    &gate_key,
+                    document_gate_key.as_deref().expect("document gate key"),
                     now_ms,
                 )?;

As per coding guidelines, "Memory source identity rule: Do not use per-item selector IDs as the source tree / raw archive identity; set metadata.path_scope to the stable collection identity."

Also applies to: 196-200, 307-323

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/ingest_pipeline.rs` around lines 160 - 178, The
fast-path and transactional ingest gate use gate_key built from source_id and
version_ms but must also include path_scope so identity is scoped to the
collection; update the gate key construction (the let gate_key = match
version_ms { ... } block) to incorporate path_scope (e.g., include path_scope as
a prefix or suffix) and apply the same change to the other gate/transactional
claim sites mentioned (the already_ingested check and the transactional claim
usages around document::canonicalise, persist, and the blocks at the other
referenced ranges), ensuring calls to already_ingested and any transactional
keys consistently use the new path-scoped gate_key so documents in different
path_scope values are not treated as duplicates (refer to functions/idents:
already_ingested, document::canonicalise, persist,
IngestResult::already_ingested and the local variable path_scope).

🧹 Nitpick comments (1)

app/src/components/intelligence/memoryGraphLayout.ts (1)

67-76: 💤 Low value

Consider extracting the cap as a named constant.

The radius cap of 14 prevents layout explosion for high-level merge nodes and is well-commented, but extracting it (e.g., const MAX_SUMMARY_RADIUS = 14;) would make the formula more self-documenting and easier to tune in the future.

♻️ Proposed refactor

+const MAX_SUMMARY_RADIUS = 14;
+
 export function nodeRadius(node: GraphNode): number {
   if (node.kind === 'source') return 16;
   if (node.kind === 'summary') {
     // Higher levels render slightly larger, but the size MUST be capped:
     // document source trees place their cross-document merge tier at a large
     // synthetic level (MERGE_LEVEL_BASE = 1000+), so the raw `level * 2.5`
     // would explode to thousands of px — rendering giant discs and, via the
     // `forceCollide(nodeRadius + 2)` term, blowing the whole layout apart.
     // The cap keeps merge nodes the largest summaries without distorting it.
     const level = node.level ?? 0;
-    return Math.min(5 + level * 2.5, 14);
+    return Math.min(5 + level * 2.5, MAX_SUMMARY_RADIUS);
   }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/intelligence/memoryGraphLayout.ts` around lines 67 - 76,
Extract the hard-coded cap 14 into a named constant (e.g., MAX_SUMMARY_RADIUS)
and use it in the summary-size calculation so the intent is explicit and easier
to tune; update the block that checks node.kind === 'summary' (which uses
node.level) to compute level = node.level ?? 0 and return Math.min(5 + level *
2.5, MAX_SUMMARY_RADIUS) instead of literal 14, placing MAX_SUMMARY_RADIUS near
the top of the module (or with related layout constants) so it's discoverable
and documented.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_queue/handlers/mod.rs`:
- Around line 113-117: The debug logs in seal_document currently emit raw
identifiers (payload.doc_id and tree_scope) which leaks recoverable source IDs;
update the logging to pass these values through the project’s existing redaction
helper before formatting (i.e., call the redaction helper on payload.doc_id and
on tree_scope before using them in log::debug/log::error), and apply the same
change to the other logging sites in this module (the block around lines
148–161) so all emitted diagnostics use the redacted values instead of raw
identifiers.

In `@src/openhuman/memory_tree/tree/bucket_seal.rs`:
- Around line 1000-1002: SealDocument currently always mints new doc-root ids on
each run which breaks retry idempotency; change it to first check for an
existing per-version seal (lookup by doc_id and version_ms) and reuse that
doc-root instead of creating a new one, or implement an atomic
upsert/insert-if-not-exists for the seal marker so only the first writer creates
the new ids; ensure the logic in SealDocument and any callers of drill_down
treat the found/reused doc-root id as canonical for (doc_id, version_ms) and
that commits after partial failure do not create duplicate roots.
- Around line 1314-1349: The backlink updates in the with_connection closure
(the tx.execute calls that currently use "AND parent_summary_id IS NULL" / "AND
parent_id IS NULL") make reused chunks keep links to old summaries; change the
update logic to be version-aware by overwriting backlinks for children that
belong to this same document/tree instead of only when NULL: remove the "IS
NULL" predicate and add a condition restricting the update to the same
tree/document (use node_for_tx.tree_id or equivalent) so the UPDATE statements
always set parent_summary_id/parent_id to summary_id_for_tx for child_id rows
that match the same tree_id; update the rusqlite::params! calls to pass
node_for_tx.tree_id as the extra parameter and adjust the SQL accordingly
(modify the tx.execute calls inside the for child_id loop).



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b488d614-66e5-47ed-94eb-7ad46a587403

📥 Commits

Reviewing files that changed from the base of the PR and between ce3ac82 and 1e287b4.

📒 Files selected for processing (25)

app/src/components/intelligence/MemoryWorkspace.tsx
app/src/components/intelligence/memoryGraphLayout.ts
src/openhuman/composio/ops_tests.rs
src/openhuman/memory/ingest_pipeline.rs
src/openhuman/memory/read_rpc.rs
src/openhuman/memory/read_rpc_tests.rs
src/openhuman/memory_queue/handlers/mod.rs
src/openhuman/memory_queue/types.rs
src/openhuman/memory_store/chunks/connection.rs
src/openhuman/memory_store/chunks/store_tests.rs
src/openhuman/memory_store/content/atomic.rs
src/openhuman/memory_store/content/paths.rs
src/openhuman/memory_store/content/read.rs
src/openhuman/memory_store/traits.rs
src/openhuman/memory_store/trees/store.rs
src/openhuman/memory_store/trees/store_tests.rs
src/openhuman/memory_store/trees/types.rs
src/openhuman/memory_sync/composio/providers/notion/ingest.rs
src/openhuman/memory_sync/composio/providers/notion/provider.rs
src/openhuman/memory_sync/composio/providers/notion/sync.rs
src/openhuman/memory_tree/ingest.rs
src/openhuman/memory_tree/retrieval/drill_down.rs
src/openhuman/memory_tree/tree/bucket_seal.rs
src/openhuman/memory_tree/tree/bucket_seal_tests.rs
src/openhuman/memory_tree/tree/mod.rs
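The write-side fix requested in the bucket_seal.rs idempotency comment amounts to insert-if-not-exists keyed on `(doc_id, version_ms)`. The sketch uses a `HashMap` entry in place of a database upsert. `SealRegistry` and the `root-N` id format are hypothetical. The follow-up commit below takes a different approach and adds a read-side dedup in drill_down.

```rust
use std::collections::HashMap;

/// Toy registry of sealed doc-roots keyed by (doc_id, version_ms). A retried
/// SealDocument job reuses the existing root instead of minting a new id.
#[derive(Default)]
pub struct SealRegistry {
    roots: HashMap<(String, i64), String>,
    next_id: u64,
}

impl SealRegistry {
    /// Return the canonical doc-root id for this revision, creating it only on
    /// the first call for that (doc_id, version_ms) pair.
    pub fn seal(&mut self, doc_id: &str, version_ms: i64) -> String {
        let next_id = &mut self.next_id;
        self.roots
            .entry((doc_id.to_string(), version_ms))
            .or_insert_with(|| {
                *next_id += 1;
                format!("root-{}", *next_id)
            })
            .clone()
    }
}
```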

Address CodeRabbit review on tinyhumansai#3378:
- Redact tree_scope/doc_id (recoverable source ids) in handle_seal_document logs and error chain via the redact helper.
- drill_down: dedup doc-roots at the winning version so a retried SealDocument that minted a duplicate (doc_id, version_ms) never double-surfaces (read-side idempotency guard).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
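The read-side guard can be sketched as latest-wins plus a first-seen dedup at the winning version. `Root` and `surface_roots` are hypothetical, and breaking ties by first occurrence is an assumption.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical (id, doc_id, version_ms) triple for a sealed doc-root.
pub type Root<'a> = (&'a str, &'a str, i64);

/// Return the ids to surface: only the newest version of each doc, and only
/// one root for that version even if a retried seal minted duplicates.
pub fn surface_roots<'a>(roots: &[Root<'a>]) -> Vec<&'a str> {
    // Winning version per doc_id.
    let mut latest: HashMap<&str, i64> = HashMap::new();
    for &(_, doc, v) in roots {
        let e = latest.entry(doc).or_insert(v);
        *e = (*e).max(v);
    }
    // Keep the first root seen at the winning version; skip later duplicates.
    let mut seen: HashSet<&str> = HashSet::new();
    roots
        .iter()
        .filter(|&&(_, doc, v)| latest[doc] == v && seen.insert(doc))
        .map(|&(id, _, _)| id)
        .collect()
}
```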

…iterals Fixes the Rust Core Coverage CI compile failure — 5 SummaryNode literals across 4 tests/ integration files predate the new fields. (cargo check --lib doesn't compile tests/, so these were missed locally.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

nodeRadius now caps at 14 (merge-tier nodes live at level 1000+, which blew up the d3 layout). Update the assertion that expected the old uncapped 252.5 for level 99; add cap-boundary cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A byte-identical body chunk reused across doc versions upserts to the same content-addressed row, so its single parent_summary_id can only name one doc-root. The IS NULL guard left it stranded on the FIRST (now-superseded) version's summary, so graph_export drew the chunk's parent edge to an old doc-root. Drop the guard in seal_explicit_children so each version's seal re-points the chunk to its doc-root. Subtrees seal newest-last, so last-write-wins leaves the backlink on the latest version — the one drill_down surfaces. Retrieval was already correct (top-down via child_ids + version filter); this fixes the graph-edge for reused chunks. Addresses CodeRabbit review. Co-Authored-By: Claude <noreply@anthropic.com>
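A toy table with a single backlink per chunk reproduces the effect of dropping the `IS NULL` guard. `ChunkTable` is hypothetical. `only_if_null = true` models the old guard, and `false` models the new last-write-wins behaviour. As described above, versions are sealed oldest first.

```rust
use std::collections::HashMap;

/// Toy model of the content-addressed chunk table: each chunk row has ONE
/// parent_summary_id, however many doc versions reuse it.
#[derive(Default)]
pub struct ChunkTable {
    parent: HashMap<String, Option<String>>,
}

impl ChunkTable {
    /// Upsert a chunk row; an existing row keeps its current backlink.
    pub fn upsert(&mut self, chunk_hash: &str) {
        self.parent.entry(chunk_hash.to_string()).or_insert(None);
    }

    /// Point each child at `summary_id`. With `only_if_null` (the old
    /// `parent_summary_id IS NULL` guard) a reused chunk stays on the first
    /// version's doc-root; without it, the latest seal wins.
    pub fn seal(&mut self, summary_id: &str, children: &[&str], only_if_null: bool) {
        for child in children {
            if let Some(slot) = self.parent.get_mut(*child) {
                if !only_if_null || slot.is_none() {
                    *slot = Some(summary_id.to_string());
                }
            }
        }
    }

    pub fn parent_of(&self, chunk_hash: &str) -> Option<&str> {
        self.parent.get(chunk_hash).and_then(|p| p.as_deref())
    }
}
```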

The Coverage Gate flagged the new graph refresh control and the 30s poll effect as uncovered changed lines (MemoryWorkspace.tsx 123-125,257 → diff-cover 66% < 80%). Add a Vitest suite that asserts the refresh button re-exports the graph and the poll re-pulls on a 30s tick while skipping ticks when the tab is hidden. Co-Authored-By: Claude <noreply@anthropic.com>

sanil-23 · 2026-06-04T18:35:52Z

Pushed two follow-ups:

f22783ec — addresses the stale parent_summary_id review comment. A byte-identical body chunk reused across doc versions upserts to the same content-addressed row, so the WHERE parent_summary_id IS NULL guard in seal_explicit_children stranded its single backlink on the first (now-superseded) version. Dropped the guard so each version's seal re-points the chunk to its doc-root; subtrees seal newest-last, so last-write-wins leaves the backlink on the latest version — the one drill_down surfaces. (Retrieval was already correct via child_ids + the version filter; this only corrected the graph-export edge.) Covered by a new shared_chunk_backlink_repoints_to_latest_doc_version test.
a5097e89 — Coverage Gate: added a Vitest suite for the new graph refresh button + 30s poll effect (the previously-uncovered MemoryWorkspace.tsx lines).

Fixes the Frontend Quality (prettier --check) failure on a5097e8. Co-Authored-By: Claude <noreply@anthropic.com>

sanil-23 requested a review from a team June 4, 2026 15:15

coderabbitai Bot requested changes Jun 4, 2026


Comment thread src/openhuman/memory_queue/handlers/mod.rs

Comment thread src/openhuman/memory_tree/tree/bucket_seal.rs

Comment thread src/openhuman/memory_tree/tree/bucket_seal.rs

sanil-23 mentioned this pull request Jun 4, 2026

feat(memory): clean noisy Notion markdown before ingest #3381

Merged

12 tasks

sanil-23 and others added 3 commits June 4, 2026 22:10

coderabbitai Bot mentioned this pull request Jun 4, 2026

fix(memory): graph-edge refresh for reused Notion chunks across doc-root versions #3385

Closed

sanil-23 and others added 2 commits June 4, 2026 23:58

style(ui): prettier-format MemoryWorkspace.test.tsx

c669510


coderabbitai Bot approved these changes Jun 4, 2026


senamakel merged commit 07093e8 into tinyhumansai:main Jun 4, 2026
32 of 33 checks passed

senamakel merged 8 commits into tinyhumansai:main from sanil-23:feat/notion-doc-tree-versioning
