135 changes: 135 additions & 0 deletions recipes/import-verification/README.md
@@ -0,0 +1,135 @@
# Import Verification

> Read-only checks for imported Open Brain thoughts.

## What It Does

This recipe verifies that imports landed in `public.thoughts` with enough metadata to audit, filter, and troubleshoot them later. It checks source coverage, metadata completeness, missing embeddings, duplicate fingerprints, sample rows, and optional text probes. It does not write to your database.

Use it after running an import recipe such as ChatGPT Conversation Import, Obsidian Vault Import, Gmail import, Google Activity Import, Readwise Import, or a custom importer.

## Prerequisites

- Working Open Brain setup ([guide](../../docs/01-getting-started.md))
- Node.js 18+
- Supabase project URL and service-role key for live database checks

## Credential Tracker

```text
IMPORT VERIFICATION -- CREDENTIAL TRACKER
--------------------------------------

FROM YOUR OPEN BRAIN SETUP
Supabase Project URL: ____________
Supabase Service Role Key: ____________

OPTIONAL
Source to verify: ____________ (example: chatgpt, obsidian, gmail)
Probe text: ____________ (a phrase you expect to find)

--------------------------------------
```

## Steps

1. Open this recipe folder:

```bash
cd recipes/import-verification
```

2. Run the fixture check first. It needs no credentials and confirms that the script itself works:

```bash
node verify-imports.mjs --fixture fixtures/sample-thoughts.json
```

3. Export your Supabase credentials:

```bash
export SUPABASE_URL="https://YOUR_PROJECT_REF.supabase.co"
export SUPABASE_SERVICE_ROLE_KEY="your-service-role-key"
```

You can also put these values in `.env.local` in this recipe folder. Do not commit `.env.local`.

4. Verify recent imports:

```bash
node verify-imports.mjs --limit 1000
```

5. Verify one source:

```bash
node verify-imports.mjs --source chatgpt --limit 1000 --sample 5
```

6. Add a text probe when you know a phrase should exist:

```bash
node verify-imports.mjs --source obsidian --probe "home maintenance" --limit 1000
```

7. Use strict mode for CI-style checks:

```bash
node verify-imports.mjs --source readwise --strict
```
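
The live checks in steps 4 and 5 read recent rows from `public.thoughts`. As a rough sketch of what such a read-only request could look like through Supabase's REST (PostgREST) endpoint, the helper below builds the URL and headers without sending anything. The function name `buildRecentThoughtsRequest`, the `select=*` column list, and the ordering by `created_at` are assumptions for illustration; the script's actual query is not shown in this README.

```javascript
// Hypothetical helper: describes a read-only GET for the most recent rows.
// Assumes the table is reachable at /rest/v1/thoughts on the project URL.
function buildRecentThoughtsRequest(supabaseUrl, serviceRoleKey, limit = 1000) {
  const url = new URL("/rest/v1/thoughts", supabaseUrl);
  url.searchParams.set("select", "*");
  url.searchParams.set("order", "created_at.desc"); // newest rows first
  url.searchParams.set("limit", String(limit));     // mirrors --limit
  return {
    method: "GET", // nothing is written to the database
    url: url.toString(),
    headers: {
      apikey: serviceRoleKey,
      Authorization: `Bearer ${serviceRoleKey}`,
    },
  };
}

const request = buildRecentThoughtsRequest(
  "https://example-ref.supabase.co",
  "your-service-role-key",
  1000
);
console.log(request.method, request.url);
```

Passing the result to `fetch(request.url, { headers: request.headers })` would perform the read. Filtering by source can then happen on the returned rows.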

## Options

| Flag | Default | Description |
| ---- | ------- | ----------- |
| `--source SOURCE` | all sources | Filter scanned rows to a source slug such as `chatgpt`, `obsidian`, or `gmail`. |
| `--limit N` | `1000` | Maximum recent rows to scan. |
| `--sample N` | `5` | Number of sample thoughts to print. |
| `--probe TEXT` | none | Check whether scanned thoughts contain a phrase. This is a text probe, not semantic MCP search. |
| `--json` | off | Print machine-readable JSON instead of human-readable output. |
| `--strict` | off | Exit with code `1` when missing metadata, missing embeddings, duplicate fingerprints, or failed probes are found. |
| `--fixture FILE` | none | Analyze a local JSON fixture instead of connecting to Supabase. |
| `--help` | off | Show usage. |
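
To make the defaults in the table concrete, here is a minimal hand-rolled parser sketch. The flag names and defaults come from the table above. The function name `parseFlags`, the error messages, and the parsing strategy are assumptions and may differ from what `verify-imports.mjs` does.

```javascript
// Defaults taken from the options table; everything else is illustrative.
const DEFAULTS = {
  source: null, limit: 1000, sample: 5, probe: null,
  json: false, strict: false, fixture: null, help: false,
};
const VALUE_FLAGS = new Set(["source", "limit", "sample", "probe", "fixture"]);
const NUMERIC_FLAGS = new Set(["limit", "sample"]);

function parseFlags(argv) {
  const options = { ...DEFAULTS };
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i];
    const name = arg.startsWith("--") ? arg.slice(2) : "";
    if (!Object.hasOwn(DEFAULTS, name)) throw new Error(`Unknown flag: ${arg}`);
    if (!VALUE_FLAGS.has(name)) {
      options[name] = true; // boolean switches such as --strict
      continue;
    }
    const value = argv[i + 1];
    if (value === undefined) throw new Error(`Missing value for ${arg}`);
    i += 1; // skip the value we just consumed
    if (NUMERIC_FLAGS.has(name)) {
      const parsed = Number(value);
      if (!Number.isInteger(parsed) || parsed < 0) {
        throw new Error(`${arg} expects a non-negative integer`);
      }
      options[name] = parsed;
    } else {
      options[name] = value;
    }
  }
  return options;
}

console.log(parseFlags(["--source", "chatgpt", "--limit", "1000", "--sample", "5"]));
```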

## What Gets Checked

- **Rows by source**: counts scanned rows by `source_type`, `metadata.source_type`, or `metadata.source`.
- **Metadata completeness**: checks for `source`, `source_type`, `source_label`, `imported_at`, `importer_name`, `importer_version`, `input_hash`, `content_fingerprint`, `sensitivity_tier`, and `provenance`.
- **Embeddings**: flags rows with missing or empty embeddings.
- **Duplicate fingerprints**: finds repeated `content_fingerprint` values in the scanned rows.
- **Samples**: prints representative imported rows with ID, source, created date, and content preview.
- **Probe text**: checks whether any scanned content contains a phrase you expect to find.
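
As an illustration of the metadata completeness check, the sketch below lists which of the fields named above are absent from a row's `metadata` object. The field list comes from this README. The helper names and the decision to treat `null` and empty strings as missing are assumptions.

```javascript
// Field names taken from the "Metadata completeness" bullet above.
const REQUIRED_METADATA_FIELDS = [
  "source", "source_type", "source_label", "imported_at",
  "importer_name", "importer_version", "input_hash",
  "content_fingerprint", "sensitivity_tier", "provenance",
];

// Hypothetical helper: returns the required fields a single row lacks.
function missingMetadataFields(row) {
  const metadata = row.metadata ?? {};
  return REQUIRED_METADATA_FIELDS.filter((field) => {
    const value = metadata[field];
    return value === undefined || value === null || value === "";
  });
}

// Hypothetical helper: counts, per field, how many rows are missing it.
function summarizeMissingMetadata(rows) {
  const counts = {};
  for (const row of rows) {
    for (const field of missingMetadataFields(row)) {
      counts[field] = (counts[field] ?? 0) + 1;
    }
  }
  return counts;
}

const legacyRow = { id: "a", metadata: { source: "gmail" } };
console.log(missingMetadataFields(legacyRow));
```

A row from an older importer, like `legacyRow` here, would report every field except `source` as missing.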

Older importers may not have every metadata field yet. By default the script reports those gaps without failing. Use `--strict` when you are validating a new importer that should meet the current contract.

## Expected Outcome

For a healthy import, you should see:

- The expected source has non-zero rows.
- Recent imported rows have source metadata.
- Embeddings are present unless the importer intentionally skipped embeddings.
- Duplicate fingerprints are zero or explainable.
- A probe phrase finds at least one matching row when provided.
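
The embedding and duplicate expectations above could be checked along these lines. This is a sketch under assumptions: the helper names are hypothetical, embeddings are assumed to arrive either as arrays (as in the fixture) or as strings (vector columns are sometimes serialized that way over REST), and the fingerprint is read from the top-level `content_fingerprint` with `metadata.content_fingerprint` as a fallback.

```javascript
// Hypothetical helper: true when a row carries a non-empty embedding.
function hasEmbedding(row) {
  const embedding = row.embedding;
  if (Array.isArray(embedding)) return embedding.length > 0;
  if (typeof embedding === "string") {
    const trimmed = embedding.trim();
    return trimmed !== "" && trimmed !== "[]";
  }
  return false; // null, undefined, or an unexpected type
}

// Hypothetical helper: groups row IDs by fingerprint and keeps repeats only.
function findDuplicateFingerprints(rows) {
  const idsByFingerprint = new Map();
  for (const row of rows) {
    const fingerprint = row.content_fingerprint || row.metadata?.content_fingerprint;
    if (!fingerprint) continue; // rows without a fingerprint cannot be compared
    const ids = idsByFingerprint.get(fingerprint) ?? [];
    ids.push(row.id);
    idsByFingerprint.set(fingerprint, ids);
  }
  return [...idsByFingerprint]
    .filter(([, ids]) => ids.length > 1)
    .map(([fingerprint, ids]) => ({ fingerprint, ids }));
}

const rows = [
  { id: "1", embedding: [0.01, 0.02], content_fingerprint: "sha256:aaa" },
  { id: "2", embedding: [], content_fingerprint: "sha256:aaa" },
  { id: "3", embedding: null, metadata: { content_fingerprint: "sha256:bbb" } },
];
console.log(rows.filter((row) => !hasEmbedding(row)).map((row) => row.id));
console.log(findDuplicateFingerprints(rows));
```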

## Exit Codes

| Code | Meaning |
| ---- | ------- |
| `0` | Checks ran to completion. In non-strict mode, the output may still include warnings. |
| `1` | Strict mode found verification failures. |
| `2` | Missing configuration, unreadable fixture, JSON parse error, or Supabase query failure. |
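
Read as a decision rule, the table above amounts to something like the following sketch. The function name `exitCodeFor` and its parameter names are hypothetical. Only the three codes and their meanings come from the table.

```javascript
// Hypothetical helper mapping a run's outcome to the documented exit codes.
function exitCodeFor({ strict = false, failureCount = 0, fatalError = null } = {}) {
  if (fatalError) return 2;                     // config, fixture, parse, or query problem
  if (strict && failureCount > 0) return 1;     // strict mode found verification failures
  return 0;                                     // checks ran; warnings allowed when not strict
}

console.log(exitCodeFor({ strict: false, failureCount: 3 }));
console.log(exitCodeFor({ strict: true, failureCount: 3 }));
console.log(exitCodeFor({ fatalError: new Error("unreadable fixture") }));
```

In a CI job, a non-zero code from `--strict` is what would fail the pipeline step.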

## Troubleshooting

**Issue: `SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are required`**
Solution: Export both variables or create a local `.env.local` file in this recipe folder. Use the service-role key, not the anon key.
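
If the error persists, it can help to check what a `.env.local` file actually defines. The sketch below parses simple `KEY=value` lines and reports which of the two required variables are missing. The exact file syntax the script accepts is not documented here, so the parsing rules (optional `export` prefix, optional surrounding quotes, `#` comments) and the helper names are assumptions.

```javascript
const REQUIRED_VARS = ["SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY"];

// Hypothetical parser for simple KEY=value lines.
function parseEnvFile(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const match = line.match(/^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/);
    if (!match) continue;
    const [, key, rawValue] = match;
    // Strip one pair of matching single or double quotes, if present.
    values[key] = rawValue.replace(/^(['"])(.*)\1$/, "$2");
  }
  return values;
}

// Hypothetical helper: names of required variables that are unset or empty.
function missingCredentials(env) {
  return REQUIRED_VARS.filter((name) => !env[name]);
}

const sample = [
  "# local only, do not commit",
  'SUPABASE_URL="https://YOUR_PROJECT_REF.supabase.co"',
].join("\n");
console.log(missingCredentials(parseEnvFile(sample)));
```

With the sample above, only the URL is defined, so the service-role key would be reported as missing.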

**Issue: Source count is zero**
Solution: Check the source slug used by the importer. Some older recipes store only `metadata.source`, while newer schemas may also have a top-level `source_type` column.
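
To see which slugs are actually present, a fallback lookup across the three locations can help. In this sketch the precedence (top-level `source_type`, then `metadata.source_type`, then `metadata.source`) follows the order in which this README lists them, but that order, the `"unknown"` bucket, and the helper names are assumptions.

```javascript
// Hypothetical helper: first non-empty slug found for a row.
function resolveSource(row) {
  return (
    row.source_type ||
    row.metadata?.source_type ||
    row.metadata?.source ||
    "unknown"
  );
}

// Hypothetical helper: row counts keyed by resolved slug.
function countBySource(rows) {
  const counts = {};
  for (const row of rows) {
    const slug = resolveSource(row);
    counts[slug] = (counts[slug] ?? 0) + 1;
  }
  return counts;
}

const rows = [
  { id: "1", source_type: "chatgpt", metadata: { source: "chatgpt" } },
  { id: "2", metadata: { source: "gmail" } }, // older importer, metadata only
  { id: "3", metadata: {} },
];
console.log(countBySource(rows));
```

If the slug you passed to `--source` does not appear among the keys, the importer probably stored a different spelling.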

**Issue: Missing metadata warnings**
Solution: Older imports may predate the current metadata contract. The warnings are useful for cleanup, but they do not mean the import failed. They only cause a non-zero exit when you use `--strict`.

**Issue: Probe text fails**
Solution: Increase `--limit`, use a simpler phrase, or verify with semantic MCP search. The probe is a local text check over scanned rows, not vector search.
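
A local text probe of this kind can be as simple as a substring match over the scanned rows, which is why it misses rows outside the `--limit` window and phrases that are worded differently. The sketch below assumes a case-insensitive match on the `content` field. The helper name `probeRows` and the case handling are assumptions, not confirmed behavior of the script.

```javascript
// Hypothetical helper: rows whose content contains the phrase (case-insensitive).
function probeRows(rows, phrase) {
  const needle = phrase.trim().toLowerCase();
  if (needle === "") return [];
  return rows.filter(
    (row) => typeof row.content === "string" && row.content.toLowerCase().includes(needle)
  );
}

const scanned = [
  { id: "1", content: "[Obsidian: Home Maintenance | Projects] Replace HVAC filters quarterly." },
  { id: "2", content: "[ChatGPT: Database Migration Strategy] Chose PostgreSQL." },
];
console.log(probeRows(scanned, "home maintenance").map((row) => row.id));
console.log(probeRows(scanned, "house upkeep").length);
```

The second probe finds nothing even though the note is about the same topic, which is the situation where semantic search is the better tool.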
73 changes: 73 additions & 0 deletions recipes/import-verification/fixtures/sample-thoughts.json
@@ -0,0 +1,73 @@
[
  {
    "id": "00000000-0000-4000-8000-000000000001",
    "content": "[ChatGPT: Database Migration Strategy | 2025-09-15] Chose PostgreSQL for the reporting service because the team needed relational joins, transactional consistency, and familiar operational tooling.",
    "created_at": "2026-06-06T00:00:00Z",
    "embedding": [0.01, 0.02, 0.03],
    "source_type": "chatgpt",
    "content_fingerprint": "sha256:9ec0acdf6f8c1f82d681d0e3b8ec8a7f1ea61d5b4179a6149ddccde2b1a1e33a",
    "metadata": {
      "source": "chatgpt",
      "source_type": "chatgpt",
      "source_label": "ChatGPT export - conversations.json",
      "source_id": "conversation_abc123:thought_1",
      "source_path": "chatgpt-export/conversations.json",
      "source_locator": "messages 12-18",
      "original_created_at": "2025-09-15T14:22:00Z",
      "imported_at": "2026-06-06T00:00:00Z",
      "importer_name": "chatgpt-conversation-import",
      "importer_version": "1.0.0",
      "input_hash": "sha256:2b45a3b4a377d0f5f27b0f8ad77e8f4fc4c8795a739e4fbd9610b7f742b2c4f1",
      "content_fingerprint": "sha256:9ec0acdf6f8c1f82d681d0e3b8ec8a7f1ea61d5b4179a6149ddccde2b1a1e33a",
      "sensitivity_tier": "standard",
      "provenance": {
        "method": "llm_extraction",
        "source_record": "conversation abc123",
        "source_locator": "messages 12-18",
        "artifact": "chatgpt-export/conversations.json",
        "extractor_model": "openai/gpt-4o-mini",
        "review_status": "unreviewed"
      },
      "type": "decision",
      "topics": ["database", "architecture", "reporting"],
      "people": [],
      "action_items": [],
      "confidence": "firm"
    }
  },
  {
    "id": "00000000-0000-4000-8000-000000000002",
    "content": "[Obsidian: Home Maintenance | Projects] Replace HVAC filters quarterly and keep receipts in the home maintenance folder.",
    "created_at": "2026-06-06T00:05:00Z",
    "embedding": [0.04, 0.05, 0.06],
    "source_type": "obsidian",
    "content_fingerprint": "sha256:5f6c9dd716d345f5356e833052e7f2ecfb6a17674f44b61d8ed847a6fc777100",
    "metadata": {
      "source": "obsidian",
      "source_type": "obsidian",
      "source_label": "House vault",
      "source_id": "Projects/Home Maintenance.md",
      "source_path": "Projects/Home Maintenance.md",
      "source_locator": "heading: Seasonal maintenance",
      "original_created_at": "2025-12-01T09:00:00Z",
      "imported_at": "2026-06-06T00:05:00Z",
      "importer_name": "obsidian-vault-import",
      "importer_version": "1.0.0",
      "input_hash": "sha256:083da49e5e445bd40357f6f591af393b806d1f6a2b1d8d1f6a5c9bf8a8c76f22",
      "content_fingerprint": "sha256:5f6c9dd716d345f5356e833052e7f2ecfb6a17674f44b61d8ed847a6fc777100",
      "sensitivity_tier": "personal",
      "provenance": {
        "method": "chunked_record",
        "source_record": "Projects/Home Maintenance.md",
        "source_locator": "heading: Seasonal maintenance",
        "artifact": "Projects/Home Maintenance.md",
        "review_status": "unreviewed"
      },
      "type": "task",
      "topics": ["home", "maintenance"],
      "people": [],
      "action_items": ["Replace HVAC filters quarterly"],
      "confidence": "firm"
    }
  }
]
21 changes: 21 additions & 0 deletions recipes/import-verification/metadata.json
@@ -0,0 +1,21 @@
{
  "name": "Import Verification",
  "description": "Read-only verification script for checking Open Brain import coverage, metadata completeness, embeddings, duplicate fingerprints, and sample rows.",
  "category": "recipes",
  "author": {
    "name": "Alan Shurafa",
    "github": "alanshurafa"
  },
  "version": "1.0.0",
  "requires": {
    "open_brain": true,
    "services": ["Supabase"],
    "tools": ["Node.js 18+"]
  },
  "requires_skills": [],
  "tags": ["import", "verification", "audit", "metadata", "read-only"],
  "difficulty": "beginner",
  "estimated_time": "10 minutes",
  "created": "2026-06-06",
  "updated": "2026-06-06"
}