NateBJones-Projects · alanshurafa · Jun 6, 2026
diff --git a/schemas/ingestion-jobs/README.md b/schemas/ingestion-jobs/README.md
@@ -0,0 +1,173 @@
+# Ingestion Jobs
+
+> Track dry-run import jobs, extracted items, review status, and execution counts.
+
+## What It Does
+
+This schema adds `public.ingestion_jobs` and `public.ingestion_items` for import workflows that need a reviewable dry run before writing new thoughts. It also adds helper RPCs for recounting job results and appending concise evidence to existing thoughts.
+
+The schema stores extracted thought candidates and source metadata, not full raw transcripts. Items default to `review_status = 'unreviewed'` so inferred or generated memory remains evidence-grade until reviewed.
+
+## Prerequisites
+
+- Working Open Brain setup ([guide](../../docs/01-getting-started.md))
+- Supabase project with SQL Editor access
+- Existing `public.thoughts` table
+- Service-role access for backend import workers
+
+## Credential Tracker
+
+```text
+INGESTION JOBS -- CREDENTIAL TRACKER
+--------------------------------------
+
+FROM YOUR OPEN BRAIN SETUP
+  Supabase Project URL:      ____________
+  Supabase Service Role Key: ____________
+
+SETUP
+  SQL migration applied:     yes / no
+  Test job cleaned up:       yes / no
+
+--------------------------------------
+```
+
+## Steps
+
+1. Open your Supabase project.
+
+2. Go to SQL Editor and create a new query.
+
+3. Copy the contents of [`schema.sql`](./schema.sql) into the query and run it.
+
+4. Verify the tables exist:
+
+   ```sql
+   select table_name
+   from information_schema.tables
+   where table_schema = 'public'
+     and table_name in ('ingestion_jobs', 'ingestion_items');
+   ```
+
+5. Verify the helper functions exist:
+
+   ```sql
+   select routine_name
+   from information_schema.routines
+   where routine_schema = 'public'
+     and routine_name in ('recount_ingestion_job', 'append_thought_evidence');
+   ```
+
+6. Create a dry-run test job:
+
+   ```sql
+   insert into public.ingestion_jobs (
+     source_type,
+     source_label,
+     input_hash,
+     input_bytes,
+     dry_run,
+     status
+   )
+   values (
+     'manual-test',
+     'Manual schema test',
+     'sha256:test-ingestion-job',
+     42,
+     true,
+     'pending'
+   )
+   returning id;
+   ```
+
+7. Add one reviewable item, replacing `<job-id>` with the `id` returned in step 6:
+
+   ```sql
+   insert into public.ingestion_items (
+     job_id,
+     sequence,
+     extracted_content,
+     content_fingerprint,
+     action,
+     status,
+     reason,
+     review_status
+   )
+   values (
+     '<job-id>',
+     1,
+     'Use dry-run import review before writing migrated records.',
+     'sha256:test-item',
+     'add',
+     'ready',
+     'manual_schema_test',
+     'unreviewed'
+   );
+   ```
+
+8. Clean up the test job:
+
+   ```sql
+   delete from public.ingestion_jobs
+   where input_hash = 'sha256:test-ingestion-job';
+   ```
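+
+To confirm the cleanup, a check along these lines may help. It is a sketch that uses only the columns and test values from the steps above, and it assumes that deleting a job also removes its items (for example through an `on delete cascade` foreign key, which `schema.sql` may or may not define). If `remaining_items` is not `0`, delete the test item by its `content_fingerprint`.
+
+```sql
+-- Count leftover test rows after step 8.
+-- Assumption: ingestion_items rows are removed when their job is deleted.
+select
+  (select count(*)
+     from public.ingestion_jobs
+    where input_hash = 'sha256:test-ingestion-job') as remaining_jobs,
+  (select count(*)
+     from public.ingestion_items
+    where content_fingerprint = 'sha256:test-item') as remaining_items;
+```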
+
+## Expected Outcome
+
+After setup, your database has:
+
+- `public.ingestion_jobs`
+- `public.ingestion_items`
+- `public.recount_ingestion_job(uuid)`
+- `public.append_thought_evidence(uuid, jsonb)`
+- RLS enabled on both tables
+- Service-role grants for backend import workers
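+
+The RLS and grant entries above can be spot-checked with catalog queries. This is a sketch: `pg_tables.rowsecurity` and `information_schema.role_table_grants` are standard PostgreSQL views, while the grantee name `service_role` is the usual Supabase role and is assumed here rather than taken from `schema.sql`.
+
+```sql
+-- RLS flag for both tables (rowsecurity should be true).
+select tablename, rowsecurity
+from pg_tables
+where schemaname = 'public'
+  and tablename in ('ingestion_jobs', 'ingestion_items');
+
+-- Table privileges granted to the assumed service_role grantee.
+select table_name, privilege_type
+from information_schema.role_table_grants
+where table_schema = 'public'
+  and table_name in ('ingestion_jobs', 'ingestion_items')
+  and grantee = 'service_role'
+order by table_name, privilege_type;
+```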
+
+The normal lifecycle is:
+
+```text
+pending -> converting -> validating -> extracting -> reconciling
+        -> dry_run_complete -> approved -> executing -> complete
+```
+
+A job that fails or is cancelled leaves this sequence and ends with status `failed` or `cancelled` instead.
+
+## Lifecycle Notes
+
+`ingestion_jobs.dry_run` defaults to `true`. A backend can create jobs and items, reconcile each item to an action, then stop at `dry_run_complete` for human review. Execution happens later after approved items are marked ready.
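+
+As a rough sketch of that hand-off, a backend worker might close out a dry run like this. The statements use only names listed in this README. Whether the schema enforces status order, and what `recount_ingestion_job` returns, are not stated here, so treat both as assumptions and replace `<job-id>` with a real job ID.
+
+```sql
+-- Stop the job at the review gate once every item has an action.
+update public.ingestion_jobs
+set status = 'dry_run_complete'
+where id = '<job-id>'
+  and dry_run = true;
+
+-- Recount the job's results from its items.
+select public.recount_ingestion_job('<job-id>');
+```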
+
+`ingestion_items.action` describes what should happen:
+
+| Action | Meaning |
+| ------ | ------- |
+| `add` | Create a new thought. |
+| `skip` | Do not write because the item is duplicate or low value. |
+| `append_evidence` | Add concise evidence to an existing thought. |
+| `create_revision` | Create a revised thought derived from an existing one. |
+
+`review_status` is separate from execution status. New items are `unreviewed` by default.
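+
+A reviewer could summarize what still needs attention with a query like the one below. It is a minimal sketch that relies only on the `job_id`, `action`, and `review_status` columns used in the setup steps. Replace `<job-id>` with a real job ID.
+
+```sql
+-- Unreviewed items for one job, grouped by proposed action.
+select action, count(*) as unreviewed_items
+from public.ingestion_items
+where job_id = '<job-id>'
+  and review_status = 'unreviewed'
+group by action
+order by action;
+```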
+
+## What This Does Not Do
+
+- It does not import records by itself.
+- It does not generate embeddings.
+- It does not call an LLM.
+- It does not expose an Edge Function.
+- It does not include dashboard screens.
+
+Those behaviors belong to later ingestion PRs.
+
+## Troubleshooting
+
+**Issue: `relation "public.thoughts" does not exist`**
+Solution: Complete the Open Brain setup first. The `append_thought_evidence` helper RPC expects the core `thoughts` table.
+
+**Issue: duplicate input hash**
+Solution: The same source hash already has a job. Query `public.ingestion_jobs` by `source_type` and `input_hash` to inspect the existing dry run.
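+
+For example, the lookup might look like this. It is a sketch using the test values from the setup steps, so substitute the `source_type` and `input_hash` of the import that failed.
+
+```sql
+-- Find the existing job for a given source type and input hash.
+select id, source_label, status, dry_run
+from public.ingestion_jobs
+where source_type = 'manual-test'
+  and input_hash = 'sha256:test-ingestion-job';
+```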
+
+**Issue: evidence excerpt is too long**
+Solution: Store a concise excerpt. Do not append raw transcripts or large source documents as evidence.
+
+## Related
+
+This schema supports the Open Brain workflow from Nate B. Jones. Nate shares practical systems at [Nate's Newsletter](https://substack.com/@natesnewsletter) and [natebjones.com](https://natebjones.com).
diff --git a/schemas/ingestion-jobs/metadata.json b/schemas/ingestion-jobs/metadata.json
@@ -0,0 +1,21 @@
+{
+  "name": "Ingestion Jobs",
+  "description": "Schema for tracking dry-run import jobs, extracted items, reconciliation decisions, and reviewed execution status.",
+  "category": "schemas",
+  "author": {
+    "name": "Alan Shurafa",
+    "github": "alanshurafa"
+  },
+  "version": "1.0.0",
+  "requires": {
+    "open_brain": true,
+    "services": ["Supabase"],
+    "tools": ["Supabase SQL Editor"]
+  },
+  "requires_skills": [],
+  "tags": ["ingestion", "dry-run", "jobs", "metadata", "review"],
+  "difficulty": "beginner",
+  "estimated_time": "10 minutes",
+  "created": "2026-06-06",
+  "updated": "2026-06-06"
+}