173 changes: 173 additions & 0 deletions schemas/ingestion-jobs/README.md
@@ -0,0 +1,173 @@
# Ingestion Jobs

> Track dry-run import jobs, extracted items, review status, and execution counts.

## What It Does

This schema adds `public.ingestion_jobs` and `public.ingestion_items` for import workflows that need a reviewable dry run before writing new thoughts. It also adds helper RPCs for recounting job results and appending concise evidence to existing thoughts.

The schema stores extracted thought candidates and source metadata, not full raw transcripts. Items default to `review_status = 'unreviewed'` so inferred or generated memory remains evidence-grade until reviewed.

## Prerequisites

- Working Open Brain setup ([guide](../../docs/01-getting-started.md))
- Supabase project with SQL Editor access
- Existing `public.thoughts` table
- Service-role access for backend import workers

## Credential Tracker

```text
INGESTION JOBS -- CREDENTIAL TRACKER
--------------------------------------

FROM YOUR OPEN BRAIN SETUP
Supabase Project URL: ____________
Supabase Service Role Key: ____________

SETUP
SQL migration applied: yes / no
Test job cleaned up: yes / no

--------------------------------------
```

## Steps

1. Open your Supabase project.

2. Go to SQL Editor and create a new query.

3. Copy the contents of [`schema.sql`](./schema.sql) into the query and run it.

4. Verify the tables exist:

```sql
select table_name
from information_schema.tables
where table_schema = 'public'
  and table_name in ('ingestion_jobs', 'ingestion_items');
```

5. Verify the helper functions exist:

```sql
select routine_name
from information_schema.routines
where routine_schema = 'public'
  and routine_name in ('recount_ingestion_job', 'append_thought_evidence');
```

6. Create a dry-run test job:

```sql
insert into public.ingestion_jobs (
  source_type,
  source_label,
  input_hash,
  input_bytes,
  dry_run,
  status
)
values (
  'manual-test',
  'Manual schema test',
  'sha256:test-ingestion-job',
  42,
  true,
  'pending'
)
returning id;
```

7. Add one reviewable item, replacing `<job-id>` with the `id` returned in step 6:

```sql
insert into public.ingestion_items (
  job_id,
  sequence,
  extracted_content,
  content_fingerprint,
  action,
  status,
  reason,
  review_status
)
values (
  '<job-id>',
  1,
  'Use dry-run import review before writing migrated records.',
  'sha256:test-item',
  'add',
  'ready',
  'manual_schema_test',
  'unreviewed'
);
```

8. Clean up the test job:

```sql
delete from public.ingestion_jobs
where input_hash = 'sha256:test-ingestion-job';
```
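
To confirm the cleanup in step 8 worked, a check along these lines may help. It is a sketch that relies only on the test values used in steps 6 and 7. This README does not say whether deleting a job also removes its items, so the second count is there to surface any leftover test item.

```sql
-- Both counts should be 0 once the test data is gone.
select
  (select count(*)
     from public.ingestion_jobs
    where input_hash = 'sha256:test-ingestion-job') as remaining_jobs,
  (select count(*)
     from public.ingestion_items
    where content_fingerprint = 'sha256:test-item') as remaining_items;
```

If `remaining_items` is not zero, delete the leftover row by its `content_fingerprint`.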

## Expected Outcome

After setup, your database has:

- `public.ingestion_jobs`
- `public.ingestion_items`
- `public.recount_ingestion_job(uuid)`
- `public.append_thought_evidence(uuid, jsonb)`
- RLS enabled on both tables
- Service-role grants for backend import workers
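
The last two items in this list can be spot-checked from the SQL Editor. The queries below are a sketch: they assume the backend role is named `service_role` (the Supabase default), and they probe only the `select` privilege because this README does not list the exact grants.

```sql
-- rowsecurity should be true for both tables.
select tablename, rowsecurity
from pg_tables
where schemaname = 'public'
  and tablename in ('ingestion_jobs', 'ingestion_items');

-- Assumption: the backend role is called service_role.
-- This raises an error if that role does not exist.
select
  has_table_privilege('service_role', 'public.ingestion_jobs', 'select') as jobs_readable,
  has_table_privilege('service_role', 'public.ingestion_items', 'select') as items_readable;
```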

The normal lifecycle is:

```text
pending -> converting -> validating -> extracting -> reconciling
-> dry_run_complete -> approved -> executing -> complete
```

Jobs that fail or are cancelled leave this sequence and are marked `failed` or `cancelled`.
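
To see where jobs currently sit in this lifecycle, a grouping query such as the following can be useful. It is a minimal sketch that uses only the `status` and `dry_run` columns shown in the steps above.

```sql
-- Count jobs per lifecycle status, split by dry-run flag.
select status, dry_run, count(*) as jobs
from public.ingestion_jobs
group by status, dry_run
order by jobs desc;

-- Jobs waiting for human review.
select id, source_type, source_label
from public.ingestion_jobs
where status = 'dry_run_complete';
```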

## Lifecycle Notes

`ingestion_jobs.dry_run` defaults to `true`. A backend can create jobs and items, reconcile each item to an action, then stop at `dry_run_complete` for human review. Execution happens later after approved items are marked ready.

`ingestion_items.action` describes what should happen:

| Action | Meaning |
| ------ | ------- |
| `add` | Create a new thought. |
| `skip` | Do not write because the item is duplicate or low value. |
| `append_evidence` | Add concise evidence to an existing thought. |
| `create_revision` | Create a revised thought derived from an existing one. |

`review_status` is separate from execution status. New items are `unreviewed` by default.
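
When a job stops for review, a reviewer might summarize its proposals with queries like these. This is a sketch: `<job-id>` is a placeholder for a real job ID. The README gives only the signature `recount_ingestion_job(uuid)`, so the return value and the columns that function updates are not assumed here.

```sql
-- What the dry run proposes, grouped by action and review state.
select action, review_status, count(*) as items
from public.ingestion_items
where job_id = '<job-id>'
group by action, review_status
order by action, review_status;

-- Ask the helper to recount the job's results.
-- Its return value is not documented in this README.
select public.recount_ingestion_job('<job-id>');
```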

## What This Does Not Do

- It does not import records by itself.
- It does not generate embeddings.
- It does not call an LLM.
- It does not expose an Edge Function.
- It does not include dashboard screens.

Those behaviors belong to later ingestion PRs.

## Troubleshooting

**Issue: `relation "public.thoughts" does not exist`**
Solution: Complete the Open Brain setup first. The helper evidence RPC expects the core `thoughts` table.

**Issue: duplicate input hash**
Solution: The same source hash already has a job. Query `public.ingestion_jobs` by `source_type` and `input_hash` to inspect the existing dry run.
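
For example, using the values from the test job in the steps above (substitute your own `source_type` and `input_hash`), the lookup could look like this:

```sql
-- Find the job that already owns this source hash.
select id, source_label, status, dry_run
from public.ingestion_jobs
where source_type = 'manual-test'
  and input_hash = 'sha256:test-ingestion-job';
```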

**Issue: evidence excerpt is too long**
Solution: Store a concise excerpt. Do not append raw transcripts or large source documents as evidence.

## Related

This schema supports the Open Brain workflow from Nate B. Jones. Nate shares practical systems at [Nate's Newsletter](https://substack.com/@natesnewsletter) and [natebjones.com](https://natebjones.com).
21 changes: 21 additions & 0 deletions schemas/ingestion-jobs/metadata.json
@@ -0,0 +1,21 @@
{
  "name": "Ingestion Jobs",
  "description": "Schema for tracking dry-run import jobs, extracted items, reconciliation decisions, and reviewed execution status.",
  "category": "schemas",
  "author": {
    "name": "Alan Shurafa",
    "github": "alanshurafa"
  },
  "version": "1.0.0",
  "requires": {
    "open_brain": true,
    "services": ["Supabase"],
    "tools": ["Supabase SQL Editor"]
  },
  "requires_skills": [],
  "tags": ["ingestion", "dry-run", "jobs", "metadata", "review"],
  "difficulty": "beginner",
  "estimated_time": "10 minutes",
  "created": "2026-06-06",
  "updated": "2026-06-06"
}