Problem
Pensieve segment files are the canonical archive, but they are stored as length-prefixed notepack payloads, optionally wrapped in gzip. That is efficient for storage and ingestion, but not convenient for academics or other researchers who expect line-delimited Nostr JSON events.
Why this matters
Researchers should not need to understand Pensieve internals or notepack framing just to use an archive snapshot. A JSONL export tool gives them a universal, inspectable format while preserving notepack as the canonical storage format.
Suggested implementation
Add a CLI tool, likely under crates/pensieve-ingest/src/bin/, such as segment-to-jsonl.
The tool should:
- Accept one segment file or a directory of segments.
- Read both .notepack and .notepack.gz files.
- Decode each [u32 little-endian length][notepack bytes] record.
- Emit one canonical Nostr JSON event per output line.
- Preserve deterministic ordering by segment filename and record order.
- Surface parse/truncation errors with file and byte offset context.
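The framing and error-reporting steps above could look roughly like the following. This is a minimal sketch using only the Rust standard library, not the actual Pensieve implementation: the names FrameError, read_up_to, and read_frame are hypothetical, and decoding the notepack payload into a Nostr JSON event is left out because it depends on the notepack crate's API.

```rust
use std::fmt;
use std::io::{self, Read};

/// Hypothetical error type for walking the frames of one segment.
#[derive(Debug)]
pub enum FrameError {
    /// The stream ended inside a length prefix or a payload.
    /// `offset` is the byte offset at which the incomplete frame starts.
    Truncated { offset: u64, needed: usize, got: usize },
    /// Any other I/O failure from the underlying reader.
    Io(io::Error),
}

impl fmt::Display for FrameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FrameError::Truncated { offset, needed, got } => write!(
                f,
                "truncated frame at byte offset {offset}: needed {needed} bytes, got {got}"
            ),
            FrameError::Io(e) => write!(f, "I/O error: {e}"),
        }
    }
}

/// Read at most `n` bytes. A shorter result means the stream hit EOF.
/// Reading through `take` avoids pre-allocating a buffer sized by a
/// possibly corrupt length prefix.
fn read_up_to<R: Read>(r: &mut R, n: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    r.by_ref().take(n).read_to_end(&mut buf)?;
    Ok(buf)
}

/// Read one `[u32 little-endian length][payload]` frame.
/// Returns `Ok(None)` on a clean EOF at a frame boundary and advances
/// `offset` past the frame on success.
pub fn read_frame<R: Read>(
    r: &mut R,
    offset: &mut u64,
) -> Result<Option<Vec<u8>>, FrameError> {
    let start = *offset;

    let prefix = read_up_to(r, 4).map_err(FrameError::Io)?;
    if prefix.is_empty() {
        return Ok(None);
    }
    if prefix.len() < 4 {
        return Err(FrameError::Truncated { offset: start, needed: 4, got: prefix.len() });
    }
    let len = u32::from_le_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]);

    let payload = read_up_to(r, u64::from(len)).map_err(FrameError::Io)?;
    if payload.len() < len as usize {
        return Err(FrameError::Truncated {
            offset: start,
            needed: len as usize,
            got: payload.len(),
        });
    }

    *offset = start + 4 + u64::from(len);
    Ok(Some(payload))
}
```

A caller would loop on read_frame until it returns Ok(None), decode each payload, and write one JSON line per event, prefixing any error with the segment path. For .notepack.gz input the same function could be handed a gzip-decoding reader (for example GzDecoder from the flate2 crate, which is an assumption about the dependency choice); in that case the reported offsets refer to the decompressed stream.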
Acceptance criteria
- A .notepack segment exports to valid JSONL.
- A .notepack.gz segment exports to valid JSONL.
- Output lines contain valid Nostr event fields: id, pubkey, created_at, kind, tags, content, sig.
- Directory input processes segments in lexical filename order.
- Tests cover uncompressed, gzip-compressed, and truncated segment cases.
- just precommit passes before merging.