Problem
The segment format is currently discoverable by reading code, but researchers need concise documentation explaining what the files are, how to verify them, and how to convert them into analysis-friendly formats.
Why this matters
Academic access should not depend on source-code spelunking. Clear documentation will reduce support load and help researchers cite and reproduce archive snapshots correctly.
Suggested documentation
Add a document such as docs/segment_archive_format.md or docs/research_access.md covering:
- filename convention:
segment-000000000.notepack(.gz)
- gzip wrapping behavior
- record framing:
[u32 little-endian length][notepack bytes]
- notepack event payload summary
- archive ordering by ingestion time, not
created_at
- dedupe semantics by event id
- absence of relay provenance in canonical segments
- recommended tools: JSONL export, inspect, verify, manifest
- integrity checks with SHA-256 manifests
- example commands
Acceptance criteria
- Docs describe the segment file format precisely.
- Docs explain what metadata is not included.
- Docs include example conversion and verification commands.
- Docs explain how manifests should be used.
- Docs are linked from the README or another discoverable docs index.
just precommit passes before merging.
Problem
The segment format is currently discoverable by reading code, but researchers need concise documentation explaining what the files are, how to verify them, and how to convert them into analysis-friendly formats.
Why this matters
Academic access should not depend on source-code spelunking. Clear documentation will reduce support load and help researchers cite and reproduce archive snapshots correctly.
Suggested documentation
Add a document such as
docs/segment_archive_format.mdordocs/research_access.mdcovering:segment-000000000.notepack(.gz)[u32 little-endian length][notepack bytes]created_atAcceptance criteria
just precommitpasses before merging.