Add release metadata for archive provenance and coverage #20

@erskingardner

Problem

Canonical segment files intentionally store only Nostr events. They do not include ingest timestamp, relay provenance, source coverage, relay set, collection window, or software version metadata.

That is a good property for the archive format, but research releases still need sidecar metadata so academics can understand dataset provenance and limitations.

Why this matters

Researchers need to know what a snapshot represents: when it was collected, which relay strategy produced it, what gaps or biases may exist, and what software version generated the archive. Without this, analyses are harder to reproduce and easier to misinterpret.

Suggested implementation

Create a release-level metadata file, likely emitted alongside manifests.

Candidate fields:

  • dataset/release id
  • generated_at
  • collection_start and collection_end, if known
  • segment range included
  • source mode: live relay, negentropy, JSONL backfill, protobuf backfill, or mixed
  • seed relay list or relay database summary
  • max relay count and discovery settings
  • dedupe policy
  • validation policy
  • Pensieve git revision
  • notepack git revision
  • known limitations

This should be sidecar metadata, not embedded into canonical event records.
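As a rough illustration of the candidate fields above, the sketch below models a release-level record and writes it as a JSON sidecar next to, not inside, the segment files. It is only a sketch under stated assumptions: the class name `ReleaseMetadata`, the function `write_sidecar`, the exact key spellings (other than `generated_at`, `collection_start`, and `collection_end`), the `.metadata.json` suffix, and the choice of JSON are all hypothetical and not taken from Pensieve.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Source modes listed in the issue; the identifier spellings are assumptions.
SOURCE_MODES = {
    "live_relay", "negentropy", "jsonl_backfill", "protobuf_backfill", "mixed",
}


@dataclass
class ReleaseMetadata:
    """Hypothetical release-level sidecar record (not a Pensieve type)."""

    release_id: str
    generated_at: str                      # ISO 8601 timestamp, UTC
    segment_range: tuple[int, int]         # first and last segment included
    source_mode: str
    pensieve_git_revision: str
    notepack_git_revision: str
    collection_start: Optional[str] = None  # None when not known
    collection_end: Optional[str] = None
    seed_relays: list[str] = field(default_factory=list)
    max_relay_count: Optional[int] = None
    dedupe_policy: str = ""
    validation_policy: str = ""
    known_limitations: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject modes outside the list so releases stay comparable.
        if self.source_mode not in SOURCE_MODES:
            raise ValueError(f"unknown source_mode: {self.source_mode!r}")


def write_sidecar(meta: ReleaseMetadata, out_dir: str) -> Path:
    """Write the record as a standalone JSON file; segments are not touched."""
    path = Path(out_dir) / f"{meta.release_id}.metadata.json"
    text = json.dumps(asdict(meta), indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")
    return path
```

A caller would fill `generated_at` with something like `datetime.now(timezone.utc).isoformat()` and pass the two git revisions in explicitly (for example from `git rev-parse HEAD` in each checkout), which keeps the record reproducible and avoids guessing at revisions.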

Acceptance criteria

  • Research release metadata can be generated for a segment batch.
  • Metadata includes software revisions and generation timestamp.
  • Metadata describes collection/source mode and known limitations.
  • Metadata does not alter canonical segment records.
  • Docs explain how to interpret the metadata.
  • The just precommit recipe passes before merging.
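Two of the criteria above lend themselves to a mechanical check: that the metadata carries the required information, and that generating it leaves canonical segment files byte-for-byte unchanged. The following is a minimal sketch, assuming the metadata is available as a parsed dictionary and the segments live in a single directory. The key names in `REQUIRED_KEYS` and the helper names are hypothetical, chosen for illustration only.

```python
import hashlib
from pathlib import Path

# Hypothetical key names covering software revisions, generation timestamp,
# source mode, and known limitations from the acceptance criteria.
REQUIRED_KEYS = {
    "generated_at",
    "source_mode",
    "pensieve_git_revision",
    "notepack_git_revision",
    "known_limitations",
}


def missing_keys(metadata: dict) -> set[str]:
    """Return the required keys that are absent from the metadata."""
    return REQUIRED_KEYS - metadata.keys()


def digest_segments(segment_dir: str) -> dict[str, str]:
    """Map each file name in the directory to the SHA-256 of its bytes."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(segment_dir).iterdir())
        if p.is_file()
    }


def segments_unchanged(before: dict[str, str], after: dict[str, str]) -> bool:
    """True if every file seen before still exists with the same digest.

    New files (such as a sidecar written into the same directory) are
    ignored, since only pre-existing segment records must stay untouched.
    """
    return all(after.get(name) == digest for name, digest in before.items())
```

One way to use this in a test is to call `digest_segments` before and after metadata generation and assert `segments_unchanged(before, after)`, alongside `assert not missing_keys(metadata)`.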
