Skip to content

Relay updates and security/integrity fixes from full-repo review#22

Merged
erskingardner merged 11 commits into
masterfrom
claude/frosty-volhard-203410
May 21, 2026
Merged

Relay updates and security/integrity fixes from full-repo review#22
erskingardner merged 11 commits into
masterfrom
claude/frosty-volhard-203410

Conversation

@erskingardner
Copy link
Copy Markdown
Contributor

Summary

This branch started with default relay-list maintenance, then a full-repo review (API, ingester, negentropy) turned up several bugs, security holes, and performance problems. It fixes all Critical/High/Medium findings, grouped into reviewable commits (each passing just precommit).

What changed, by area

Relay defaults

  • Add relay.divine.video, relay.momostr.pink, relay.ditto.pub to the default seed list; drop the dead relay.nostr.band. Add relay.divine.video + nos.lol to the negentropy (NIP-77) defaults. (d6cf2d0)

Web / HTTP tier (pensieve-preview, pensieve-serve)

  • Stored XSS fixed two ways: strip raw/inline HTML from kind-30023 article markdown; move all inline JS to a same-origin /app.js (event delegation + data- attrs) and lock CSP to script-src 'self', removing the image-grid onclick injection sink. (6201a14, cdeb097)
  • SSRF guard on the OG-image avatar fetch: scheme check, reject non-global resolved IPs (loopback/private/link-local/metadata/CGNAT/IPv6 ULA), disable redirects. (6201a14)
  • Constant-time API-token comparison; stop leaking ClickHouse error text to clients. (6201a14)
  • Fix N+1 profile lookups (single batched IN query + 50 cap) and add a 10s request TimeoutLayer to both servers. (52e7f58)

Ingest data integrity (pensieve-ingest)

  • Crash-safe archive writes (was: silent permanent event loss on non-graceful crash): in-flight dedupe markers are now in-memory only, segments are fsync'd and mark_archived runs on every seal, and negentropy only advances sync-state for durably-archived events. Legacy on-disk markers still read as "seen" (no migration/re-fetch storm). (f229b5d)
  • Guard seal divide-by-zero, bound segment frame length on read (torn-segment safety), and stop dropping live/negentropy events on transient dedupe errors. (46ed602)

ClickHouse indexing

  • Bounded retry + a persistent re-index queue so the derived index self-heals from the archive; batched inserts (uses batch_size); replaced an unsafe raw-pointer atomic with Arc<AtomicUsize>. (9fa60f9)

Validation ⚠️ real bug found

  • A strict regression test revealed that nostr::Event::from_json only deserializes — it does not verify — so the JSONL/protobuf backfill paths were accepting tampered (forged id/signature) events into the canonical archive. Added explicit verify_event_crypto (id + signature) on those paths plus strict rejection tests. (b78f782)

Misc

  • gitignore local reviews/ artifacts. (32dfb56)

Test plan

  • just precommit (fmt + clippy -D warnings + tests) green before every commit
  • New unit tests: dedupe crash-safety semantics; strict tampered-id/signature rejection for validate_event and validate_proto_event
  • Not yet integration-tested against live relays or a running ClickHouse — durability and re-index behavior are covered by unit tests + types, not an end-to-end crash/outage test
  • Reviewer note: production seed/negentropy lists live in server-side files that shadow the code defaults; the ClickHouse re-index queue writes to <output_dir>/.clickhouse_reindex_queue

🤖 Generated with Claude Code

erskingardner and others added 11 commits May 21, 2026 08:25
Add relay.divine.video, relay.momostr.pink, and relay.ditto.pub to the
default seed relays and drop the now-dead relay.nostr.band. Add
relay.divine.video and nos.lol to the negentropy (NIP-77) sync defaults,
and refresh the associated docs, CLI help, and tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- pensieve-preview: filter raw/inline HTML out of kind-30023 article markdown
  so attacker-controlled content can no longer inject <script> (stored XSS).
- pensieve-preview: add an SSRF guard to the OG-image avatar fetch — require
  http(s), resolve the host and reject non-globally-routable IPs (loopback,
  private, link-local/metadata, CGNAT, IPv6 ULA), and disable redirects.
- pensieve-serve: compare API tokens in constant time instead of a HashSet
  lookup to avoid leaking match progress via response timing.
- pensieve-serve: stop returning raw ClickHouse error text to clients.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move all interactivity (image lightbox, copy-to-clipboard, broken-image
hiding, home search) into a same-origin /app.js using event delegation and
data- attributes, and replace every inline event handler. This removes the
image-grid onclick sink where an event-content URL containing a single quote
could break out of the JS string (stored XSS), and lets the CSP drop
script-src 'unsafe-inline' in favor of 'self'.

Also corrects doc comments that claimed "no JavaScript execution" / "all
content escaped", which no longer matched reality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eout

The JSON event endpoint resolved one ClickHouse profile query per p-tag
serially, so an event with thousands of p tags caused a query amplification
DoS. Resolve mentions with a single batched fetch_profiles() query, dedupe,
and cap at 50. Add a 10s request TimeoutLayer to both the preview and API
servers to bound slow/expensive requests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The live daemon only marked events `Archived` on the final shutdown seal;
mid-stream seals went to ClickHouse but never to the dedupe index, and the
`Pending` marker was written (non-durably) before the event bytes left an
8 MB in-memory buffer. A non-graceful crash therefore lost the buffered
events while their markers survived, so they were treated as "seen" forever
and never re-fetched — defeating the archive-first guarantee.

Changes:
- Dedupe `Pending` is now tracked in memory only; only durably-sealed events
  are persisted (as `Archived`). On a crash the in-memory markers vanish with
  the lost buffer, so those events are simply re-fetched and de-duplicated by
  ClickHouse's ReplacingMergeTree. Legacy on-disk `Pending` values still read
  as "seen", so no migration/re-fetch storm.
- seal() now fsyncs the segment file and marks its events `Archived` on EVERY
  seal (the writer holds a dedupe ref), not just the final one.
- Negentropy only advances sync-state for events confirmed durably `Archived`,
  so a crash can't leave sync-state claiming an unarchived event (H4).

Backfill binaries keep marking archived explicitly (writer dedupe ref = None).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rrors

- Guard the compression-ratio log against divide-by-zero on an empty/forced
  seal (it ran in a detached thread, so a panic there would have silently
  killed compression for that segment).
- Bound the segment frame length on read: a crash mid-write can leave a partial
  length prefix, so cap a single frame at 16 MB and stop reading rather than
  attempting a multi-gigabyte allocation on a torn segment.
- Stop dropping a live or negentropy event when the dedupe check itself errors
  (a transient RocksDB failure). Treat it as novel instead — a possible
  duplicate is reconciled downstream by ClickHouse, but a dropped event is lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously a failed segment insert was logged and dropped (with a TODO),
the configured batch_size was unused (the whole segment was inserted in one
request), and the indexer shared its counters across threads via an unsafe
raw-pointer cast.

- Retry each segment with bounded exponential backoff; on exhaustion increment
  clickhouse_insert_errors_total and append the segment path to a persistent
  reindex queue. The queue is drained on the next start, so ClickHouse
  self-heals from the canonical archive (idempotent via ReplacingMergeTree).
- Insert events in batch_size-sized chunks instead of one giant request.
- Replace the unsafe AtomicUsize pointer cast with Arc<AtomicUsize>.

The live ingester enables the reindex queue (in the segment output dir);
backfill binaries leave it disabled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eries)

Added strict regression tests asserting that tampered-signature and tampered-id
events are rejected on every validation entry point. They revealed a real hole:
`nostr::Event::from_json` only deserializes — it does NOT verify the ID hash or
signature — so validate_event() (JSONL backfill) and validate_proto_event()
(protobuf backfill) were returning Ok for forged events, which would then be
written to the canonical archive.

Fix: add an explicit verify_event_crypto() (verify_id + verify_signature) and
call it from validate_event, validate_notebuf, and validate_proto_event. The
live and negentropy paths were already safe (nostr-relay-pool verifies before
emitting events); these tests also guard that shared invariant against future
dependency changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep local code-review output (e.g. reviews/review.md) out of version control.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror `just precommit` in CI: rustfmt check, clippy (-D warnings), and the
test suite via cargo-nextest sharded across 3 runners, plus a separate doctest
job. Installs libclang for the RocksDB build and caches the cargo build. A
ci-success job aggregates the matrix for a single required status check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pensieve-core's build script uses prost-build, which requires `protoc`. Without
it every workspace build (clippy, tests, doctests) failed at the build script;
add protobuf-compiler to the apt install alongside libclang.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erskingardner erskingardner merged commit a89cec2 into master May 21, 2026
7 checks passed
@erskingardner erskingardner deleted the claude/frosty-volhard-203410 branch May 21, 2026 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant