TODO

Pending

  • Immediate-1 Bootstrap the Cargo workspace with root Cargo.toml, shared lint settings, and stub crates for API, monitor, and domain logic so that future work is cleanly modularized – Tests: add smoke cargo test --all --all-features/cargo clippy --workspace --all-features CI jobs plus unit tests ensuring each crate compiles with required features – Docs: expand README.md with the workspace layout diagram and command matrix, and inline crate-level //! docs describing responsibilities – Risks/Deps: selecting crate boundaries too early could cause churn; capture assumptions in design_docs/ for review.
  • Immediate-2 Establish deterministic environment plumbing (Rust toolchain pin, .env.example, sample config/ secrets flow) and wire up sha3 + dotenvy crates across workspace to keep hashing and configuration consistent – Tests: add config parsing unit tests plus a cargo test guard verifying .env loading from fixtures – Docs: document environment variables in README tables and annotate .env.example – Risks/Deps: ensure secrets are never committed by extending .gitignore and adding a pre-commit note.
  • ShortTerm-3 Define the storage abstraction layer (PaymentStore, TokenStore, MonitorStateStore) and deliver the initial SQLite-backed adapter covering migrations and connection pooling – Tests: use sqlx::test or TempDir-backed integration tests to exercise CRUD flows and transactional guarantees – Docs: generate module-level docs explaining trait contracts and add migration walkthroughs to README – Risks/Deps: confirm sqlx offline data is checked in to keep CI deterministic.
  • ShortTerm-4 Implement the PID validation and token derivation helpers (enforcing 32-char hex + SHA3-256) as a shared domain module to guarantee all components apply identical security checks – Tests: property tests ensuring invalid PIDs are rejected plus unit tests comparing hash outputs to vectors – Docs: add /// docs with examples and update README security model section – Risks/Deps: highlight entropy requirements to avoid weak client implementations.
  • ShortTerm-5 Build the Actix POST /api/v1/redeem endpoint that uses the storage trait for atomic claim updates and emits deterministic service tokens – Tests: request-level integration tests (Actix test server) that cover success, duplicate, and invalid PID paths – Docs: extend API reference in README (request/response schemas) and include tracing annotations in code comments – Risks/Deps: ensure DB transactions roll back on network failures; add error taxonomy upfront.
  • MidTerm-6 Ship the monitor service crate that tails monero-wallet-rpc, persists qualifying transfers through the storage trait, and tracks height via MonitorStateStore without hardcoded defaults – Tests: async integration tests with mocked RPC responses plus regression tests for resume-from-height – Docs: architecture note in docs/ describing polling cadence and failure recovery – Risks/Deps: requires RPC endpoint availability; design retry/backoff strategy.
  • MidTerm-7 Introduce the caching/bloom-filter abstraction (trait-based) with an in-memory MVP to screen obvious invalid PIDs before hitting storage, paving the way for moka/Redis implementations – Tests: unit tests covering false-positive bounds and eviction plus integration tests verifying cache bypass falls back to DB – Docs: README section comparing cache strategies and inline notes on tuning false-positive rates – Risks/Deps: guard against cache becoming DoS vector by specifying quotas.
  • MidTerm-8 Provide introspection and revocation APIs (GET /api/v1/token/{token}, POST /api/v1/token/{token}/revoke) backed by the token store with auditing fields populated – Tests: HTTP integration tests covering active, revoked, and missing tokens along with auth failure permutations – Docs: document operational playbooks for revocation and add OpenAPI snippets – Risks/Deps: define auth story (mTLS or static keys) before exposing endpoints.
  • LongTerm-9 Layer in observability and abuse detection (structured tracing, metrics, abuse score escalation, dashboarding hooks) so production deployments can detect DoS or misuse early – Tests: telemetry snapshot tests plus load-test scripts to validate metrics fidelity – Docs: operations guide covering alert thresholds and tracing conventions – Risks/Deps: depends on earlier API + monitor milestones; coordinate with infra for metrics backend.
  • ShortTerm-10 Add Unix socket support for the public API listener (env-driven API_UNIX_SOCKET, auto-clean stale sockets, fallback to TCP bind) – Tests: Actix integration tests exercising socket + TCP modes – Docs: README/.env documenting the behavior and deployment tips – Risks/Deps: file permissions/cleanup errors or SELinux/AppArmor policies blocking socket creation.
  • ShortTerm-11 Introduce an internal-only API listener (dedicated port or Unix socket) for admin/monitoring routes so Tor-exposed endpoints stay minimal – Tests: ensure internal routes reject external traffic and cover dual-listener wiring – Docs: configuration guidance describing how to bind internal interfaces and enforce permissions – Risks/Deps: added config complexity; need to clearly separate auth for internal vs public endpoints.
  • ShortTerm-12 Refactor anon_ticket_domain into cohesive modules (config.rs, model/, services/, storage/traits.rs) so binaries import narrowly-scoped APIs and doc comments reflect the new boundaries – Tests: rerun existing unit/property suites plus add module-specific doctests documenting the new paths – Docs: update crate-level //! docs and README “Workspace Layout” to show internal submodules – Risks/Deps: requires careful move semantics to avoid breaking downstream imports; coordinate with open PRs touching domain helpers.
  • ShortTerm-13 Split the API crate into handlers/, state.rs, and application.rs so HTTP wiring, request handling, and bootstrap logic can evolve independently – Tests: keep current Actix integration tests passing and add handler-level unit tests using App::new() scaffolds – Docs: extend README “Redemption API” to mention new module boundaries and note how to embed the server in other binaries – Risks/Deps: heavy file moves may invalidate pending branches; schedule work during low churn windows.
  • MidTerm-14 Decompose the monitor crate into RPC client, ingestion pipeline, and worker loop modules while introducing a trait-based transfer source for easier simulation – Tests: add mocked transfer-source tests plus regression coverage for height advancement/backoff; keep long-running tokio::test gated – Docs: produce a short architecture note (and README summary) describing the data flow – Risks/Deps: more traits mean stricter lifetime/Send bounds; ensure reqwest client reuse stays efficient.
  • MidTerm-15 Break anon_ticket_storage into per-trait impl files (payment_store.rs, token_store.rs, monitor_state_store.rs, migration.rs) and add a thin builder for injecting caching/sharding later – Tests: rerun sqlite/postgres integration tests and add targeted unit tests for the builder defaults – Docs: update README “Storage Layer” to explain the builder plus migration split – Risks/Deps: SeaORM entity paths change, so regenerate docs/tests referencing old modules.
  • ShortTerm-16 Harden domain primitives: remove AbuseTracker, adopt moka for caching, and enforce 32-byte PIDs – Context: Audit found memory leaks in abuse tracking (redundant with negative cache) and lock contention in the handwritten PID cache. Actions: 1) Delete AbuseTracker to simplify logic and rely on negative caching. 2) Replace InMemoryPidCache with moka for automatic TTL and lock-free concurrency. 3) Bump PID_LENGTH to 64 hex chars (32 bytes) for maximum entropy. 4) Add .trim() to required env var parsing. – Tests: Property tests for 64-char PIDs, concurrency tests for cache under load. – Risks/Deps: Breaking change for clients expecting 32-char PIDs; requires moka dependency.
  • ShortTerm-17 Secure PaymentId construction and introduce random generation – Context: PaymentId::new and From<&str> allow bypassing validation, violating the type-driven security contract. Actions: 1) Make PaymentId::new private or restricted (pub(crate)). 2) Remove From<&str> to prevent infallible conversion from untrusted strings. 3) Implement TryFrom<String> for validated parsing. 4) Add PaymentId::generate() using getrandom (gated for Wasm support) to support client-side creation of high-entropy IDs. – Tests: Unit tests confirming new is inaccessible publicly (compile-fail) and generate produces valid 64-char hex strings. – Risks/Deps: Breaking change for all downstream crates instantiating PIDs; requires updating all tests to use parse or generate.
  • ShortTerm-18 Polish domain internals: add hash separators and verify Wasm compat – Context: derive_service_token concatenates inputs without separators (theoretical canonicalization risk if lengths vary in future), and getrandom needs explicit feature gating for Wasm targets. Actions: 1) Insert a separator byte (e.g., |) between PID and TXID in derive_service_token. 2) Ensure getrandom dependency in Cargo.toml (or workspace) enables the js feature for wasm32 targets to prevent build failures. – Tests: Update derive_service_token unit tests to reflect new hash values; verify cargo build --target wasm32-unknown-unknown passes (if environment permits) or check feature tree. – Risks/Deps: Changes derived token values (breaking for existing DB records if any); requires Wasm toolchain for verification.
  • ShortTerm-19 Purge dotenvy dependency in favor of shell-native config – Context: Hardcoding .env file loading inside the binary is an anti-pattern for production "monolithic fortresses" where env vars are injected by systemd/docker. It adds unnecessary file I/O logic. Actions: 1) Remove dotenvy from workspace dependencies. 2) Delete hydrate_env_file from domain::config. 3) Update load_from_env methods to rely strictly on std::env::var. 4) Document direnv or source .env workflows for local dev in README. – Tests: Verify binaries still boot when env vars are set externally; verify build size reduction (minor). – Risks/Deps: Breaks cargo run for devs who rely solely on implicit .env loading; requires doc update.
  • ShortTerm-20 Harden storage configuration and implementation – Context: Audit revealed SQLite is running in default mode (poor concurrency) and claim_payment performs redundant lookups. Actions: 1) In SeaOrmStorage::connect, detect SQLite backend and force PRAGMA journal_mode=WAL; + PRAGMA synchronous=NORMAL;. 2) Update migration.rs to explicitly set string_len(64) for PID and Token columns. 3) Optimize claim_payment by replacing update_many + find with raw SQL UPDATE ... RETURNING * via SeaOrm::execute/query_one to eliminate the second round-trip and lock contention. – Tests: Integration tests verifying WAL mode active and claim_payment correctness/atomicity. – Risks/Deps: Raw SQL bypasses some SeaORM safeguards; relies on SQLite >= 3.35.0 (standard in modern environments).
  • ShortTerm-21 Refactor internal types to binary ([u8; 32]) and storage to BLOBs – Context: Using String (Hex) to represent PIDs and Tokens wastes 2x memory/storage and CPU cycles. Switching to raw bytes aligns with the "Single-Node Fortress" strategy for maximum density and speed. Actions: 1) Change PaymentId and ServiceToken internals from String to [u8; 32]. 2) Update domain serialization to handle Hex encoding/decoding at the API boundary (Serde). 3) Update storage migrations to use BLOB/BYTEA instead of VARCHAR. 4) Update storage mapping logic to read/write bytes directly. – Tests: Verify JSON API still accepts/returns Hex strings; verify DB stores raw bytes (inspect sqlite file size); verify hash derivation remains consistent. – Risks/Deps: Breaking schema change (incompatible with existing String-based DBs); pervasive refactor across all crates.
  • ShortTerm-22 Optimize PaymentStatus column to TINYINT – Context: Storing "claimed"/"unclaimed" as VARCHAR(16) wastes space (~7-9 bytes vs 1 byte) and IO bandwidth. "Single-Node Fortress" philosophy prioritizes efficiency over raw DB readability. Actions: 1) Update migration.rs to define status as tiny_integer. 2) Update entity.rs to map PaymentStatusDb enum to integers (0=Unclaimed, 1=Claimed). 3) Verify claim_payment raw SQL uses integer literals. – Tests: Verify schema change via migration tests; verify status transitions persist correctly. – Risks/Deps: Breaking schema change; debugging raw DB requires knowing the enum mapping (0/1).
  • ShortTerm-23 Harden monitor worker against transient failures – Context: run_monitor crashes the process if handle_batch returns a storage error. A fortress service should retry on IO failures. Actions: 1) In worker.rs, catch errors from handle_batch inside the loop. 2) Log them as warnings and trigger the sleep/backoff. 3) Only exit on fatal configuration errors. 4) Refactor process_entry to remove redundant string validation now that PaymentId::parse handles it. – Tests: Add a test case where the storage mock fails once then succeeds; ensure loop continues. – Risks/Deps: None; pure reliability fix.
  • ShortTerm-24 Filter dust transactions to prevent DB exhaustion – Context: Currently, the monitor persists any incoming transaction with a valid PID, regardless of amount. An attacker could flood the blockchain with "dust" (1 piconero) transactions, filling the SQLite database with garbage records at negligible cost (DoS via resource exhaustion). Actions: 1) Add MONITOR_MIN_PAYMENT_AMOUNT to BootstrapConfig (default e.g., 1_000_000 atomic units). 2) Update process_entry in monitor/pipeline.rs to check entry.amount < min_amount. 3) If below threshold, log a warning and skip persistence (return Ok(false)). – Tests: Unit test process_entry with amounts below and above the threshold. – Risks/Deps: Legitimate underpayments are discarded (acceptable trade-off for security).
  • ShortTerm-25 Migrate to encrypted 64-bit Payment IDs to prevent front-running – Context: A critical security review revealed that legacy 32-byte Payment IDs are visible in cleartext on the blockchain. Attackers can scan the mempool, extract these IDs, and race to redeem them before legitimate users (front-running). Actions: 1) Refactor PaymentId to wrap [u8; 8] (encrypted compact ID) instead of [u8; 32]. 2) Update storage schema to use BLOB(8). 3) Ensure monitor decodes Integrated Addresses via RPC/monero-rs. 4) Publish a security analysis proving that 64-bit entropy is sufficient against brute-force attacks even without IP rate limiting. 5) Maintain the "client-generates-ID" workflow but require clients to construct Integrated Addresses locally. – Tests: Verify collision resistance logic and correct integrated address decoding. – Risks/Deps: Breaking change; requires clients to support Monero address encoding.
  • ShortTerm-26 Adopt the monero crate for canonical address/PID handling and integrated address assembly with high-entropy Payment IDs – Context: The monitor currently hand-rolls Monero RPC types; we need battle-tested primitives and centralized PID generation to avoid collisions. Actions: 1) Introduce monero (monero-rs) across domain/monitor for parsing/validating primary + integrated addresses and transaction IDs; delete bespoke RPC structs. 2) Add a domain-level IntegratedAddressBuilder that accepts a validated primary address and a high-entropy PaymentId, returning the integrated address string for both client and monitor use. 3) Ensure PaymentId::generate uses rand_core::OsRng/getrandom with at least 64 bits of entropy and document the collision budget. – Tests: Round-trip encode/decode integrated addresses; property tests showing sampled PIDs yield unique outputs; regression tests for RPC decoding paths using monero types. – Risks/Deps: Pulls in monero/curve25519-dalek (larger binaries); must confirm license compatibility and disable any std-only features that would block WASM.
  • ShortTerm-27 Make integrated-address generation WASM-safe with explicit getrandom gating – Context: Clients will compile the builder to wasm32-unknown-unknown; missing getrandom features or std-bound code will break builds. Actions: 1) Added a domain-only wasm feature enabling getrandom/js for wasm32. 2) Documented the build check cargo build -p anon_ticket_domain --target wasm32-unknown-unknown --features wasm. 3) Documented consumer guidance for wasm-bindgen/wasm-pack via string-based integrated-address helpers. – Tests: Build-only guidance recorded (no runtime changes). – Risks/Deps: Browser randomness still depends on crypto.getRandomValues; consumers must pass --features wasm when targeting wasm32.
  • ShortTerm-28 Enhance monitor configurability and observability – Context: The monitor's poll interval is hardcoded (5s), and its metrics are undocumented. We need to let operators tune the latency/load trade-off and provide clear guidance on tracking sync progress via Prometheus (rejecting ad-hoc status APIs to maintain "fortress" simplicity). Actions: 1) Add MONITOR_POLL_INTERVAL_SECS to BootstrapConfig (default 5). 2) Update run_monitor in worker.rs to use this dynamic interval. 3) Update crates/monitor/README.md with a "Metrics & Observability" section detailing monitor_last_height, monitor_rpc_calls_total, etc. – Tests: Unit test config loading; verify loop respects interval (mock clock). – Risks/Deps: None.
  • ShortTerm-29 Secure API revocation endpoint – Context: The POST /api/v1/token/{token}/revoke endpoint was exposed on the public listener, letting any caller revoke tokens or bump abuse scores. Actions: 1) Move the revoke route to the internal listener in application.rs. 2) Add tests proving public listeners return 404 while internal succeeds. 3) Document the internal-only route and mark TODO done. – Tests: Integration test for public 404/internal 200. – Risks/Deps: Admins must configure the internal listener to perform revocations.
  • ShortTerm-29 Secure API revocation endpoint – Context: The POST /api/v1/token/{token}/revoke endpoint is currently exposed on the public listener without authentication, allowing any token holder (or brute-forcer) to revoke tokens or manipulate abuse scores. This is an administrative action and must be restricted. Actions: 1) In application.rs, move the revoke route from public_server to internal_server. 2) Ensure the internal server logic correctly handles this new route. – Tests: Integration test verifying 404 on public port and 200 on internal port for revocation. – Risks/Deps: Admins must configure the internal listener to perform revocations.
  • ShortTerm-30 Monitor Confirmation Safety – Context: Currently, the monitor processes transactions as soon as they appear (even with 0 confirmations). If the blockchain forks (reorg), these transactions might become invalid, but we might have already issued tokens. Actions: 1) Add MONITOR_MIN_CONFIRMATIONS to BootstrapConfig (default 10). 2) Refactor run_monitor loop to calculate safe_height = wallet_height - min_confirmations. 3) Only fetch/process transfers where height <= safe_height. 4) Only advance last_processed_height up to safe_height. – Tests: Simulate immature transactions being ignored until chain height advances. – Risks/Deps: Increases user wait time (latency).
  • ShortTerm-31 Make PID negative-cache grace configurable – Context: PID_CACHE_NEGATIVE_GRACE is hardcoded to 500ms in the API handler. Actions: add API_PID_CACHE_NEGATIVE_GRACE_MS (bounded >0) and surface it in ApiConfig; ensure handlers read the configured value and validate grace <= ttl. Tests: Actix handler tests covering short-circuit window; config parsing happy/edge paths. Risks/Deps: too-small grace may raise DB load; too-large grace can block fresh payments briefly.
  • ShortTerm-32 Expose PID cache TTL/capacity knobs – Context: InMemoryPidCache currently uses 60s TTL and capacity 100k. Actions: add API_PID_CACHE_TTL_SECS and API_PID_CACHE_CAPACITY env vars; apply to both positive/negative caches; enforce ttl >= grace; document sizing guidance. Tests: config parsing + cache behavior with custom TTL/capacity. Risks/Deps: low TTL collapses hit rate; excessive capacity increases memory footprint.
  • MidTerm-33 Introduce Bloom filter layer for PID screening – Context: need scalable negative/positive hints with no false negatives (false positives acceptable). Actions: add Bloom filter implementation or crate-backed adapter, fed from cache/storage; wire into redeem path before DB lookups; expose tuning knobs (e.g., false-positive rate, refresh cadence). Tests: property tests proving zero false negatives and bounded FP rate; integration tests showing DB load reduction under spray traffic. Risks/Deps: requires periodic rebuild to bound FP; configuration must prevent mis-sizing on small deployments.
  • ShortTerm-34 Co-locate monitor ingestion to seed Bloom/cache in-process – Embedded the monitor inside the API bootstrap (with a dev escape API_ALLOW_NO_MONITOR), prewarming Bloom/cache from storage and sharing hooks so new payments immediately update the filter; standalone monitor binaries now require ALLOW_STANDALONE_MONITOR=1 to avoid production split deployments.
  • ShortTerm-35 Enforce mandatory Bloom guard at startup – Bloom is now required unless API_ALLOW_NO_BLOOM=1; startup logs chosen entries/FPR and estimated bitset bytes, and Bloom config errors fail fast to keep the DoS shield intact.
  • ShortTerm-36 Redeem path Bloom-only screening (no negative cache writes) – Negative cache removed; Bloom negatives 404 immediately, and only confirmed storage hits mark Bloom/cache. Absent lookups never write Bloom, and Bloom-positive/DB-miss cases are counted for FP monitoring.
  • ShortTerm-37 Surface Bloom+moka sizing controls and telemetry – Added Bloom sizing log (estimated bytes), cache remains positive-only with existing TTL/capacity knobs, and new metric api_redeem_bloom_db_miss_total tracks Bloom FP drift alongside hint counters.
  • ShortTerm-38 Document Bloom-only DoS posture and operational playbook – README/API/Design docs now describe the Bloom-only defense, removal of negative cache, dev escape hatches, and the expected behavior under Tor/no-IP-limit deployments.
  • ShortTerm-39 Enforce mandatory internal listener and drop public metrics fallback – Context: simplify startup paths and guarantee admin/metrics exposure only via an internal endpoint. Actions: 1) Make API_INTERNAL_BIND_ADDRESS or API_INTERNAL_UNIX_SOCKET mandatory (fail fast if both missing). 2) Remove include_metrics_on_public and related conditional wiring; public listener serves only user-facing routes. 3) Update bootstrap errors/docs to reflect the requirement. Tests: config parsing (missing/one/both) and integration ensuring boot fails without an internal listener and metrics/internal routes are unreachable on the public port. Risks/Deps: deployments must provision loopback or Unix socket; Tor-only setups need explicit internal binding.
  • ShortTerm-40 Simplify platform-specific listener wiring with cfg_if – Context: scattered #[cfg(unix)]/#[cfg(not(unix))] blocks reduce readability. Actions: consolidate listener/socket setup behind cfg_if!, pruning dead branches exposed by the mandatory-internal change. Tests: build on Unix targets plus a non-Unix compile check; reuse existing API integration tests. Risks/Deps: minimal (small macro dependency); ensure behavior parity on non-Unix platforms.
  • ShortTerm-41 Rebaseline Bloom sizing and monitoring for 64-bit PIDs – Context: FPR depends on expected unique PID count, not entropy. Actions: 1) Add sizing guidance (entries, FPR, k) for typical volumes (1e6–1e8) and memory footprints; 2) Define alerting guidance on api_redeem_bloom_db_miss_total without relying on Bloom rebuilds; 3) Provide default preset suggestions for small vs. large deployments. Tests/Docs: update API README/DESIGN and ops notes; code changes only if defaults are adjusted. Risks: mis-sizing could degrade to DB-only path.
  • ShortTerm-42 Raise dust floor default in samples to bound DoS cost – Context: attackers could pay minimal amounts to bloat Bloom/cache/storage. Actions: pick a higher MONITOR_MIN_PAYMENT_AMOUNT default in .env.example (document rationale vs. memory/bandwidth), and add guidance on tuning it to match Bloom capacity planning. Tests/Docs: update README/monitor docs and config tables; ensure examples stay consistent. Risks: legitimate micro-payments might be rejected; needs clear operator guidance.
  • ShortTerm-43 Make TokenStatusResponse.status an enum and bump API contract – Context: developer-facing introspection currently returns strings ("active"/"revoked"). Actions: switch to an enum in the response schema, update API docs/tests/clients, and record the breaking change in CHANGELOG. Risks: incompatible with existing consumers; may require version negotiation if any external clients exist.
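Sizing sketch for ShortTerm-41: the entries/FPR/k guidance can be derived from the standard Bloom filter formulas, m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. The snippet below is a minimal, std-only illustration of that arithmetic. The names `BloomSizing` and `bloom_sizing` are hypothetical and not taken from the codebase, and the project's actual Bloom crate may round differently.

```rust
/// Hypothetical helper type (not from the codebase): sizing result for a
/// classic Bloom filter.
#[derive(Debug, Clone, PartialEq)]
pub struct BloomSizing {
    /// Number of bits in the filter (m).
    pub bits: u64,
    /// Number of hash functions (k).
    pub hashes: u32,
    /// Bitset size in bytes, rounded up.
    pub bytes: u64,
}

/// Computes m and k for `expected_entries` (n) unique PIDs at a target
/// false-positive rate `target_fpr` (p), using the textbook formulas:
///   m = -n * ln(p) / (ln 2)^2
///   k = (m / n) * ln 2
/// Returns `None` when the inputs cannot describe a valid filter.
pub fn bloom_sizing(expected_entries: u64, target_fpr: f64) -> Option<BloomSizing> {
    if expected_entries == 0 || !(target_fpr > 0.0 && target_fpr < 1.0) {
        return None;
    }
    let n = expected_entries as f64;
    let ln2 = std::f64::consts::LN_2;
    // Round the bit count up so the realized FPR does not exceed the target.
    let bits = (-n * target_fpr.ln() / (ln2 * ln2)).ceil();
    // Round k to the nearest integer, but never below one hash function.
    let hashes = ((bits / n) * ln2).round().max(1.0);
    let bits = bits as u64;
    Some(BloomSizing {
        bits,
        hashes: hashes as u32,
        bytes: (bits + 7) / 8,
    })
}
```

For example, `bloom_sizing(1_000_000, 0.01)` yields roughly 9.6 million bits (about 1.2 MB) with 7 hash functions. Because the result depends only on n and p, moving to 64-bit PIDs does not by itself change the footprint.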

Plan Summary

Foundational scaffolding, storage correctness, and API/monitor surfaces are in place; recent work hardened the domain layer. Next, we’ll keep pushing high-concurrency SQLite optimizations (WAL mode, atomic RETURNING queries, binary core types).
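As an illustration of the "binary core types" direction, combined with the validated parsing described in ShortTerm-17 and the 8-byte layout from ShortTerm-25, the following is a minimal std-only sketch. The `PidError` enum, the `hex_val` helper, and the exact trait set are assumptions for illustration; the real `PaymentId` in `anon_ticket_domain` may differ.

```rust
use std::convert::TryFrom;
use std::fmt;
use std::str::FromStr;

/// Sketch of a compact payment ID: 8 raw bytes internally, hex only at the
/// API boundary. The inner field is private, so values can only be built
/// through validated parsing.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PaymentId([u8; 8]);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PidError {
    /// Input was not exactly 16 hex characters; carries the byte length seen.
    WrongLength(usize),
    /// Input contained a non-hexadecimal character.
    NonHex,
}

/// Maps one ASCII hex digit to its 4-bit value.
fn hex_val(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

impl FromStr for PaymentId {
    type Err = PidError;

    /// Parses exactly 16 hex characters (either case) into 8 bytes.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let raw = s.as_bytes();
        if raw.len() != 16 {
            return Err(PidError::WrongLength(raw.len()));
        }
        let mut out = [0u8; 8];
        for (i, pair) in raw.chunks_exact(2).enumerate() {
            let hi = hex_val(pair[0]).ok_or(PidError::NonHex)?;
            let lo = hex_val(pair[1]).ok_or(PidError::NonHex)?;
            out[i] = (hi << 4) | lo;
        }
        Ok(PaymentId(out))
    }
}

impl TryFrom<String> for PaymentId {
    type Error = PidError;

    /// Fallible conversion from untrusted strings; there is deliberately no
    /// infallible `From<&str>`.
    fn try_from(s: String) -> Result<Self, Self::Error> {
        s.parse()
    }
}

impl fmt::Display for PaymentId {
    /// Renders the ID as lowercase hex for API responses and logs.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```

With this shape, handlers would call `"00ff10a0deadbeef".parse::<PaymentId>()` at the HTTP boundary and pass the 8-byte value to storage unchanged, so hex encoding never reaches the database layer.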