- Immediate-1 Bootstrap the Cargo workspace with root `Cargo.toml`, shared lint settings, and stub crates for API, monitor, and domain logic so that future work is cleanly modularized – Tests: add smoke `cargo test --all --all-features`/`cargo clippy --workspace --all-features` CI jobs plus unit tests ensuring each crate compiles with required features – Docs: expand `README.md` with the workspace layout diagram and command matrix, and inline crate-level `//!` docs describing responsibilities – Risks/Deps: selecting crate boundaries too early could cause churn; capture assumptions in `design_docs/` for review.
- Immediate-2 Establish deterministic environment plumbing (Rust toolchain pin, `.env.example`, sample `config/` secrets flow) and wire up `sha3` + `dotenvy` crates across workspace to keep hashing and configuration consistent – Tests: add config parsing unit tests plus a `cargo test` guard verifying `.env` loading from fixtures – Docs: document environment variables in README tables and annotate `.env.example` – Risks/Deps: ensure secrets are never committed by extending `.gitignore` and adding a pre-commit note.
- ShortTerm-3 Define the storage abstraction layer (`PaymentStore`, `TokenStore`, `MonitorStateStore`) and deliver the initial SQLite-backed adapter covering migrations and connection pooling – Tests: use `sqlx::test` or TempDir-backed integration tests to exercise CRUD flows and transactional guarantees – Docs: generate module-level docs explaining trait contracts and add migration walkthroughs to README – Risks/Deps: confirm `sqlx` offline data is checked in to keep CI deterministic.
- ShortTerm-4 Implement the PID validation and token derivation helpers (enforcing 32-char hex + SHA3-256) as a shared domain module to guarantee all components apply identical security checks – Tests: property tests ensuring invalid PIDs are rejected plus unit tests comparing hash outputs to vectors – Docs: add `///` docs with examples and update README security model section – Risks/Deps: highlight entropy requirements to avoid weak client implementations.
- ShortTerm-5 Build the Actix `POST /api/v1/redeem` endpoint that uses the storage trait for atomic claim updates and emits deterministic service tokens – Tests: request-level integration tests (Actix test server) that cover success, duplicate, and invalid PID paths – Docs: extend API reference in README (request/response schemas) and include tracing annotations in code comments – Risks/Deps: ensure DB transactions roll back on network failures; add error taxonomy upfront.
- MidTerm-6 Ship the monitor service crate that tails `monero-wallet-rpc`, persists qualifying transfers through the storage trait, and tracks height via `MonitorStateStore` without hardcoded defaults – Tests: async integration tests with mocked RPC responses plus regression tests for resume-from-height – Docs: architecture note in `docs/` describing polling cadence and failure recovery – Risks/Deps: requires RPC endpoint availability; design retry/backoff strategy.
- MidTerm-7 Introduce the caching/bloom-filter abstraction (trait-based) with an in-memory MVP to screen obvious invalid PIDs before hitting storage, paving the way for moka/Redis implementations – Tests: unit tests covering false-positive bounds and eviction plus integration tests verifying cache bypass falls back to DB – Docs: README section comparing cache strategies and inline notes on tuning false-positive rates – Risks/Deps: guard against cache becoming DoS vector by specifying quotas.
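
As an illustration of the ShortTerm-4 validation rule above (32-character hex), here is a minimal std-only sketch. The names `ValidatedPid` and `PidError` are hypothetical, lowercasing the input is an assumption made so that equal PIDs compare equal, and the SHA3-256 derivation step is left out because it needs the `sha3` crate.

```rust
/// Length required by the ShortTerm-4 item: 32 hex characters.
const PID_HEX_LEN: usize = 32;

/// Hypothetical error type for rejected PIDs.
#[derive(Debug, PartialEq, Eq)]
pub enum PidError {
    WrongLength(usize),
    NonHex(char),
}

/// Hypothetical wrapper that can only be built through `parse`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ValidatedPid(String);

impl ValidatedPid {
    pub fn parse(input: &str) -> Result<Self, PidError> {
        let len = input.chars().count();
        if len != PID_HEX_LEN {
            return Err(PidError::WrongLength(len));
        }
        if let Some(bad) = input.chars().find(|c| !c.is_ascii_hexdigit()) {
            return Err(PidError::NonHex(bad));
        }
        // Assumption: normalise case so the same PID always hashes identically.
        Ok(Self(input.to_ascii_lowercase()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() {
    let pid = ValidatedPid::parse("0123456789ABCDEF0123456789abcdef").unwrap();
    println!("accepted: {}", pid.as_str());
    println!("rejected: {:?}", ValidatedPid::parse("too-short"));
}
```

Keeping the inner field private means every component has to go through `parse`, which is one way to get the "identical security checks" the item asks for.
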
- MidTerm-8 Provide introspection and revocation APIs (`GET /api/v1/token/{token}`, `POST /api/v1/token/{token}/revoke`) backed by the token store with auditing fields populated – Tests: HTTP integration tests covering active, revoked, and missing tokens along with auth failure permutations – Docs: document operational playbooks for revocation and add OpenAPI snippets – Risks/Deps: define auth story (mTLS or static keys) before exposing endpoints.
- LongTerm-9 Layer in observability and abuse detection (structured tracing, metrics, abuse score escalation, dashboarding hooks) so production deployments can detect DoS or misuse early – Tests: telemetry snapshot tests plus load-test scripts to validate metrics fidelity – Docs: operations guide covering alert thresholds and tracing conventions – Risks/Deps: depends on earlier API + monitor milestones; coordinate with infra for metrics backend.
- ShortTerm-10 Add Unix socket support for the public API listener (env-driven `API_UNIX_SOCKET`, auto-clean stale sockets, fallback to TCP bind) – Tests: Actix integration tests exercising socket + TCP modes – Docs: README/.env documenting the behavior and deployment tips – Risks/Deps: file permissions/cleanup errors or SELinux/AppArmor policies blocking socket creation.
- ShortTerm-11 Introduce an internal-only API listener (dedicated port or Unix socket) for admin/monitoring routes so Tor-exposed endpoints stay minimal – Tests: ensure internal routes reject external traffic and cover dual-listener wiring – Docs: configuration guidance describing how to bind internal interfaces and enforce permissions – Risks/Deps: added config complexity; need to clearly separate auth for internal vs public endpoints.
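
The ShortTerm-10 behavior above (env-driven socket path, stale-socket cleanup, TCP fallback) could look roughly like the following std-only, Unix-only sketch. `ApiListener` and `bind_api_listener` are hypothetical names, and two policies are assumptions rather than anything the item specifies: "stale" means "a socket file nobody accepts connections on", and the TCP fallback applies only when no socket path is configured. The real service would hand the resulting listener to Actix.

```rust
use std::fs;
use std::io;
use std::net::TcpListener;
use std::os::unix::fs::FileTypeExt;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;

/// Which transport the listener ended up on (hypothetical type).
#[derive(Debug)]
pub enum ApiListener {
    Unix(UnixListener),
    Tcp(TcpListener),
}

/// Binds to `socket_path` when configured, otherwise falls back to `tcp_addr`.
pub fn bind_api_listener(socket_path: Option<&Path>, tcp_addr: &str) -> io::Result<ApiListener> {
    let Some(path) = socket_path else {
        return TcpListener::bind(tcp_addr).map(ApiListener::Tcp);
    };
    if let Ok(meta) = fs::symlink_metadata(path) {
        // Never delete something that is not a socket.
        if !meta.file_type().is_socket() {
            return Err(io::Error::new(io::ErrorKind::AlreadyExists, "path is not a socket"));
        }
        // A successful connect means another process still owns the socket.
        if UnixStream::connect(path).is_ok() {
            return Err(io::Error::new(io::ErrorKind::AddrInUse, "socket is in use"));
        }
        // Nobody is listening: treat the file as stale and remove it.
        fs::remove_file(path)?;
    }
    UnixListener::bind(path).map(ApiListener::Unix)
}

fn main() -> io::Result<()> {
    // `API_UNIX_SOCKET` is the variable named in the roadmap item.
    let socket = std::env::var("API_UNIX_SOCKET").ok();
    let listener = bind_api_listener(socket.as_deref().map(Path::new), "127.0.0.1:0")?;
    println!("listening on {listener:?}");
    Ok(())
}
```

Probing with `connect` before deleting avoids unlinking a socket that a running instance still serves, which is one of the cleanup errors the item lists under risks.
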
- ShortTerm-12 Refactor `anon_ticket_domain` into cohesive modules (`config.rs`, `model/`, `services/`, `storage/traits.rs`) so binaries import narrowly-scoped APIs and doc comments reflect the new boundaries – Tests: rerun existing unit/property suites plus add module-specific doctests documenting the new paths – Docs: update crate-level `//!` docs and README “Workspace Layout” to show internal submodules – Risks/Deps: requires careful move semantics to avoid breaking downstream imports; coordinate with open PRs touching domain helpers.
- ShortTerm-13 Split the API crate into `handlers/`, `state.rs`, and `application.rs` so HTTP wiring, request handling, and bootstrap logic can evolve independently – Tests: keep current Actix integration tests passing and add handler-level unit tests using `App::new()` scaffolds – Docs: extend README “Redemption API” to mention new module boundaries and note how to embed the server in other binaries – Risks/Deps: heavy file moves may invalidate pending branches; schedule work during low churn windows.
- MidTerm-14 Decompose the monitor crate into RPC client, ingestion pipeline, and worker loop modules while introducing a trait-based transfer source for easier simulation – Tests: add mocked transfer-source tests plus regression coverage for height advancement/backoff; keep long-running `tokio::test` gated – Docs: produce a short architecture note (and README summary) describing the data flow – Risks/Deps: more traits mean stricter lifetime/Send bounds; ensure reqwest client reuse stays efficient.
- MidTerm-15 Break `anon_ticket_storage` into per-trait impl files (`payment_store.rs`, `token_store.rs`, `monitor_state_store.rs`, `migration.rs`) and add a thin builder for injecting caching/sharding later – Tests: rerun sqlite/postgres integration tests and add targeted unit tests for the builder defaults – Docs: update README “Storage Layer” to explain the builder plus migration split – Risks/Deps: SeaORM entity paths change, so regenerate docs/tests referencing old modules.
- ShortTerm-16 Harden domain primitives: remove
`AbuseTracker`, adopt `moka` for caching, and enforce 32-byte PIDs – Context: Audit found memory leaks in abuse tracking (redundant with negative cache) and lock contention in the handwritten PID cache. Actions: 1) Delete `AbuseTracker` to simplify logic and rely on negative caching. 2) Replace `InMemoryPidCache` with `moka` for automatic TTL and lock-free concurrency. 3) Bump `PID_LENGTH` to 64 hex chars (32 bytes) for maximum entropy. 4) Add `.trim()` to required env var parsing. – Tests: Property tests for 64-char PIDs, concurrency tests for cache under load. – Risks/Deps: Breaking change for clients expecting 32-char PIDs; requires `moka` dependency.
- ShortTerm-17 Secure `PaymentId` construction and introduce random generation – Context: `PaymentId::new` and `From<&str>` allow bypassing validation, violating the type-driven security contract. Actions: 1) Make `PaymentId::new` private or restricted (`pub(crate)`). 2) Remove `From<&str>` to prevent infallible conversion from untrusted strings. 3) Implement `TryFrom<String>` for validated parsing. 4) Add `PaymentId::generate()` using `getrandom` (gated for Wasm support) to support client-side creation of high-entropy IDs. – Tests: Unit tests confirming `new` is inaccessible publicly (compile-fail) and `generate` produces valid 64-char hex strings. – Risks/Deps: Breaking change for all downstream crates instantiating PIDs; requires updating all tests to use `parse` or `generate`.
- ShortTerm-18 Polish domain internals: add hash separators and verify Wasm compat – Context: `derive_service_token` concatenates inputs without separators (theoretical canonicalization risk if lengths vary in future), and `getrandom` needs explicit feature gating for Wasm targets. Actions: 1) Insert a separator byte (e.g., `|`) between PID and TXID in `derive_service_token`. 2) Ensure `getrandom` dependency in `Cargo.toml` (or workspace) enables the `js` feature for `wasm32` targets to prevent build failures. – Tests: Update `derive_service_token` unit tests to reflect new hash values; verify `cargo build --target wasm32-unknown-unknown` passes (if environment permits) or check feature tree. – Risks/Deps: Changes derived token values (breaking for existing DB records if any); requires Wasm toolchain for verification.
- ShortTerm-19 Purge `dotenvy` dependency in favor of shell-native config – Context: Hardcoding `.env` file loading inside the binary is an anti-pattern for production "monolithic fortresses" where env vars are injected by systemd/docker. It adds unnecessary file I/O logic. Actions: 1) Remove `dotenvy` from workspace dependencies. 2) Delete `hydrate_env_file` from `domain::config`. 3) Update `load_from_env` methods to rely strictly on `std::env::var`. 4) Document `direnv` or `source .env` workflows for local dev in README. – Tests: Verify binaries still boot when env vars are set externally; verify build size reduction (minor). – Risks/Deps: Breaks `cargo run` for devs who rely solely on implicit `.env` loading; requires doc update.
- ShortTerm-20 Harden storage configuration and implementation – Context: Audit revealed SQLite is running in default mode (poor concurrency) and
`claim_payment` performs redundant lookups. Actions: 1) In `SeaOrmStorage::connect`, detect SQLite backend and force `PRAGMA journal_mode=WAL;` + `PRAGMA synchronous=NORMAL;`. 2) Update `migration.rs` to explicitly set `string_len(64)` for PID and Token columns. 3) Optimize `claim_payment` by replacing `update_many` + `find` with raw SQL `UPDATE ... RETURNING *` via `SeaOrm::execute`/`query_one` to eliminate the second round-trip and lock contention. – Tests: Integration tests verifying WAL mode active and `claim_payment` correctness/atomicity. – Risks/Deps: Raw SQL bypasses some SeaORM safeguards; relies on SQLite >= 3.35.0 (standard in modern environments).
- ShortTerm-21 Refactor internal types to binary (`[u8; 32]`) and storage to BLOBs – Context: Using `String` (Hex) to represent PIDs and Tokens wastes 2x memory/storage and CPU cycles. Switching to raw bytes aligns with the "Single-Node Fortress" strategy for maximum density and speed. Actions: 1) Change `PaymentId` and `ServiceToken` internals from `String` to `[u8; 32]`. 2) Update `domain` serialization to handle Hex encoding/decoding at the API boundary (Serde). 3) Update `storage` migrations to use `BLOB`/`BYTEA` instead of `VARCHAR`. 4) Update `storage` mapping logic to read/write bytes directly. – Tests: Verify JSON API still accepts/returns Hex strings; verify DB stores raw bytes (inspect sqlite file size); verify hash derivation remains consistent. – Risks/Deps: Breaking schema change (incompatible with existing String-based DBs); pervasive refactor across all crates.
- ShortTerm-22 Optimize `PaymentStatus` column to `TINYINT` – Context: Storing "claimed"/"unclaimed" as `VARCHAR(16)` wastes space (~7-9 bytes vs 1 byte) and IO bandwidth. "Single-Node Fortress" philosophy prioritizes efficiency over raw DB readability. Actions: 1) Update `migration.rs` to define `status` as `tiny_integer`. 2) Update `entity.rs` to map `PaymentStatusDb` enum to integers (0=Unclaimed, 1=Claimed). 3) Verify `claim_payment` raw SQL uses integer literals. – Tests: Verify schema change via migration tests; verify status transitions persist correctly. – Risks/Deps: Breaking schema change; debugging raw DB requires knowing the enum mapping (0/1).
- ShortTerm-23 Harden monitor worker against transient failures – Context: `run_monitor` crashes the process if `handle_batch` returns a storage error. A fortress service should retry on IO failures. Actions: 1) In `worker.rs`, catch errors from `handle_batch` inside the loop. 2) Log them as warnings and trigger the sleep/backoff. 3) Only exit on fatal configuration errors. 4) Refactor `process_entry` to remove redundant string validation now that `PaymentId::parse` handles it. – Tests: Add a test case where the storage mock fails once then succeeds; ensure loop continues. – Risks/Deps: None; pure reliability fix.
- ShortTerm-24 Filter dust transactions to prevent DB exhaustion – Context: Currently, the monitor persists any incoming transaction with a valid PID, regardless of amount. An attacker could flood the blockchain with "dust" (1 piconero) transactions, filling the SQLite database with garbage records at negligible cost (DoS via resource exhaustion). Actions: 1) Add
`MONITOR_MIN_PAYMENT_AMOUNT` to `BootstrapConfig` (default e.g., 1_000_000 atomic units). 2) Update `process_entry` in `monitor/pipeline.rs` to check `entry.amount < min_amount`. 3) If below threshold, log a warning and skip persistence (return `Ok(false)`). – Tests: Unit test `process_entry` with amounts below and above the threshold. – Risks/Deps: Legitimate underpayments are discarded (acceptable trade-off for security).
- ShortTerm-25 Migrate to encrypted 64-bit Payment IDs to prevent front-running – Context: A critical security review revealed that legacy 32-byte Payment IDs are visible in cleartext on the blockchain. Attackers can scan the mempool, extract these IDs, and race to redeem them before legitimate users (front-running). Actions: 1) Refactor `PaymentId` to wrap `[u8; 8]` (encrypted compact ID) instead of `[u8; 32]`. 2) Update `storage` schema to use `BLOB(8)`. 3) Ensure `monitor` decodes Integrated Addresses via RPC/monero-rs. 4) Publish a security analysis proving that 64-bit entropy is sufficient against brute-force attacks even without IP rate limiting. 5) Maintain the "client-generates-ID" workflow but require clients to construct Integrated Addresses locally. – Tests: Verify collision resistance logic and correct integrated address decoding. – Risks/Deps: Breaking change; requires clients to support Monero address encoding.
- ShortTerm-26 Adopt the `monero` crate for canonical address/PID handling and integrated address assembly with high-entropy Payment IDs – Context: The monitor currently hand-rolls Monero RPC types; we need battle-tested primitives and centralized PID generation to avoid collisions. Actions: 1) Introduce `monero` (monero-rs) across domain/monitor for parsing/validating primary + integrated addresses and transaction IDs; delete bespoke RPC structs. 2) Add a domain-level `IntegratedAddressBuilder` that accepts a validated primary address and a high-entropy `PaymentId`, returning the integrated address string for both client and monitor use. 3) Ensure `PaymentId::generate` uses `rand_core::OsRng`/`getrandom` with at least 64 bits of entropy and document the collision budget. – Tests: Round-trip encode/decode integrated addresses; property tests showing sampled PIDs yield unique outputs; regression tests for RPC decoding paths using `monero` types. – Risks/Deps: Pulls in `monero`/`curve25519-dalek` (larger binaries); must confirm license compatibility and disable any `std`-only features that would block WASM.
- ShortTerm-27 Make integrated-address generation WASM-safe with explicit `getrandom` gating – Context: Clients will compile the builder to `wasm32-unknown-unknown`; missing `getrandom` features or `std`-bound code will break builds. Actions: 1) Added a domain-only `wasm` feature enabling `getrandom/js` for wasm32. 2) Documented the build check `cargo build -p anon_ticket_domain --target wasm32-unknown-unknown --features wasm`. 3) Documented consumer guidance for wasm-bindgen/wasm-pack via string-based integrated-address helpers. – Tests: Build-only guidance recorded (no runtime changes). – Risks/Deps: Browser randomness still depends on `crypto.getRandomValues`; consumers must pass `--features wasm` when targeting wasm32.
- ShortTerm-28 Enhance monitor configurability and observability – Context: The monitor's poll interval is hardcoded (5s), and its metrics are undocumented. We need to let operators tune the latency/load trade-off and provide clear guidance on tracking sync progress via Prometheus (rejecting ad-hoc status APIs to maintain "fortress" simplicity). Actions: 1) Add
`MONITOR_POLL_INTERVAL_SECS` to `BootstrapConfig` (default 5). 2) Update `run_monitor` in `worker.rs` to use this dynamic interval. 3) Update `crates/monitor/README.md` with a "Metrics & Observability" section detailing `monitor_last_height`, `monitor_rpc_calls_total`, etc. – Tests: Unit test config loading; verify loop respects interval (mock clock). – Risks/Deps: None.
- ShortTerm-29 Secure API revocation endpoint – Context: The `POST /api/v1/token/{token}/revoke` endpoint was exposed on the public listener, letting any caller revoke tokens or bump abuse scores. Actions: 1) Move the revoke route to the internal listener in `application.rs`. 2) Add tests proving public listeners return 404 while internal succeeds. 3) Document the internal-only route and mark TODO done. – Tests: Integration test for public 404/internal 200. – Risks/Deps: Admins must configure the internal listener to perform revocations.
- ShortTerm-29 Secure API revocation endpoint – Context: The `POST /api/v1/token/{token}/revoke` endpoint is currently exposed on the public listener without authentication, allowing any token holder (or brute-forcer) to revoke tokens or manipulate abuse scores. This is an administrative action and must be restricted. Actions: 1) In `application.rs`, move the revoke route from `public_server` to `internal_server`. 2) Ensure the internal server logic correctly handles this new route. – Tests: Integration test verifying 404 on public port and 200 on internal port for revocation. – Risks/Deps: Admins must configure the internal listener to perform revocations.
- ShortTerm-30 Monitor Confirmation Safety – Context: Currently, the monitor processes transactions as soon as they appear (even with 0 confirmations). If the blockchain forks (reorg), these transactions might become invalid, but we might have already issued tokens. Actions: 1) Add
`MONITOR_MIN_CONFIRMATIONS` to `BootstrapConfig` (default 10). 2) Refactor `run_monitor` loop to calculate `safe_height = wallet_height - min_confirmations`. 3) Only fetch/process transfers where `height <= safe_height`. 4) Only advance `last_processed_height` up to `safe_height`. – Tests: Simulate immature transactions being ignored until chain height advances. – Risks/Deps: Increases user wait time (latency).
- ShortTerm-31 Make PID negative-cache grace configurable – Context: `PID_CACHE_NEGATIVE_GRACE` is hardcoded to 500ms in the API handler. Actions: add `API_PID_CACHE_NEGATIVE_GRACE_MS` (bounded >0) and surface it in `ApiConfig`; ensure handlers read the configured value and validate `grace <= ttl`. Tests: Actix handler tests covering short-circuit window; config parsing happy/edge paths. Risks/Deps: too-small grace may raise DB load; too-large grace can block fresh payments briefly.
- ShortTerm-32 Expose PID cache TTL/capacity knobs – Context: `InMemoryPidCache` currently uses 60s TTL and capacity 100k. Actions: add `API_PID_CACHE_TTL_SECS` and `API_PID_CACHE_CAPACITY` env vars; apply to both positive/negative caches; enforce `ttl >= grace`; document sizing guidance. Tests: config parsing + cache behavior with custom TTL/capacity. Risks/Deps: low TTL collapses hit rate; excessive capacity increases memory footprint.
- MidTerm-33 Introduce Bloom filter layer for PID screening – Context: need scalable negative/positive hints with no false negatives (false positives acceptable). Actions: add Bloom filter implementation or crate-backed adapter, fed from cache/storage; wire into redeem path before DB lookups; expose tuning knobs (e.g., false-positive rate, refresh cadence). Tests: property tests proving zero false negatives and bounded FP rate; integration tests showing DB load reduction under spray traffic. Risks/Deps: requires periodic rebuild to bound FP; configuration must prevent mis-sizing on small deployments.
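
To make the "no false negatives, false positives acceptable" property of MidTerm-33 concrete, here is a minimal std-only Bloom filter sketch. `PidBloom` is a hypothetical name, and deriving the probe positions by double hashing over `DefaultHasher` is an illustrative choice rather than the project's implementation. `DefaultHasher` output is not guaranteed stable across Rust releases, so a filter built this way should live only in memory and not be persisted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory Bloom filter keyed by raw PID bytes.
pub struct PidBloom {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl PidBloom {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        let num_bits = num_bits.max(64);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes: num_hashes.max(1) }
    }

    /// Double hashing: position_i = (h1 + i * h2) mod num_bits.
    fn positions(pid: &[u8], num_bits: u64, num_hashes: u32) -> impl Iterator<Item = u64> {
        let mut first = DefaultHasher::new();
        pid.hash(&mut first);
        let mut second = DefaultHasher::new();
        (pid, 0x9e37_79b9_u32).hash(&mut second);
        // Force h2 odd so successive probes do not collapse onto one bit.
        let (h1, h2) = (first.finish(), second.finish() | 1);
        (0..u64::from(num_hashes)).map(move |i| h1.wrapping_add(i.wrapping_mul(h2)) % num_bits)
    }

    pub fn insert(&mut self, pid: &[u8]) {
        for pos in Self::positions(pid, self.num_bits, self.num_hashes) {
            self.bits[(pos / 64) as usize] |= 1u64 << (pos % 64);
        }
    }

    /// `false` is definitive; `true` may be a false positive.
    pub fn might_contain(&self, pid: &[u8]) -> bool {
        Self::positions(pid, self.num_bits, self.num_hashes)
            .all(|pos| (self.bits[(pos / 64) as usize] & (1u64 << (pos % 64))) != 0)
    }
}

fn main() {
    let mut bloom = PidBloom::new(1 << 16, 7);
    bloom.insert(&[0xAB; 8]);
    println!("inserted PID present: {}", bloom.might_contain(&[0xAB; 8]));
    println!("other PID maybe present: {}", bloom.might_contain(&[0xCD; 8]));
}
```

Because `insert` only ever sets bits and `might_contain` checks the same positions, an inserted PID can never be reported absent, which is the property the proposed property tests would assert.
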
- ShortTerm-34 Co-locate monitor ingestion to seed Bloom/cache in-process – Embedded the monitor inside the API bootstrap (with a dev escape `API_ALLOW_NO_MONITOR`), prewarming Bloom/cache from storage and sharing hooks so new payments immediately update the filter; standalone monitor binaries now require `ALLOW_STANDALONE_MONITOR=1` to avoid production split deployments.
- ShortTerm-35 Enforce mandatory Bloom guard at startup – Bloom is now required unless `API_ALLOW_NO_BLOOM=1`; startup logs chosen entries/FPR and estimated bitset bytes, and Bloom config errors fail fast to keep the DoS shield intact.
- ShortTerm-36 Redeem path Bloom-only screening (no negative cache writes) – Negative cache removed; Bloom negatives 404 immediately, and only confirmed storage hits mark Bloom/cache. Absent lookups never write Bloom, and Bloom-positive/DB-miss cases are counted for FP monitoring.
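
The ShortTerm-36 decision flow can be sketched without any HTTP or storage code. `RedeemScreen`, `RedeemOutcome`, and the `bloom_db_miss` counter are hypothetical stand-ins: the Bloom answer arrives as a boolean and the storage lookup as a closure, so the only thing shown is the ordering of checks and where the false-positive counter is bumped.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical outcome of the screening step.
#[derive(Debug, PartialEq, Eq)]
pub enum RedeemOutcome {
    /// Maps to an HTTP 404.
    NotFound,
    /// Carries a hypothetical payment row id.
    Found(u64),
}

pub struct RedeemScreen {
    /// Counts Bloom-positive lookups that storage could not confirm.
    pub bloom_db_miss: AtomicU64,
}

impl RedeemScreen {
    pub fn screen(
        &self,
        bloom_hit: bool,
        db_lookup: impl FnOnce() -> Option<u64>,
    ) -> RedeemOutcome {
        if !bloom_hit {
            // Bloom negatives are definitive: answer without touching storage.
            return RedeemOutcome::NotFound;
        }
        match db_lookup() {
            Some(id) => RedeemOutcome::Found(id),
            None => {
                // Bloom said "maybe" but storage disagreed: a false positive.
                self.bloom_db_miss.fetch_add(1, Ordering::Relaxed);
                RedeemOutcome::NotFound
            }
        }
    }
}

fn main() {
    let screen = RedeemScreen { bloom_db_miss: AtomicU64::new(0) };
    println!("{:?}", screen.screen(false, || Some(1)));
    println!("{:?}", screen.screen(true, || Some(7)));
    println!("{:?}", screen.screen(true, || None));
    println!("false positives: {}", screen.bloom_db_miss.load(Ordering::Relaxed));
}
```

Nothing is written on the miss paths, which matches the "absent lookups never write Bloom" rule in the item.
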
- ShortTerm-37 Surface Bloom+moka sizing controls and telemetry – Added Bloom sizing log (estimated bytes), cache remains positive-only with existing TTL/capacity knobs, and new metric `api_redeem_bloom_db_miss_total` tracks Bloom FP drift alongside hint counters.
- ShortTerm-38 Document Bloom-only DoS posture and operational playbook – README/API/Design docs now describe the Bloom-only defense, removal of negative cache, dev escape hatches, and the expected behavior under Tor/no-IP-limit deployments.
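
The "estimated bytes" figure in the sizing log of ShortTerm-37 can be derived from the textbook Bloom filter formulas, m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. The sketch below assumes the service uses these standard formulas; `BloomSizing` and `bloom_sizing` are hypothetical names.

```rust
/// Hypothetical sizing summary, suitable for a startup log line.
#[derive(Debug)]
pub struct BloomSizing {
    pub bits: u64,
    pub hashes: u32,
    pub bytes: u64,
}

/// Standard sizing for `expected_entries` items at a target false-positive rate.
pub fn bloom_sizing(expected_entries: u64, target_fpr: f64) -> BloomSizing {
    assert!(expected_entries > 0, "need at least one expected entry");
    assert!(target_fpr > 0.0 && target_fpr < 1.0, "FPR must be in (0, 1)");
    let n = expected_entries as f64;
    let ln2 = std::f64::consts::LN_2;
    // m = -n * ln(p) / (ln 2)^2, rounded up to whole bits.
    let bits = (-n * target_fpr.ln() / (ln2 * ln2)).ceil();
    // k = (m / n) * ln 2, rounded to the nearest whole hash count (at least 1).
    let hashes = ((bits / n) * ln2).round().max(1.0);
    let bits = bits as u64;
    BloomSizing { bits, hashes: hashes as u32, bytes: (bits + 7) / 8 }
}

fn main() {
    for entries in [1_000_000_u64, 100_000_000] {
        let sizing = bloom_sizing(entries, 0.01);
        println!("entries={entries} fpr=0.01 -> {sizing:?}");
    }
}
```

At one million entries and a 1% false-positive rate this works out to roughly 1.2 MB and 7 hash functions. Memory grows linearly with the expected entry count and only logarithmically as the target rate tightens.
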
- ShortTerm-39 Enforce mandatory internal listener and drop public metrics fallback – Context: simplify startup paths and guarantee admin/metrics exposure only via an internal endpoint. Actions: 1) Make `API_INTERNAL_BIND_ADDRESS` or `API_INTERNAL_UNIX_SOCKET` mandatory (fail fast if both missing). 2) Remove `include_metrics_on_public` and related conditional wiring; public listener serves only user-facing routes. 3) Update bootstrap errors/docs to reflect the requirement. Tests: config parsing (missing/one/both) and integration ensuring boot fails without an internal listener and metrics/internal routes are unreachable on the public port. Risks/Deps: deployments must provision loopback or Unix socket; Tor-only setups need explicit internal binding.
- ShortTerm-40 Simplify platform-specific listener wiring with `cfg_if` – Context: scattered `#[cfg(unix)]`/`#[cfg(not(unix))]` blocks reduce readability. Actions: consolidate listener/socket setup behind `cfg_if!`, pruning dead branches exposed by the mandatory-internal change. Tests: build on Unix targets plus a non-Unix compile check; reuse existing API integration tests. Risks/Deps: minimal (small macro dependency); ensure behavior parity on non-Unix platforms.
- ShortTerm-41 Rebaseline Bloom sizing and monitoring for 64-bit PIDs – Context: FPR depends on expected unique PID count, not entropy. Actions: 1) Add sizing guidance (entries, FPR, k) for typical volumes (1e6–1e8) and memory footprints; 2) Define alerting guidance on `api_redeem_bloom_db_miss_total` without relying on Bloom rebuilds; 3) Provide default preset suggestions for small vs. large deployments. Tests/Docs: update API README/DESIGN and ops notes; code changes only if defaults are adjusted. Risks: mis-sizing could degrade to DB-only path.
- ShortTerm-42 Raise dust floor default in samples to bound DoS cost – Context: attackers could pay minimal amounts to bloat Bloom/cache/storage. Actions: pick a higher `MONITOR_MIN_PAYMENT_AMOUNT` default in `.env.example` (document rationale vs. memory/bandwidth), and add guidance on tuning it to match Bloom capacity planning. Tests/Docs: update README/monitor docs and config tables; ensure examples stay consistent. Risks: legitimate micro-payments might be rejected; needs clear operator guidance.
- ShortTerm-43 Make `TokenStatusResponse.status` an enum and bump API contract – Context: developer-facing introspection currently returns strings ("active"/"revoked"). Actions: switch to an enum in the response schema, update API docs/tests/clients, and record the breaking change in CHANGELOG. Risks: incompatible with existing consumers; may require version negotiation if any external clients exist.
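
The fail-fast rule from ShortTerm-39 above (an internal listener must be configured) might be checked along these lines. `InternalListener`, `ConfigError`, and `internal_listener` are hypothetical names. The item does not say what happens when both variables are set, so preferring the Unix socket in that case is purely an assumption, as is treating blank values as missing. The lookup is passed in as a closure so the "missing/one/both" cases can be tested without mutating the process environment.

```rust
/// Hypothetical resolved internal listener target.
#[derive(Debug, PartialEq, Eq)]
pub enum InternalListener {
    Tcp(String),
    Unix(String),
}

/// Hypothetical startup error.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    MissingInternalListener,
}

/// Resolves the internal listener from an env-style lookup function.
pub fn internal_listener(
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<InternalListener, ConfigError> {
    // Trim values and treat blank ones as unset (assumption).
    let non_empty = |key: &str| {
        lookup(key)
            .map(|value| value.trim().to_owned())
            .filter(|value| !value.is_empty())
    };
    // Assumption: when both are set, the Unix socket wins.
    if let Some(path) = non_empty("API_INTERNAL_UNIX_SOCKET") {
        return Ok(InternalListener::Unix(path));
    }
    if let Some(addr) = non_empty("API_INTERNAL_BIND_ADDRESS") {
        return Ok(InternalListener::Tcp(addr));
    }
    Err(ConfigError::MissingInternalListener)
}

fn main() {
    match internal_listener(|key| std::env::var(key).ok()) {
        Ok(listener) => println!("internal listener: {listener:?}"),
        Err(err) => println!("refusing to start: {err:?}"),
    }
}
```

In the real bootstrap the error branch would abort startup instead of printing.
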

Foundational scaffolding, storage correctness, and the API/monitor surfaces are in place, and recent work hardened the domain layer. Next, we’ll keep pushing the high-concurrency SQLite optimizations (WAL mode, atomic `RETURNING` queries, binary core types).