A financial-grade digital wallet system with double-entry ledger accounting, designed for Asia-scale fintech usage.
- Problem Statement
- User Personas
- Core Features
- Ledger-Based Accounting Model
- Non-Functional Requirements
- High-Level Architecture
- Tech Stack
- Design Deep Dive
- Getting Started
Digital wallets face critical challenges that can result in financial loss and regulatory violations:
- Double-Spending: Concurrent transactions can overdraw accounts when using naive
balance -= amountupdates - Lost Updates: Race conditions cause transactions to overwrite each other
- Inconsistent State: Network failures mid-transaction leave wallets in undefined states
- Audit Trail Gaps: Direct balance mutations make forensic accounting impossible
- Scalability vs Consistency: Traditional locking doesn't scale to millions of concurrent users
User Balance: $100
T1 (Thread 1): Read balance → $100
T2 (Thread 2): Read balance → $100
T1: Debit $80 → Write $20
T2: Debit $80 → Write $20 ← DOUBLE SPEND! User spent $160 with only $100
Actual balance should be: -$60 (overdraft) or T2 should fail
LedgerX implements double-entry bookkeeping with append-only ledger entries, ensuring:
- Every transaction is atomic and traceable
- Balances are computed from immutable journal entries
- No direct balance mutations ever occur
- Full audit trail for regulatory compliance
- Profile: Mobile-first users in Southeast Asia (Indonesia, Philippines, Vietnam)
- Needs: Instant P2P transfers, bill payments, QR payments
- Pain Points: Transaction failures, unclear balance, delayed settlements
- Scale: 10M+ active users, 1000+ TPS peak
- Profile: Small-to-medium businesses accepting digital payments
- Needs: Real-time settlement, transaction reports, refund capabilities
- Pain Points: Reconciliation errors, chargebacks, cash flow visibility
- Profile: Finance and compliance teams
- Needs: Real-time dashboards, audit logs, regulatory reports
- Pain Points: Manual reconciliation, fraud detection delays
- Profile: Banks, payment gateways, e-commerce platforms
- Needs: Reliable APIs, webhook notifications, idempotent operations
- Pain Points: Integration complexity, retry handling, data consistency
| Feature | Description |
|---|---|
| Create Wallet | Open new wallet with KYC-linked user ID |
| Get Balance | Real-time computed balance from ledger |
| Freeze/Unfreeze | Compliance-triggered account controls |
| Close Wallet | Soft-delete with balance transfer requirement |
| Operation | Description | Atomicity |
|---|---|---|
| Credit | Add funds (top-up, refund, cashback) | Single-entry debit from source |
| Debit | Remove funds (payment, withdrawal) | Single-entry credit to destination |
| Transfer | P2P or merchant payment | Double-entry atomic operation |
| Hold | Reserve funds for pending transaction | Creates hold entry, reduces available |
| Release | Cancel hold, restore available balance | Reverses hold entry |
| Capture | Convert hold to actual debit | Settles held amount |
| Feature | Description |
|---|---|
| Journal Entry | Immutable record of every balance change |
| Balance Computation | SUM(credits) - SUM(debits) per wallet |
| Statement Generation | Date-range transaction history |
| Reconciliation | Automated balance verification |
| Feature | Description |
|---|---|
| Idempotency | Duplicate request detection via client tokens |
| Retry Safety | Safe retries without duplicate processing |
| Webhooks | Real-time event notifications |
| Rate Limiting | Abuse prevention per user/IP |
# ❌ WRONG: Direct balance update (race condition prone)
UPDATE wallets SET balance = balance - 100 WHERE id = 'user123';
# ❌ WRONG: Read-modify-write (lost update prone)
balance = SELECT balance FROM wallets WHERE id = 'user123';
UPDATE wallets SET balance = {balance - 100} WHERE id = 'user123';Every transaction creates exactly two entries that sum to zero:
Transfer $100 from Alice to Bob:
┌─────────────────────────────────────────────────────────────┐
│ Journal Entry #TXN-001 │
├─────────────┬─────────┬────────┬────────┬──────────────────┤
│ Account │ Type │ Debit │ Credit │ Running Balance │
├─────────────┼─────────┼────────┼────────┼──────────────────┤
│ Alice │ ASSET │ $100 │ │ $900 │
│ Bob │ ASSET │ │ $100 │ $200 │
├─────────────┼─────────┼────────┼────────┼──────────────────┤
│ TOTAL │ │ $100 │ $100 │ ✓ Balanced │
└─────────────┴─────────┴────────┴────────┴──────────────────┘
ASSETS (Debit increases, Credit decreases)
├── USER_WALLET # Individual user wallets
├── MERCHANT_WALLET # Business wallets
├── FLOAT_ACCOUNT # Platform operating funds
└── SETTLEMENT_ACCOUNT # Pending settlements
LIABILITIES (Credit increases, Debit decreases)
├── USER_PAYABLE # Owed to users
├── MERCHANT_PAYABLE # Owed to merchants
└── HOLD_ACCOUNT # Funds on hold
REVENUE (Credit increases)
├── TRANSACTION_FEE # Per-transaction fees
├── INTERCHANGE_FEE # Network fees earned
└── INTEREST_INCOME # Float interest
EXPENSES (Debit increases)
├── PAYMENT_GATEWAY_FEE # Third-party costs
├── FRAUD_LOSS # Chargebacks/fraud
└── OPERATIONAL_COST # Platform operations
-- Wallet balance is ALWAYS computed, never stored directly
SELECT
wallet_id,
SUM(CASE WHEN entry_type = 'CREDIT' THEN amount ELSE 0 END) -
SUM(CASE WHEN entry_type = 'DEBIT' THEN amount ELSE 0 END) AS balance
FROM ledger_entries
WHERE wallet_id = :wallet_id
AND status = 'POSTED'
GROUP BY wallet_id;For performance, we maintain a cached balance that's updated atomically:
-- Materialized balance (updated via triggers/events)
CREATE TABLE wallet_balances (
wallet_id UUID PRIMARY KEY,
posted_balance DECIMAL(20,4), -- Settled balance
pending_balance DECIMAL(20,4), -- Including holds
available_balance DECIMAL(20,4), -- posted - holds
last_entry_id BIGINT, -- For consistency check
updated_at TIMESTAMP
);
-- Consistency invariant (verified periodically)
ASSERT wallet_balances.posted_balance == SUM(ledger_entries for wallet)| Requirement | Target | Mechanism |
|---|---|---|
| Transaction Atomicity | 100% | Database transactions + saga pattern |
| Balance Accuracy | 100% | Ledger-based computation |
| Double-spend Prevention | 100% | Optimistic locking + idempotency |
| Eventual Consistency Window | < 100ms | Async event propagation |
| Requirement | Target | Mechanism |
|---|---|---|
| Data Persistence | 99.999999999% | Multi-AZ PostgreSQL + WAL archiving |
| Point-in-time Recovery | 30 days | Continuous backup + PITR |
| Audit Log Retention | 7 years | Immutable append-only logs |
| Disaster Recovery | RPO < 1min, RTO < 15min | Cross-region replication |
| Requirement | Target | Mechanism |
|---|---|---|
| API Uptime | 99.95% | Multi-AZ deployment + health checks |
| Read Availability | 99.99% | Read replicas + caching |
| Graceful Degradation | Yes | Circuit breakers + fallbacks |
| Metric | Target | Strategy |
|---|---|---|
| Concurrent Users | 10M+ | Horizontal API scaling |
| Transactions/Second | 10,000 TPS | Sharded writes + CQRS |
| Storage Growth | 1TB/month | Time-series partitioning |
| Wallet Count | 100M+ | Hash-based sharding |
| Requirement | Implementation |
|---|---|
| Encryption at Rest | AES-256 for all PII and financial data |
| Encryption in Transit | TLS 1.3 mandatory |
| Authentication | OAuth 2.0 + JWT with short expiry |
| Authorization | RBAC + resource-level permissions |
| Audit Logging | Immutable logs for all operations |
| PCI-DSS Compliance | Tokenized card data, no PAN storage |
| Operation | P50 | P99 | Max |
|---|---|---|---|
| Balance Query | 5ms | 50ms | 200ms |
| Transfer | 50ms | 200ms | 500ms |
| Statement (30 days) | 100ms | 500ms | 2s |
┌─────────────────────────────────────────────────────────────────────────────┐
│ CLIENTS │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Mobile App │ │ Web App │ │ Partner API │ │ Admin UI │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
└─────────┼────────────────┼────────────────┼────────────────┼────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ API GATEWAY (Kong/AWS API Gateway) │
│ • Rate Limiting • Authentication • Request Routing • SSL Termination │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ APPLICATION LAYER (Kubernetes) │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Wallet Service │ │ Ledger Service │ │ Transfer Service│ │
│ │ (FastAPI) │ │ (FastAPI) │ │ (FastAPI) │ │
│ │ │ │ │ │ │ │
│ │ • Create Wallet │ │ • Post Entry │ │ • P2P Transfer │ │
│ │ • Get Balance │ │ • Get Statement │ │ • Hold/Capture │ │
│ │ • Freeze/Close │ │ • Reconcile │ │ • Refund │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │ │
│ ┌────────┴────────────────────┴────────────────────┴────────┐ │
│ │ DOMAIN LAYER │ │
│ │ • Transaction Coordinator • Balance Calculator │ │
│ │ • Idempotency Manager • Event Publisher │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ PostgreSQL │ │ Redis │ │ Kafka/SQS │ │
│ │ (Primary) │ │ (Cache) │ │ (Events) │ │
│ │ │ │ │ │ │ │
│ │ • Ledger Entries│ │ • Balance Cache │ │ • Txn Events │ │
│ │ • Wallets │ │ • Idempotency │ │ • Webhooks │ │
│ │ • Transactions │ │ • Rate Limits │ │ • Audit Stream │ │
│ │ • Audit Logs │ │ • Sessions │ │ • Analytics │ │
│ └────────┬────────┘ └─────────────────┘ └────────┬────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Read Replicas │ │ Event Workers │ │
│ │ (2-3 replicas) │ │ • Webhook │ │
│ └─────────────────┘ │ • Notification │ │
│ │ • Analytics │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ OBSERVABILITY & OPERATIONS │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Prometheus + │ │ ELK Stack / │ │ Jaeger / │ │
│ │ Grafana │ │ CloudWatch │ │ X-Ray │ │
│ │ (Metrics) │ │ (Logs) │ │ (Tracing) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
- Rate Limiting: 1000 req/min per user, 10000 req/min per partner
- Authentication: JWT validation, API key verification
- Request Routing: Version-based routing, A/B testing
- DDoS Protection: AWS Shield / Cloudflare integration
- Wallet lifecycle management (create, freeze, close)
- Balance queries with caching
- KYC status integration
- Core responsibility: Append-only journal entries
- Balance computation from ledger
- Statement generation
- Reconciliation jobs
- Transaction orchestration (saga pattern)
- Idempotency enforcement
- Hold/capture/release flows
- Rollback handling
- Async webhook delivery with retry
- Push notifications
- Analytics event processing
- Audit log archival
| Component | Technology | Justification |
|---|---|---|
| API Framework | FastAPI (Python) | Async support, auto OpenAPI docs, type hints, excellent for fintech |
| Primary Database | PostgreSQL 15+ | ACID compliance, SERIALIZABLE isolation, excellent for financial data |
| Cache | Redis Cluster | Sub-ms latency, Lua scripting for atomic ops, pub/sub for events |
| Message Queue | Apache Kafka | Ordered events, replay capability, high throughput |
| API Gateway | Kong / AWS API Gateway | Rate limiting, auth, observability built-in |
| Component | Technology | Justification |
|---|---|---|
| Container Orchestration | Kubernetes (EKS/GKE) | Auto-scaling, rolling deployments, self-healing |
| Service Mesh | Istio | mTLS, traffic management, observability |
| Secrets Management | HashiCorp Vault | Dynamic secrets, encryption-as-a-service |
| CI/CD | GitHub Actions + ArgoCD | GitOps, automated deployments |
| Component | Technology | Justification |
|---|---|---|
| Metrics | Prometheus + Grafana | Industry standard, rich ecosystem |
| Logging | ELK Stack / Loki | Centralized logs, full-text search |
| Tracing | Jaeger / AWS X-Ray | Distributed tracing for debugging |
| Alerting | PagerDuty + Grafana | On-call rotation, alert aggregation |
-
PostgreSQL over NoSQL: Financial systems require ACID guarantees. PostgreSQL's SERIALIZABLE isolation prevents double-spending at the database level.
-
FastAPI over Django/Flask: Native async support handles thousands of concurrent connections efficiently. Type hints catch errors early.
-
Kafka over RabbitMQ: Event replay capability is crucial for reconciliation. Kafka's log-based architecture matches our append-only ledger philosophy.
-
Redis for Caching: Atomic operations (WATCH/MULTI/EXEC) enable safe balance caching with optimistic locking.
-
Kubernetes: Asia-scale requires elastic scaling. K8s handles traffic spikes during promotions/festivals seamlessly.
- Python 3.11+
- Docker & Docker Compose
- PostgreSQL 15+
- Redis 7+
# Clone the repository
git clone https://github.com/yourorg/ledgerx.git
cd ledgerx
# Start dependencies
docker-compose up -d postgres redis kafka
# Install dependencies
pip install -r requirements.txt
# Run migrations
alembic upgrade head
# Start the API server
uvicorn app.main:app --reload --port 8000
# Access API docs
open http://localhost:8000/docsledgerx/
├── app/
│ ├── api/ # API routes
│ │ ├── v1/
│ │ │ ├── wallets.py
│ │ │ ├── transfers.py
│ │ │ └── ledger.py
│ │ └── deps.py # Dependencies
│ ├── core/ # Core configuration
│ │ ├── config.py
│ │ ├── security.py
│ │ └── exceptions.py
│ ├── domain/ # Business logic
│ │ ├── models/
│ │ ├── services/
│ │ └── events/
│ ├── infrastructure/ # External integrations
│ │ ├── database/
│ │ ├── cache/
│ │ └── messaging/
│ └── main.py
├── migrations/ # Alembic migrations
├── tests/
├── docker-compose.yml
├── Dockerfile
└── requirements.txt
This section provides detailed clarifications on critical design decisions for production deployment.
| Snapshot Type | Frequency | Retention | Use Case |
|---|---|---|---|
| Hot Snapshots | Every 1 hour | 7 days | Fast balance recovery, recent queries |
| Daily Snapshots | End of day (UTC) | 2 years | Statements, reconciliation, audits |
| Monthly Snapshots | End of month | 7 years | Regulatory compliance, long-term audits |
┌─────────────────────────────────────────────────────────────────────────────┐
│ BALANCE SNAPSHOT WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. TRIGGER 2. COMPUTE 3. PERSIST │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Scheduled │────────────▶│ SELECT SUM │────────────▶│ INSERT INTO │ │
│ │ Cron Job │ │ FROM ledger │ │ snapshots │ │
│ │ (K8s CronJob)│ │ WHERE posted │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Acquire │ │ Compare with │ │ Update │ │
│ │ Advisory Lock│ │ Cached Balance│ │ last_entry_id│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Discrepancy? │ │
│ │ Alert + Log │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
| Failure Mode | Detection | Recovery | Impact |
|---|---|---|---|
| Snapshot job fails | K8s job monitoring + PagerDuty alert | Auto-retry with exponential backoff (3 attempts) | None - balance still computed from ledger |
| Snapshot divergence detected | Reconciliation comparison | Trigger full recompute from ledger + alert ops | Snapshot marked invalid, queries fall back to ledger |
| Database unavailable | Health check failure | Wait for recovery, resume from last last_entry_id |
Delayed snapshots, no data loss |
| Partial snapshot (mid-write crash) | completed_at IS NULL check |
Delete incomplete, re-run | None - atomic transaction ensures consistency |
-- Snapshots are created atomically within a transaction
BEGIN;
-- Lock to prevent concurrent snapshots for same wallet
SELECT pg_advisory_xact_lock(hashtext('snapshot:' || wallet_id::text));
-- Compute balance at specific entry point
INSERT INTO balance_snapshots (wallet_id, snapshot_date, posted_balance, held_balance, last_entry_id, entry_count)
SELECT
wallet_id,
CURRENT_DATE,
SUM(CASE WHEN entry_type = 'CREDIT' THEN amount ELSE -amount END),
COALESCE((SELECT SUM(amount) FROM holds WHERE wallet_id = le.wallet_id AND status = 'ACTIVE'), 0),
MAX(id),
COUNT(*)
FROM ledger_entries le
WHERE wallet_id = :wallet_id AND status = 'POSTED'
GROUP BY wallet_id;
COMMIT;Invariant: balance_snapshots.posted_balance + SUM(ledger_entries WHERE id > last_entry_id) == wallet_balances.posted_balance
LedgerX follows the compensating transaction pattern - reversals never mutate historical data. Instead, new entries are created that offset the original transaction.
| Reversal Type | Trigger | Timing | Creates |
|---|---|---|---|
| Full Refund | Customer request, dispute | Any time | Inverse entries for full amount |
| Partial Refund | Merchant-initiated | Any time | Inverse entries for partial amount |
| Void | System/admin action | Within 24 hours | Void entries, marks original as VOIDED |
| Chargeback | Bank/card network | Up to 120 days | Compensating entries + chargeback record |
| Correction | Admin with approval | Any time | Adjustment entries with audit trail |
Original Transfer: Alice → Bob ($100)
┌─────────────────────────────────────────────────────────────────────────────┐
│ Transaction #TXN-001 (status: COMPLETED) │
├─────────────┬─────────┬────────┬────────┬──────────────────────────────────┤
│ Entry #1 │ Alice │ DEBIT │ $100 │ Posted 2024-01-15 10:00:00 │
│ Entry #2 │ Bob │ CREDIT │ $100 │ Posted 2024-01-15 10:00:00 │
└─────────────┴─────────┴────────┴────────┴──────────────────────────────────┘
Reversal: Full Refund
┌─────────────────────────────────────────────────────────────────────────────┐
│ Transaction #TXN-002 (status: COMPLETED, parent: TXN-001, type: REFUND) │
├─────────────┬─────────┬────────┬────────┬──────────────────────────────────┤
│ Entry #3 │ Bob │ DEBIT │ $100 │ Posted 2024-01-16 14:30:00 │
│ Entry #4 │ Alice │ CREDIT │ $100 │ Posted 2024-01-16 14:30:00 │
└─────────────┴─────────┴────────┴────────┴──────────────────────────────────┘
Net Effect: Alice $0, Bob $0 (back to original state)
Original entries remain IMMUTABLE - audit trail preserved
┌─────────────────────────────────────────────────────────────────────────────┐
│ REVERSAL WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Request │───▶│ Validate │───▶│ Create │───▶│ Post │ │
│ │ Reversal│ │ Original Txn│ │ Compensating│ │ Entries │ │
│ └─────────┘ └─────────────┘ │ Transaction │ └─────────────┘ │
│ │ └─────────────┘ │ │
│ ▼ │ ▼ │
│ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Reversible? │ │ │ Update │ │
│ │ • Not already│ │ │ Original │ │
│ │ reversed │ │ │ Status │ │
│ │ • Within │ │ │ → REVERSED │ │
│ │ timeframe │ │ └─────────────┘ │
│ │ • Balance OK │ │ │ │
│ └─────────────┘ │ ▼ │
│ │ │ ┌─────────────┐ │
│ ▼ │ │ Emit Event │ │
│ ┌─────────────┐ │ │ txn.reversed│ │
│ │ Link via │◀──────────┘ └─────────────┘ │
│ │ parent_txn_id │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
-- Reversals reference the original transaction
ALTER TABLE transactions ADD CONSTRAINT fk_parent_transaction
FOREIGN KEY (parent_transaction_id) REFERENCES transactions(id);
-- Prevent double reversal
CREATE UNIQUE INDEX idx_unique_reversal ON transactions(parent_transaction_id)
WHERE transaction_type IN ('REFUND', 'VOID', 'CHARGEBACK');
-- Reversal must match original amount (for full reversals)
-- Enforced at application layer with partial refund tracking- Immutability: Original ledger entries are NEVER modified
- Traceability:
parent_transaction_idlinks reversal to original - Auditability: Complete history preserved for compliance
- Idempotency: Same reversal request returns same result
LedgerX uses the Transactional Outbox Pattern to ensure reliable event delivery without distributed transactions.
| Guarantee | Implementation | Tradeoff |
|---|---|---|
| At-Least-Once | ✅ Default | Consumers must be idempotent |
| Exactly-Once | ❌ Not guaranteed | Too costly for performance |
| Ordering | ✅ Per-wallet | Global ordering not guaranteed |
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRANSACTIONAL OUTBOX PATTERN │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ WRITE PATH (Atomic) PUBLISH PATH (Async) │
│ ───────────────────── ──────────────────── │
│ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Application │ │ Outbox Poller │ │
│ │ Service │ │ (Debezium/Poll) │ │
│ └──────┬──────┘ └────────┬────────┘ │
│ │ │ │
│ │ BEGIN TRANSACTION │ Poll every 100ms │
│ ▼ ▼ │
│ ┌─────────────────────────────────┐ ┌─────────────────┐ │
│ │ PostgreSQL │ │ SELECT * FROM │ │
│ │ ┌───────────────────────────┐ │ │ outbox_events │ │
│ │ │ 1. INSERT ledger_entries │ │ │ WHERE published │ │
│ │ │ 2. INSERT transactions │ │───▶│ = FALSE │ │
│ │ │ 3. INSERT outbox_events │ │ │ ORDER BY seq │ │
│ │ └───────────────────────────┘ │ └────────┬────────┘ │
│ │ COMMIT │ │ │
│ └─────────────────────────────────┘ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Publish to │ │
│ │ Kafka/SQS │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ UPDATE outbox │ │
│ │ SET published │ │
│ │ = TRUE │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
CREATE TABLE outbox_events (
id BIGSERIAL PRIMARY KEY,
-- Event identity
event_id UUID NOT NULL DEFAULT uuid_generate_v4(),
event_type VARCHAR(64) NOT NULL, -- e.g., 'transaction.completed'
-- Routing
aggregate_type VARCHAR(64) NOT NULL, -- e.g., 'wallet', 'transaction'
aggregate_id UUID NOT NULL, -- e.g., wallet_id
-- Payload
payload JSONB NOT NULL,
-- Publishing state
published BOOLEAN NOT NULL DEFAULT FALSE,
published_at TIMESTAMP WITH TIME ZONE,
publish_attempts INTEGER NOT NULL DEFAULT 0,
last_error TEXT,
-- Ordering
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
-- Cleanup
expires_at TIMESTAMP WITH TIME ZONE DEFAULT (NOW() + INTERVAL '7 days')
);
CREATE INDEX idx_outbox_unpublished ON outbox_events(created_at)
WHERE published = FALSE;
CREATE INDEX idx_outbox_aggregate ON outbox_events(aggregate_type, aggregate_id);| Event | Trigger | Payload | Consumers |
|---|---|---|---|
wallet.created |
Wallet creation | Wallet details | Analytics, Notifications |
wallet.frozen |
Freeze action | Wallet ID, reason | Compliance, Notifications |
transaction.pending |
Txn initiated | Transaction details | Webhooks, Monitoring |
transaction.completed |
Txn success | Full transaction + entries | Webhooks, Analytics, Reconciliation |
transaction.failed |
Txn failure | Transaction + error | Webhooks, Alerts |
transaction.reversed |
Reversal posted | Original + reversal txn | Webhooks, Reconciliation |
balance.updated |
Balance change | Wallet ID, new balance | Real-time dashboards |
hold.created |
Hold placed | Hold details | Webhooks |
hold.released |
Hold released/captured | Hold resolution | Webhooks |
Since we guarantee at-least-once delivery, consumers MUST be idempotent:
# Consumer idempotency pattern
async def handle_transaction_completed(event: Event):
# Check if already processed
if await event_store.is_processed(event.event_id):
logger.info(f"Event {event.event_id} already processed, skipping")
return
try:
# Process event
await process_webhook(event)
# Mark as processed
await event_store.mark_processed(event.event_id)
except Exception as e:
# Will be retried (at-least-once)
raise| Failure | Detection | Recovery | SLA |
|---|---|---|---|
| Poller crash | K8s liveness probe | Auto-restart, resume from last position | < 30s |
| Kafka unavailable | Publish timeout | Exponential backoff, DLQ after 5 attempts | Events delayed, not lost |
| Consumer crash | Consumer group rebalance | Another consumer picks up | < 10s |
| Poison message | Repeated failures | Move to Dead Letter Queue + alert | Manual intervention |
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ZONE 0: PUBLIC (Untrusted) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Mobile Apps │ Web Apps │ Third-party Partners │ Public Internet│ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ HTTPS + mTLS (partners) │
│ ▼ │
│ ─────────────────────── WAF / DDoS Protection ───────────────────────── │
│ │ │
│ ▼ │
│ ZONE 1: DMZ (Semi-trusted) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ API GATEWAY │ │
│ │ • Rate limiting • JWT validation • Request sanitization │ │
│ │ • API key validation • IP allowlisting • Request/response logging │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ Internal mTLS │
│ ▼ │
│ ZONE 2: APPLICATION (Trusted) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Public │ │ Internal │ │ Admin │ │ │
│ │ │ API │ │ API │ │ API │ │ │
│ │ │ (Customer) │ │ (Services) │ │ (Operators) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │ │ │ │
│ │ └────────────────┴────────────────┘ │ │
│ │ │ │ │
│ │ Service Mesh (Istio) │ │
│ │ • mTLS between services │ │
│ │ • Authorization policies │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ Private subnet only │
│ ▼ │
│ ZONE 3: DATA (Highly Trusted) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ PostgreSQL │ │ Redis │ │ Kafka │ │ │
│ │ │ (Encrypted) │ │ (Auth'd) │ │ (SASL) │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| API Type | Audience | Authentication | Authorization | Rate Limit |
|---|---|---|---|---|
| Public API | End users, mobile apps | OAuth 2.0 + JWT | User can only access own resources | 100 req/min |
| Partner API | B2B integrations | API Key + mTLS | Scoped to partner's users | 1000 req/min |
| Internal API | Microservices | Service account + mTLS | Service-to-service policies | 10000 req/min |
| Admin API | Platform operators | SSO + MFA + JWT | RBAC with approval workflows | 100 req/min |
┌─────────────────────────────────────────────────────────────────────────────┐
│ AUTHENTICATION FLOWS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ PUBLIC API (End Users) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Client │────▶│ Auth0/ │────▶│ JWT │────▶│ API │ │
│ │ App │ │ Cognito │ │ (15min) │ │ Gateway │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ └─────────── Refresh Token (7 days) ───────────────┘ │
│ │
│ PARTNER API (B2B) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Partner │────▶│ API Key │────▶│ mTLS │────▶│ API │ │
│ │ Server │ │ (Header) │ │ Cert │ │ Gateway │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ INTERNAL API (Service-to-Service) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Service │────▶│ SPIFFE/ │────▶│ Target │ │
│ │ A │ │ mTLS │ │ Service │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ ADMIN API (Operators) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Admin │────▶│ SSO │────▶│ MFA │────▶│ Admin │ │
│ │ User │ │ (Okta) │ │ (TOTP) │ │ API │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
| Resource | End User | Partner | Internal Service | Admin |
|---|---|---|---|---|
| Own wallet balance | ✅ Read | ✅ Read (their users) | ✅ Read/Write | ✅ Full |
| Other wallet balance | ❌ | ❌ | ✅ Read | ✅ Read |
| Create transaction | ✅ Own wallet | ✅ Their users | ✅ Any | ✅ Any |
| Reverse transaction | ❌ | ✅ Their transactions | ✅ Any | ✅ Any |
| Freeze wallet | ❌ | ❌ | ❌ | ✅ With approval |
| View audit logs | ❌ | ❌ | ❌ | ✅ Compliance role |
| System configuration | ❌ | ❌ | ❌ | ✅ Super admin |
# Secrets hierarchy (HashiCorp Vault)
secret/
├── ledgerx/
│ ├── prod/
│ │ ├── database/ # DB credentials (rotated daily)
│ │ ├── redis/ # Redis auth
│ │ ├── kafka/ # SASL credentials
│ │ ├── jwt/ # JWT signing keys (rotated weekly)
│ │ └── encryption/ # Data encryption keys
│ ├── staging/
│ └── dev/
└── partners/
├── partner-a/ # Partner-specific API keys
└── partner-b/| Decision | Chosen Approach | Alternative | Why This Choice |
|---|---|---|---|
| Balance Storage | Cached + computed from ledger | Direct balance column | Audit trail + consistency > raw speed |
| Event Delivery | At-least-once with outbox | Exactly-once (2PC) | Simpler, faster, idempotent consumers |
| Database | PostgreSQL (single logical DB) | Microservice DBs | ACID for financial data, simpler transactions |
| Sharding | Application-level by wallet_id | Database sharding | Flexibility, avoid cross-shard transactions |
| API Style | REST + async events | GraphQL / gRPC | Simpler client integration, better caching |
| Reversal Model | Compensating entries | Soft delete / mutation | Immutable audit trail, compliance |
Why PostgreSQL over distributed databases (CockroachDB, Spanner)?
- Financial transactions require strict SERIALIZABLE isolation
- Simpler operational model for < 100K TPS
- Proven reliability for banking workloads
- Native partitioning handles scale requirements
- Lower cost and complexity
Why Kafka over RabbitMQ/SQS?
- Log-based architecture matches ledger philosophy
- Event replay for reconciliation and debugging
- Higher throughput for event streaming
- Consumer groups for parallel processing
- Exactly-once semantics with transactions
Why FastAPI over Django/Go?
- Async-first for I/O-bound financial operations
- Auto-generated OpenAPI for partner integration
- Type hints catch errors at development time
- Python ecosystem for data analysis/ML fraud detection
- Acceptable performance with uvicorn + gunicorn
Consistency
△
/|\
/ | \
/ | \
/ | \
/ CP | \
/ ● | \
/ LedgerX \
/ | \
/ | \
/________|_________\
Availability ──────── Partition
Tolerance
LedgerX chooses CP (Consistency + Partition Tolerance):
- Financial accuracy is non-negotiable
- Brief unavailability > incorrect balances
- Async operations provide eventual availability
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐ ┌─────────┐
│ Client │ │ API │ │ Transfer│ │ Ledger │ │ Kafka │
│ │ │ Gateway │ │ Service │ │ (Postgres)│ │ │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬─────┘ └────┬────┘
│ │ │ │ │
│ POST /transfer│ │ │ │
│──────────────▶│ │ │ │
│ │ Validate JWT │ │ │
│ │──────────────▶│ │ │
│ │ │ │ │
│ │ │ BEGIN TXN │ │
│ │ │──────────────▶│ │
│ │ │ │ │
│ │ │ Check balance │ │
│ │ │──────────────▶│ │
│ │ │◀──────────────│ │
│ │ │ │ │
│ │ │ INSERT entries│ │
│ │ │──────────────▶│ │
│ │ │ │ │
│ │ │ INSERT outbox │ │
│ │ │──────────────▶│ │
│ │ │ │ │
│ │ │ COMMIT │ │
│ │ │──────────────▶│ │
│ │ │◀──────────────│ │
│ │ │ │ │
│ │◀──────────────│ │ │
│◀──────────────│ │ │ │
│ 201 Created │ │ │ │
│ │ │ │ Poll outbox │
│ │ │ │───────────────▶│
│ │ │ │ │
│ │ │ │ Publish │
│ │ │ │───────────────▶│
│ │ │ │ │
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐
│Merchant │ │ API │ │ Hold │ │ Database │
│ │ │ │ │ Service │ │ │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬─────┘
│ │ │ │
│ POST /holds │ │ │
│──────────────▶│──────────────▶│ │
│ │ │ Check avail │
│ │ │──────────────▶│
│ │ │◀──────────────│
│ │ │ │
│ │ │ INSERT hold │
│ │ │──────────────▶│
│ │ │ │
│ │ │ UPDATE balance│
│ │ │ (held_balance)│
│ │ │──────────────▶│
│ │◀──────────────│ │
│◀──────────────│ hold_id │ │
│ │ │ │
│ .... │ (time passes - up to 7 days)│
│ │ │ │
│ POST /capture │ │ │
│──────────────▶│──────────────▶│ │
│ │ │ Validate hold │
│ │ │──────────────▶│
│ │ │ │
│ │ │ CREATE debit │
│ │ │ entries │
│ │ │──────────────▶│
│ │ │ │
│ │ │ UPDATE hold │
│ │ │ status=CAPTURED
│ │ │──────────────▶│
│ │◀──────────────│ │
│◀──────────────│ transaction │ │
MIT License - see LICENSE for details.
See CONTRIBUTING.md for contribution guidelines.