The randomness worker processes pending randomness requests from the queue. It determines whether to use VRF (high-stakes) or PRNG (low-stakes), computes the seed and proof, and submits the result to the Soroban contract.
```
RandomnessRequested Event (Stellar Horizon)
        ↓
Bull Queue (Redis) with Priority
        ↓
RandomnessWorker.handleRandomnessJob()
        ↓
┌───────┴──────────────────────┐
│ 1. Check contract status     │
│ 2. Get prize amount          │
│ 3. Determine VRF/PRNG        │
│ 4. Compute randomness        │
│ 5. Submit to contract        │
│ 6. Track SLA (high-priority) │
└──────────────────────────────┘
```
- Automatic priority assignment based on prize amount
- High-stakes raffles (≥500 XLM): HIGH priority (processed first)
- Standard raffles (<500 XLM): NORMAL priority
- Manual priority override via contract event flag
- SLA monitoring for high-priority jobs (5s threshold)
See PRIORITY_QUEUE_IMPLEMENTATION.md for details.
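The enqueue-time decisions above can be sketched as follows. This is a minimal sketch, not the actual implementation, and it rests on several assumptions:

- The names `assignPriority`, `buildJobOptions`, and `isSlaBreached` are hypothetical.
- Prize amounts are assumed to be in whole XLM.
- The manual override is assumed to be a boolean that forces HIGH priority.
- The numeric priorities 1 and 10 are illustrative. Bull only defines 1 as its highest priority.
- Using the request ID as Bull's `jobId` is an assumption. Bull does not add a job whose ID already exists, so this would act as an extra de-duplication guard.

```typescript
// Hypothetical sketch of enqueue-time decisions; names are illustrative.
type PriorityLevel = "HIGH" | "NORMAL";

const HIGH_STAKES_THRESHOLD_XLM = 500; // >= 500 XLM is high-stakes
const HIGH_PRIORITY_SLA_MS = 5_000;    // 5s SLA for high-priority jobs

// Subset of Bull's JobOptions used here.
interface RandomnessJobOptions {
  jobId: string;
  priority: number; // Bull: 1 is the highest priority
  attempts: number;
  backoff: { type: "exponential"; delay: number };
}

// Assumption: the manual override flag can only raise priority to HIGH.
function assignPriority(prizeXlm: number, manualHighPriority = false): PriorityLevel {
  return manualHighPriority || prizeXlm >= HIGH_STAKES_THRESHOLD_XLM ? "HIGH" : "NORMAL";
}

function buildJobOptions(requestId: string, level: PriorityLevel): RandomnessJobOptions {
  return {
    // Assumption: request ID doubles as job ID, so Bull will not add a
    // second job for the same request while the first still exists.
    jobId: requestId,
    priority: level === "HIGH" ? 1 : 10, // 10 is an arbitrary "normal" value
    attempts: 5,
    backoff: { type: "exponential", delay: 2_000 }, // QUEUE_INITIAL_BACKOFF_MS default
  };
}

// True when a high-priority job took longer than the SLA from enqueue to completion.
function isSlaBreached(level: PriorityLevel, enqueuedAtMs: number, finishedAtMs: number): boolean {
  return level === "HIGH" && finishedAtMs - enqueuedAtMs > HIGH_PRIORITY_SLA_MS;
}

console.log(assignPriority(499));                         // NORMAL
console.log(assignPriority(500));                         // HIGH
console.log(buildJobOptions("req-abc", "HIGH").priority); // 1
console.log(isSlaBreached("HIGH", 0, 6_000));             // true

// With a real Bull queue (requires Redis):
// await randomnessQueue.add(payload, buildJobOptions(payload.requestId, assignPriority(payload.prizeAmount)));
```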
- Event listener enqueues jobs into Bull with `attempts: 5` and exponential backoff.
- Redis acts as the persistent store for the queue.
- Queries the contract to verify the raffle is not already finalized.
- Serves as the primary idempotency check for retried jobs and re-emitted events.
- Uses `prizeAmount` from the event payload if available
- Falls back to a `ContractService.getRaffleData()` RPC call if not
- High-stakes (≥ 500 XLM): Uses VRF for cryptographic verifiability
- Low-stakes (< 500 XLM): Uses PRNG for instant, zero-cost randomness
- VrfService: Ed25519 VRF with proof generation
- PrngService: SHA-256 PRNG with timestamp + entropy
- Builds the `receive_randomness(raffleId, seed, proof)` transaction
- Signs it with the oracle keypair
- Submits to Soroban RPC
- Polls for confirmation
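Steps 2 and 3 (resolve the prize, then pick VRF or PRNG) are sketched below. This is a minimal sketch under these assumptions:

- The `RaffleData` field name `prizeAmount` and its XLM unit are assumed.
- `ContractReader` is a hypothetical stand-in for the read side of `ContractService`.
- `chooseMethod` encodes the 500 XLM boundary described above, with exactly 500 XLM going to VRF.

```typescript
// Hypothetical sketch of steps 2 and 3. RaffleData's field name is assumed.
interface RaffleData {
  prizeAmount: number; // assumed to be denominated in XLM
}

interface ContractReader {
  getRaffleData(raffleId: number): Promise<RaffleData>;
}

type RandomnessMethod = "VRF" | "PRNG";

async function resolvePrizeAmount(
  raffleId: number,
  payloadPrize: number | undefined,
  contract: ContractReader,
): Promise<number> {
  // Prefer the value carried by the event; only hit RPC when it is missing.
  if (payloadPrize !== undefined) return payloadPrize;
  const raffle = await contract.getRaffleData(raffleId);
  return raffle.prizeAmount;
}

function chooseMethod(prizeXlm: number): RandomnessMethod {
  return prizeXlm >= 500 ? "VRF" : "PRNG"; // 500 XLM itself takes the VRF path
}

// A stub contract stands in for the Soroban RPC call.
const stubContract: ContractReader = { getRaffleData: async () => ({ prizeAmount: 750 }) };
resolvePrizeAmount(7, undefined, stubContract).then((prize) => console.log(prize, chooseMethod(prize))); // 750 VRF
```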
`RandomnessWorker`: consumer processor that handles jobs from `randomness-queue`.
Key Methods:
- `@Process() handleRandomnessJob(job: Job<RandomnessJobPayload>): Promise<void>` - Processes a single job
`ContractService`: interacts with the Soroban contract for read operations.
Methods:
- `getRaffleData(raffleId): Promise<RaffleData>` - Fetches raffle details
- `isRandomnessSubmitted(raffleId): Promise<boolean>` - Checks whether the raffle is already finalized
`VrfService`: generates verifiable random function (VRF) output for high-stakes raffles.
Methods:
- `compute(requestId): Promise<RandomnessResult>` - Computes the VRF seed and proof
`PrngService`: generates pseudo-random output for low-stakes raffles.
Methods:
- `compute(requestId): Promise<RandomnessResult>` - Computes the PRNG seed
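The SHA-256 PRNG described earlier ("timestamp + entropy") could look roughly like the sketch below. It assumes:

- The hash input is the request ID, a millisecond timestamp, and 32 random bytes, with a `:` separator so that different inputs cannot concatenate to the same string.
- The PRNG result carries no proof.

The real `RandomnessResult` shape and input ordering may differ. The timestamp and entropy are parameters so the function can be tested deterministically.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical result shape; the real RandomnessResult may differ.
interface PrngResult {
  seed: string; // hex-encoded SHA-256 digest (64 chars)
  proof: null;  // assumption: PRNG output has no verifiable proof
}

function computePrngSeed(
  requestId: string,
  timestampMs: number = Date.now(),
  entropy: Buffer = randomBytes(32),
): PrngResult {
  const seed = createHash("sha256")
    .update(`${requestId}:${timestampMs}:`) // separator keeps field boundaries unambiguous
    .update(entropy)
    .digest("hex");
  return { seed, proof: null };
}

// Fixed timestamp and entropy make the output reproducible.
const fixedEntropy = Buffer.alloc(32, 1);
console.log(computePrngSeed("req-abc", 1_000, fixedEntropy).seed.length); // 64
```

Because the entropy comes from `randomBytes`, this output is not publicly verifiable, which is why the README reserves VRF for high-stakes raffles.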
`TxSubmitterService`: submits randomness to the contract with fault tolerance, explicit state-machine tracking, and strictly typed outcomes.
Primary Method:
- `submitRandomnessTyped(raffleId, requestId, randomness): Promise<TransactionOutcome>` - Submits with typed outcomes
Legacy Method (Deprecated):
- `submitRandomness(raffleId, randomness): Promise<SubmitResult>` - Backward-compatibility wrapper
Features:
- ✅ Explicit transaction lifecycle state machine (BUILDING → SIGNING → SUBMITTING → POLLING → TERMINAL)
- ✅ Strictly typed outcomes (7 distinct outcome types with discriminated union)
- ✅ Duplicate detection and handling (treats as success)
- ✅ Polling strategy with 30-second timeout and 1-second intervals
- ✅ Timeout fallback that polls transaction hash on 504 errors
- ✅ Error classification matrix (retriable vs non-retriable)
- ✅ Structured telemetry logging with all required fields
- ✅ RPC failover to backup endpoints
- ✅ Comprehensive test suite with 95%+ coverage
📖 See Transaction Submitter Guide for complete documentation
📋 See Transaction Submitter Quick Reference for quick reference
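The polling strategy (30-second timeout, 1-second intervals) is sketched below. It assumes:

- `getStatus` is an injected function. Its status values mirror those returned by Soroban RPC's `getTransaction` method (`SUCCESS`, `NOT_FOUND`, `FAILED`).
- `PollOutcome` is a hypothetical three-case union. It is not the service's seven-case `TransactionOutcome`.

```typescript
// Hypothetical sketch of confirmation polling with a hard timeout.
type TxStatus = "NOT_FOUND" | "SUCCESS" | "FAILED";

type PollOutcome =
  | { kind: "confirmed" }
  | { kind: "failed" }
  | { kind: "timeout"; elapsedMs: number };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollForConfirmation(
  getStatus: (txHash: string) => Promise<TxStatus>,
  txHash: string,
  timeoutMs = 30_000,
  intervalMs = 1_000,
): Promise<PollOutcome> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const status = await getStatus(txHash);
    if (status === "SUCCESS") return { kind: "confirmed" };
    if (status === "FAILED") return { kind: "failed" };
    // NOT_FOUND: not yet included in a ledger, so wait and ask again.
    await sleep(intervalMs);
  }
  return { kind: "timeout", elapsedMs: Date.now() - start };
}
```

A `timeout` result does not mean the transaction failed. It may still land later, which is why the submitter re-polls by transaction hash rather than resubmitting blindly.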
- Worker throws errors on failure to trigger queue retry mechanism
- Idempotency ensures safe retries (won't double-submit)
- Contract status check prevents submission to finalized raffles
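The error-handling rules above can be seen in a simplified version of the worker's control flow. This sketch assumes:

- The service shapes are hypothetical stand-ins for `ContractService`, `VrfService`, `PrngService`, and `TxSubmitterService`.
- The function returns a status string only to make the flow observable. The real handler returns `Promise<void>`.

The function deliberately does not catch submission errors. Letting them propagate is what allows Bull to schedule a retry.

```typescript
// Hypothetical sketch of handleRandomnessJob's control flow.
interface RandomnessJobPayload {
  raffleId: number;
  requestId: string;
  prizeAmount?: number;
}

interface RandomnessResult {
  seed: string;
  proof?: string;
}

interface Services {
  contract: {
    isRandomnessSubmitted(raffleId: number): Promise<boolean>;
    getRaffleData(raffleId: number): Promise<{ prizeAmount: number }>;
  };
  vrf: { compute(requestId: string): Promise<RandomnessResult> };
  prng: { compute(requestId: string): Promise<RandomnessResult> };
  submit(raffleId: number, requestId: string, result: RandomnessResult): Promise<void>;
}

async function handleRandomnessJob(
  job: { data: RandomnessJobPayload },
  s: Services,
): Promise<"skipped" | "submitted"> {
  const { raffleId, requestId, prizeAmount } = job.data;

  // 1. Idempotency: a finalized raffle means an earlier attempt already succeeded.
  if (await s.contract.isRandomnessSubmitted(raffleId)) return "skipped";

  // 2. Prize from the payload, or from the contract as a fallback.
  const prize = prizeAmount ?? (await s.contract.getRaffleData(raffleId)).prizeAmount;

  // 3-4. VRF for high-stakes raffles, PRNG otherwise.
  const result = prize >= 500 ? await s.vrf.compute(requestId) : await s.prng.compute(requestId);

  // 5. Errors thrown here propagate to Bull, which schedules a retry.
  await s.submit(raffleId, requestId, result);
  return "submitted";
}
```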
Run unit tests:

```bash
npm test
```

Test Coverage:
- ✅ Low-stakes PRNG path
- ✅ High-stakes VRF path
- ✅ Prize amount fetching from contract
- ✅ Duplicate request handling
- ✅ Already-finalized raffle handling
- ✅ Error handling and retry behavior
| Endpoint | Description |
|---|---|
| `GET /health` | Liveness check: returns healthy/unhealthy plus the pending lag count |
| `GET /oracle/status` | Full status: metrics, lag, RPC health, multi-oracle state, recent errors |
`GET /health` response:

```json
{
  "status": "healthy",
  "timestamp": "2026-04-23T12:00:00.000Z",
  "pendingLagRequests": 0
}
```

`GET /oracle/status` response (abbreviated):

```json
{
  "status": "healthy",
  "metrics": {
    "queueDepth": 2,
    "lastProcessedAt": "2026-04-23T11:59:00.000Z",
    "totalProcessed": 142,
    "totalFailed": 1,
    "successRate": "99.30%",
    "streamStatus": "connected"
  },
  "lag": {
    "pendingCount": 1,
    "pendingRequests": [
      { "requestId": "req-abc", "raffleId": 7, "requestedAtLedger": 1234500 }
    ]
  },
  "rpc": [{ "url": "https://soroban-testnet.stellar.org", "healthy": true }],
  "recentErrors": []
}
```

`LagMonitorService` tracks every `RandomnessRequested` event by ledger number. If a request is not fulfilled within 100 ledgers (~8 minutes on Stellar), an `[ALERT]` log is emitted:

```
[ALERT] Request req-abc for raffle 7 not fulfilled within 100 ledgers. Lag: 103
```
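The lag rule reduces to a ledger subtraction, sketched below. The function name is hypothetical, and treating "not fulfilled within 100 ledgers" as strictly more than 100 ledgers of lag is an assumption. The alert text reproduces the log format shown above.

```typescript
// Hypothetical sketch of the LagMonitorService threshold check.
interface PendingRequest {
  requestId: string;
  raffleId: number;
  requestedAtLedger: number;
}

const LAG_THRESHOLD_LEDGERS = 100;

// Returns one alert line per pending request whose lag exceeds the threshold.
function findLaggingRequests(pending: PendingRequest[], currentLedger: number): string[] {
  return pending
    .map((r) => ({ r, lag: currentLedger - r.requestedAtLedger }))
    .filter(({ lag }) => lag > LAG_THRESHOLD_LEDGERS)
    .map(
      ({ r, lag }) =>
        `[ALERT] Request ${r.requestId} for raffle ${r.raffleId} not fulfilled within ${LAG_THRESHOLD_LEDGERS} ledgers. Lag: ${lag}`,
    );
}

const alerts = findLaggingRequests(
  [{ requestId: "req-abc", raffleId: 7, requestedAtLedger: 1234500 }],
  1234603,
);
console.log(alerts[0]);
// [ALERT] Request req-abc for raffle 7 not fulfilled within 100 ledgers. Lag: 103
```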
- Liveness probe: use `GET /health` as the Kubernetes `livenessProbe`
- Alerting: scrape logs for the `[ALERT]` pattern or wire up a log aggregator
- Metrics: `queueDepth > 10` warns; `queueDepth > 50` marks the service unhealthy
- Heartbeat: the oracle pings the contract every `HEARTBEAT_INTERVAL_MS` (default: 1 hour)
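The queue-depth thresholds and heartbeat default can be expressed as two small helpers. Both function names are hypothetical. The intermediate `"warning"` label is an assumption, since the health endpoints themselves only report healthy/unhealthy. The heartbeat helper assumes an unset, empty, or non-numeric value falls back to the 1-hour default.

```typescript
// Hypothetical sketch of the monitoring thresholds described above.
type DepthStatus = "healthy" | "warning" | "unhealthy";

function classifyQueueDepth(queueDepth: number): DepthStatus {
  if (queueDepth > 50) return "unhealthy";
  if (queueDepth > 10) return "warning";
  return "healthy";
}

// Reads HEARTBEAT_INTERVAL_MS, falling back to 1 hour for missing or invalid values.
function heartbeatIntervalMs(env: Record<string, string | undefined>): number {
  const parsed = Number(env.HEARTBEAT_INTERVAL_MS);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 3_600_000;
}

console.log(classifyQueueDepth(2));  // healthy
console.log(classifyQueueDepth(11)); // warning
console.log(heartbeatIntervalMs({})); // 3600000
```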
The oracle uses Bull (backed by Redis) to reliably process randomness requests with an explicit state machine for lifecycle management.
The queue implements a robust state machine with 8 distinct states:
```
queued → generating → submitting → confirming → confirmed ✓
  ↓          ↓            ↓            ↓
  └──────────┴────────────┴────────────┴──→ retrying → (back to generating or dead-lettered)
```
States:
- `queued` - Waiting for a processing slot
- `generating` - Computing randomness (VRF/PRNG)
- `submitting` - Sending the transaction to the network
- `confirming` - Waiting for on-chain confirmation
- `confirmed` - ✅ Success (terminal)
- `retrying` - In backoff before the next attempt
- `failed` - ❌ Non-retriable error (terminal)
- `dead-lettered` - ⚠️ Max retries exhausted; requires manual rescue (terminal)
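The diagram and state list can be encoded as a transition table. This is a hypothetical sketch. The diagram does not show where `failed` is entered, so allowing a non-retriable error from `generating`, `submitting`, or `confirming` is an assumption.

```typescript
// Hypothetical transition table for the eight job states listed above.
type JobState =
  | "queued" | "generating" | "submitting" | "confirming"
  | "confirmed" | "retrying" | "failed" | "dead-lettered";

const TRANSITIONS: Record<JobState, readonly JobState[]> = {
  queued: ["generating", "retrying"],
  generating: ["submitting", "retrying", "failed"], // "failed" edges are assumed
  submitting: ["confirming", "retrying", "failed"],
  confirming: ["confirmed", "retrying", "failed"],
  retrying: ["generating", "dead-lettered"],
  // Terminal states have no outgoing transitions.
  confirmed: [],
  failed: [],
  "dead-lettered": [],
};

function canTransition(from: JobState, to: JobState): boolean {
  return TRANSITIONS[from].includes(to);
}

function isTerminal(state: JobState): boolean {
  return TRANSITIONS[state].length === 0;
}

console.log(canTransition("retrying", "generating")); // true
console.log(canTransition("confirmed", "retrying"));  // false
```

Rejecting any transition not in the table makes illegal moves, such as reviving a `confirmed` job, fail loudly instead of silently corrupting job state.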
| Setting | Default | Environment Variable |
|---|---|---|
| Queue name | randomness-queue | - |
| Max retries | 5 | QUEUE_MAX_RETRIES |
| Initial backoff | 2000ms | QUEUE_INITIAL_BACKOFF_MS |
| Backoff multiplier | 2 (exponential) | QUEUE_BACKOFF_MULTIPLIER |
| Max backoff | 60000ms (1 min) | QUEUE_MAX_BACKOFF_MS |
| Confirmation timeout | 30000ms (30s) | QUEUE_CONFIRMATION_TIMEOUT_MS |
| Max concurrency | 10 | QUEUE_MAX_CONCURRENCY |
| Generation timeout | 15000ms (15s) | QUEUE_GENERATION_TIMEOUT_MS |
| Submission timeout | 45000ms (45s) | QUEUE_SUBMISSION_TIMEOUT_MS |
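The backoff rows of the table combine into a capped exponential schedule, sketched below. The helper names are hypothetical. So is the rule that an unset, empty, or non-numeric variable falls back to the table default. The delay before retry *n* is taken as `min(initial × multiplier^(n-1), max)`.

```typescript
// Hypothetical sketch: load backoff settings from env and compute retry delays.
interface BackoffConfig {
  maxRetries: number;
  initialBackoffMs: number;
  multiplier: number;
  maxBackoffMs: number;
}

type Env = Record<string, string | undefined>;

function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isFinite(value) ? value : fallback;
}

function loadBackoffConfig(env: Env): BackoffConfig {
  return {
    maxRetries: readNumber(env, "QUEUE_MAX_RETRIES", 5),
    initialBackoffMs: readNumber(env, "QUEUE_INITIAL_BACKOFF_MS", 2_000),
    multiplier: readNumber(env, "QUEUE_BACKOFF_MULTIPLIER", 2),
    maxBackoffMs: readNumber(env, "QUEUE_MAX_BACKOFF_MS", 60_000),
  };
}

// Delay before retry number `retry` (1-based), capped at maxBackoffMs.
function backoffDelayMs(cfg: BackoffConfig, retry: number): number {
  return Math.min(cfg.initialBackoffMs * cfg.multiplier ** (retry - 1), cfg.maxBackoffMs);
}

const cfg = loadBackoffConfig({});
console.log(Array.from({ length: cfg.maxRetries }, (_, i) => backoffDelayMs(cfg, i + 1)));
// [ 2000, 4000, 8000, 16000, 32000 ]
```

With the defaults, the 60-second cap is never reached. It only takes effect if `QUEUE_MAX_RETRIES` is raised to 6 or more, where the sixth delay would otherwise be 64 seconds.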
| Endpoint | Description |
|---|---|
| `GET /queue/metrics` | Comprehensive metrics by state |
| `GET /queue/health` | Health status (healthy/degraded/unhealthy) |
| `GET /queue/jobs/:state` | Jobs in a specific state |
| `GET /queue/dead-letter` | Jobs requiring manual rescue |
| `GET /queue/config` | Current configuration |
Example metrics response:

```json
{
  "queuedCount": 5,
  "generatingCount": 2,
  "submittingCount": 1,
  "confirmingCount": 3,
  "retryingCount": 1,
  "confirmedCount": 150,
  "failedCount": 2,
  "deadLetteredCount": 0,
  "pendingCount": 12,
  "totalFailedCount": 2
}
```

Required environment variables:

```bash
REDIS_HOST=localhost # Redis server hostname
REDIS_PORT=6379      # Redis server port
```

Redis must be running before starting the oracle. A minimal local setup:

```bash
docker run -d -p 6379:6379 redis:7-alpine
```

📖 See QUEUE_STATE_MACHINE_IMPLEMENTATION.md for complete documentation
📋 See QUEUE_STATE_MACHINE_QUICK_REF.md for quick reference
The service requires the following environment variables for queue operations:
- `REDIS_HOST`: Redis server host (default: `localhost`)
- `REDIS_PORT`: Redis server port (default: `6379`)
- `SOROBAN_RPC_URL`: primary Soroban RPC endpoint for submission
- `SOROBAN_RPC_FALLBACK_URLS`: optional comma-separated fallback RPC endpoints
- `RAFFLE_CONTRACT_ID`: raffle contract address
- `NETWORK_PASSPHRASE`: Stellar network passphrase
- `TX_SUBMIT_MAX_ATTEMPTS`: max transaction submit attempts (default: `5`)
- `TX_SUBMIT_INITIAL_BACKOFF_MS`: initial backoff delay (default: `1000`)
- `TX_SUBMIT_ALERT_WEBHOOK_URL`: optional alert webhook for persistent submit failures
- `ORACLE_CB_FAILURE_THRESHOLD`: number of consecutive Horizon SSE failures before the circuit opens (default: `5`)
- `ORACLE_CB_RESET_TIMEOUT_MS`: milliseconds the circuit stays open before allowing a probe attempt (default: `60000`)
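The two `ORACLE_CB_*` settings describe a consecutive-failure circuit breaker. The sketch below shows the usual closed/open/half-open cycle. It assumes:

- The class is hypothetical and is not the service's implementation.
- A failed probe in the half-open state reopens the circuit immediately.
- The injectable clock exists only so the cycle can be tested without waiting.

```typescript
// Hypothetical consecutive-failure circuit breaker for the Horizon SSE stream.
type CircuitState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: CircuitState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,    // ORACLE_CB_FAILURE_THRESHOLD
    private readonly resetTimeoutMs = 60_000, // ORACLE_CB_RESET_TIMEOUT_MS
    private readonly now: () => number = Date.now,
  ) {}

  // Returns true if a connection attempt may proceed.
  canAttempt(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // reset timeout elapsed: let a probe through
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    // A failed probe, or too many consecutive failures, opens the circuit.
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}

// Wiring from the environment might look like:
// const breaker = new CircuitBreaker(
//   Number(process.env.ORACLE_CB_FAILURE_THRESHOLD ?? 5),
//   Number(process.env.ORACLE_CB_RESET_TIMEOUT_MS ?? 60_000),
// );
```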
✅ Worker logic implemented
✅ VRF/PRNG branching
✅ Bull Queue integration (Redis-backed)
✅ Unit tests with mocks
✅ ContractService RPC calls integrated
✅ VrfService integration scaffolded
✅ TxSubmitterService builds/signs/submits receive_randomness with retry/backoff
When jobs fail after all retries, operators can use the rescue CLI for manual intervention. Mutating commands are dry-run by default and require --execute to actually apply changes.
```bash
# Dry-run re-enqueue (preview only)
npm run oracle:rescue re-enqueue <jobId> --operator <name> --reason <reason>

# Execute re-enqueue
npm run oracle:rescue re-enqueue <jobId> --operator <name> --reason <reason> --execute

# Dry-run force submit randomness
npm run oracle:rescue force-submit <raffleId> <requestId> --operator <name> --reason <reason>

# Execute force submit with an optional prize amount
npm run oracle:rescue force-submit <raffleId> <requestId> --operator <name> --reason <reason> --prize 1000 --execute

# Dry-run force fail
npm run oracle:rescue force-fail <jobId> --operator <name> --reason <reason>

# Execute force fail
npm run oracle:rescue force-fail <jobId> --operator <name> --reason <reason> --execute

# List failed jobs
npm run oracle:rescue list-failed

# View rescue audit logs
npm run oracle:rescue logs
```

See RESCUE_GUIDE.md for detailed usage and ON_CALL_TROUBLESHOOTING.md for on-call procedures.
- Expand integration tests against live Stellar testnet and failure modes
- Wire alert webhook to on-call paging workflow
- Add additional metrics for retry/failure counts by error class
- Harden idempotency behavior for duplicate submission races
Development (Insecure):

```bash
KEY_PROVIDER=env
ORACLE_SECRET_KEY=S... # or ORACLE_PRIVATE_KEY
```

Production (Secure):

```bash
# AWS KMS
KEY_PROVIDER=aws-kms
AWS_REGION=us-east-1
AWS_KMS_KEY_ID=arn:aws:kms:...

# OR Google Cloud KMS
KEY_PROVIDER=gcp-kms
GCP_PROJECT_ID=my-project
GCP_KEY_RING_ID=oracle-keys
GCP_KEY_ID=oracle-signing-key
```

- 📖 Key Management Guide - Comprehensive setup and configuration
- 🚀 Quick Start - Get started in 5 minutes
- 🔄 Migration Guide - Migrate from env vars to HSM
- 📋 Implementation Summary - Technical details
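A startup check for the `KEY_PROVIDER` settings could look like the sketch below. The function name is hypothetical. It checks only the variables shown in the examples above, and the real providers may need more (for example, a KMS key location).

```typescript
// Hypothetical sketch: report missing variables for the selected KEY_PROVIDER.
type KeyProvider = "env" | "aws-kms" | "gcp-kms";

// Each inner array lists alternatives; at least one of them must be set.
const REQUIRED_VARS: Record<KeyProvider, string[][]> = {
  env: [["ORACLE_SECRET_KEY", "ORACLE_PRIVATE_KEY"]],
  "aws-kms": [["AWS_REGION"], ["AWS_KMS_KEY_ID"]],
  "gcp-kms": [["GCP_PROJECT_ID"], ["GCP_KEY_RING_ID"], ["GCP_KEY_ID"]],
};

function missingKeyConfig(env: Record<string, string | undefined>): string[] {
  const provider = env.KEY_PROVIDER;
  if (!provider || !Object.keys(REQUIRED_VARS).includes(provider)) {
    return ["KEY_PROVIDER (expected env, aws-kms, or gcp-kms)"];
  }
  return REQUIRED_VARS[provider as KeyProvider]
    .filter((alternatives) => !alternatives.some((name) => env[name]))
    .map((alternatives) => alternatives.join(" or "));
}

console.log(missingKeyConfig({ KEY_PROVIDER: "aws-kms", AWS_REGION: "us-east-1" })); // [ 'AWS_KMS_KEY_ID' ]
```

Failing fast on an incomplete configuration at startup is preferable to discovering it on the first signing attempt, when a raffle is already waiting for randomness.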
✅ Private keys never exposed in memory
✅ All signing operations audited
✅ Centralized key management
✅ Automated key rotation
✅ Compliance with security standards