Author: froooze
Status: Implemented & Stabilized
Objective: Eliminate optimistic state corruption by separating "Blockchain Truth" from "Strategy Targets" using immutable master grids and Copy-on-Write semantics.
The Copy-on-Write (COW) Grid Architecture replaces the old optimistic mutation pattern with a cleaner approach: master grid is never modified until blockchain confirmation.
This architecture implements the core philosophy of "Verify, Then Commit":
- Immutable Master Grid: The master grid is never directly modified during planning
- Atomic Promotion: Changes only move from working copy to master upon verified blockchain success
- Delta-Only Execution: Only the difference between Master and Target triggers blockchain actions
- Side Invariance: Order side (BUY/SELL) is absolute. Any price-flip requires a full
Cancel→Placesequence
Also known as: "Old Pattern"
1. Modify master grid directly (optimistic)
2. Broadcast to blockchain
3. (No recovery mechanism on failure)
Problems:
- Direct mutation of master grid during planning
- No isolation between planning and committed state
- No snapshot/rollback capability - failures leave grid in corrupted state
- Ghost orders persist because nothing cleans them up (see Incident Report: XRP-BTS Price Jump)
1. Create WorkingGrid (clone of master)
2. Modify working copy
3. Broadcast to blockchain
4. On success: atomic swap (working → master)
5. On failure: discard working (master unchanged)
The COW architecture evolved from earlier attempts to achieve grid immutability:
- Approach: Direct in-memory mutation of master grid during planning
- Pattern: Modify master directly → Broadcast to blockchain → No recovery mechanism
- Vulnerability: State corruption during any failure, no isolation between planning and committed state, no rollback capability
- Incident: This approach caused the XRP-BTS Price Jump incident — a sudden market move corrupted in-flight grid state because planning mutations were applied directly to the master grid, with no isolation or rollback
- Status: ❌ Vulnerable - Replaced by frozen master approach
- Approach:
Object.freeze()on Maps and order objects - Implementation: Each
_applyOrderUpdatecreates new frozen Map via immutable-swap pattern - Original concern: Performance overhead, complexity in deep-freezing nested structures
- Advantage: Runtime enforcement prevents accidental mutations, catches bugs that read
manager.ordersand mutate in-place - Status: ✅ Retained as defense-in-depth layer alongside COW
- Approach: Working copy during planning, atomic swap on blockchain confirmation
- Pattern: Clone → Modify working copy → Broadcast → Commit on success / Discard on failure
- Advantage: True transactional semantics, master never in intermediate state, cleaner than snapshot/rollback
- Status: ✅ Production-ready
Actual Implementation: Freeze + COW Hybrid
The production implementation uses BOTH approaches as complementary layers:
-
Object.freeze()on the master Map anddeepFreeze()on individual order objects provides runtime enforcement -- any accidental direct mutation throws in strict mode. Each_applyOrderUpdatecall creates a new frozen Map via immutable-swap pattern:const newMap = cloneMap(this.orders); newMap.set(id, deepFreeze({ ...nextOrder })); this.orders = Object.freeze(newMap);
-
COW working grids provide the planning/broadcast lifecycle isolation -- strategy computes target state on a mutable clone, and master is only replaced after blockchain confirmation via
_commitWorkingGrid().
The freeze layer catches bugs that COW alone wouldn't (e.g., code that reads manager.orders
and mutates an order object in-place). The COW layer provides the transactional semantics
(plan, broadcast, commit-or-discard).
- Created
modules/order/utils/order_comparison.js- Epsilon-based order comparison - Created
modules/order/utils/grid_indexes.js- Index building utilities (used by WorkingGrid)
- Created
modules/order/working_grid.js- WorkingGrid class - Added
COW_PERFORMANCEthresholds tomodules/constants.js
performSafeRebalance()→ delegates to_applySafeRebalanceCOW()_applySafeRebalanceCOW()- Creates working grid, runs planning, returns result without modifying master_reconcileGridCOW()- Delta reconciliation against working copy_commitWorkingGrid()- Atomic swap from working to master
updateOrdersOnChainBatch()- Routes to COW path whenworkingGridpresent_updateOrdersOnChainBatchCOW()- Full COW broadcast with commit on success- Removed legacy rollback code
Updated Decision: "Selective abort - Continue individual fills, block full-side updates"
Fill Processing Behavior:
| Scenario | Scope | Action | Reason |
|---|---|---|---|
| Individual fill | Single slot (boundary shift) | Process immediately | Just moves boundary, doesn't modify filled order. Low risk. |
| Divergence-triggered full update | Entire side of grid | BLOCK if fills pending | Rebuilding all orders - needs stable state, no concurrent fills |
| Cache threshold full update | Entire side of grid | BLOCK if fills pending | Rebalancing depleted side - complex planning, can't have stale data |
Key Principle:
Individual Fills (Grid Maintenance):
- Only move boundary and handle next slot - don't modify the filled order itself
- Fills keep the grid alive by shifting the boundary as market moves
- Low risk, can sync to working grid and continue current rebalance
- Never abort blockchain operations for individual fills
Full Side Updates (Major Planning):
- Divergence: Rebuild entire side of grid (potentially dozens of orders)
- Cache threshold: Rebalance all orders on depleted side
- High risk - complex planning that shouldn't run with stale state
- Abort if fills pending or during BROADCASTING
Why the distinction matters:
- Individual fills: "Just handle this one slot, keep grid alive"
- Full updates: "Rebuild everything, needs stable foundation"
Critical: Working Grid Synchronization When fills arrive during REBALANCING state (before BROADCASTING):
- Update master grid immediately (blockchain truth)
- Also apply same fill to working grid (keep copies in sync)
- Continue with current rebalance using updated working copy
// Implementation in _applyOrderUpdate (manager.js)
async _applyOrderUpdate(order, context, options = {}) {
// ... update master grid (immutable swap) ...
// Centralized adapter handles:
// 1) planning-state gate,
// 2) markStale(),
// 3) syncFromMaster(),
// 4) sync error handling.
this._syncWorkingGridFromMasterMutation(order.id, context);
}
// WorkingGrid.syncFromMaster (working_grid.js)
syncFromMaster(masterGrid, orderId, masterVersion) {
const masterOrder = masterGrid.get(orderId);
if (masterOrder) {
this.grid.set(orderId, this._cloneOrder(masterOrder));
this.modified.add(orderId);
this._indexes = null;
}
if (Number.isFinite(masterVersion)) {
this.baseVersion = masterVersion;
}
}Why this matters:
- Working grid must reflect all blockchain state changes
- Prevents stale data from being committed
- Avoids unnecessary aborts for individual fills
Fill Processing Flow:
NEW FILL ARRIVES (Individual Order Fill)
│
▼
[What type of update triggered this?]
│
Boundary Shift Only? ──Yes──> Update master grid
│ Sync to working grid (if planning active)
No Continue current operations
│
Full Side Update? ──Yes──> Check State
│ │
No REBALANCING?
│ │
│ Yes ──> Block, wait for fills=0
│ │
│ No ──> Check if BROADCASTING
│ │
│ Yes ──> AbortController.abort()
│ │
│ No ──> Safe to proceed
▼
[Note: Individual fills move boundary, don't touch filled order]
[Note: Full updates touch all orders - needs stable state]
| State | Fill Handling | Working Grid Action | Commit Outcome |
|---|---|---|---|
NORMAL |
Apply fill to master immediately | No working-grid sync | Not applicable |
REBALANCING |
Apply fill to master immediately | Mark stale + sync changed order from master | Planning returns aborted result |
BROADCASTING |
Apply fill to master immediately | Mark stale + sync changed order from master | Commit guard rejects stale/version-mismatch grid |
Single-source rules (no special cases):
- Master grid always updates first (blockchain truth)
- If planning/broadcasting is active, sync the changed order into the working grid
- Any master mutation marks the working grid stale
- Commit succeeds only when stale/version/delta guards all pass
tests/test_cow_master_plan.js- 11 COW core teststests/test_cow_commit_guards.js- Commit guard regression teststests/test_cow_concurrent_fills.js- Concurrent fill integration teststests/test_sync_lock_routing.js- Lock routing verification teststests/test_working_grid.js- WorkingGrid unit teststests/benchmark_cow.js- Performance benchmarks
Critical Rule: Divergence checks and cache function updates only execute when NO fills are pending.
Execution Conditions:
if (fills.length === 0 && rebalanceState === REBALANCE_STATES.NORMAL) {
// Safe to check divergence
// Safe to update cache functions
}Why this restriction:
- Divergence calculations assume stable grid state
- Fills modify the grid mid-calculation
- Cache updates must reflect committed state, not speculative working state
- Prevents race conditions between fill processing and cache invalidation
Migrated applyGridDivergenceCorrections from queue-based cancellations to full COW pattern.
Atomic Boundary Shifts (Patch 20): Boundary index changes during divergence correction are now atomic with slot-type reassignment. The pendingBoundaryIdx variable carries boundary changes through the COW pipeline without touching manager.boundaryIdx until _commitWorkingGrid completes. This prevents temporary mismatches between boundary position and slot BUY/SELL roles during blockchain execution.
// Boundary changes flow through COW pipeline atomically
const boundarySync = syncBoundaryToFunds(manager); // Returns { changed, newIdx }
if (boundarySync.changed) {
pendingBoundaryIdx = boundarySync.newIdx; // NOT manager.boundaryIdx!
// updateGridFromBlockchainSnapshot reassigns slot types in WorkingGrid
// manager.boundaryIdx updated atomically in _commitWorkingGrid
}Before (Queue-Based):
// Detect divergence → Queue corrections → Execute batch → Clear queue
// Master grid stays ACTIVE during entire process (race condition)
ordersNeedingPriceCorrection.push({ gridOrder, chainOrderId, isSurplus: true });
// ...later...
await updateOrdersOnChainBatchFn({ ordersToCancel, ordersToPlace, ordersToRotate });After (COW-Based):
// Detect divergence → Create WorkingGrid → Update sizes in working copy
// → Execute UPDATE/CANCEL/CREATE ops on chain → Commit working grid on success
const workingGrid = new WorkingGrid(manager.orders);
workingGrid.set(orderId, convertToSpreadPlaceholder(order)); // Surplus → virtual slot
const actions = [{ type: COW_ACTIONS.CANCEL, id, orderId }, ...];
const cowResult = { actions, workingGrid, ... };
await updateOrdersOnChainBatch(cowResult); // Commit only on successKey Changes:
- Surplus orders: CANCEL on-chain and virtualize in working grid
- State preservation: ACTIVE/PARTIAL orders keep their state in working grid
- No race conditions: Master unchanged until blockchain confirms
- Unified flow: Same COW pattern as fill rebalancing
Grid Resizing Also Migrated:
updateGridFromBlockchainSnapshot now returns COW result:
// Before: Modified master grid directly
await Grid.updateGridFromBlockchainSnapshot(manager, 'buy'); // Direct update!
// After: Returns COW result for batch execution
const cowResult = await Grid.updateGridFromBlockchainSnapshot(manager, 'buy');
await updateOrdersOnChainBatch(cowResult); // Execute via COW- 100 orders: ~0.03ms clone
- 500 orders: ~0.05ms clone
- 1000 orders: ~0.08ms clone
- 5000 orders: ~0.5ms clone
- Removed snapshot/rollback pattern;
performSafeRebalance()now delegates to_applySafeRebalanceCOW() - Removed duplicate
_updateOrdersOnChainBatchCOW - Removed legacy rollback references in
dexbot_class.js
| Method | Description |
|---|---|
performSafeRebalance(fills, excludeIds) |
Entry point - delegates to COW |
_applySafeRebalanceCOW(fills, excludeIds) |
Creates working grid, runs planning |
_reconcileGridCOW(targetGrid, boundary, workingGrid) |
Delta against working copy |
_commitWorkingGrid(workingGrid, indexes, boundary, options = {}) |
Atomic swap to master |
_setRebalanceState(state) |
Track rebalance state |
_currentWorkingGrid |
Reference to working grid during rebalance for fill sync |
syncFromMaster(masterGrid, orderId) |
Sync specific order from master to working grid (WorkingGrid method) |
| Method | Description |
|---|---|
updateOrdersOnChainBatch(rebalanceResult) |
Routes to COW broadcast |
_updateOrdersOnChainBatchCOW(rebalanceResult) |
Full COW broadcast with commit |
| Method | Description |
|---|---|
syncFromMaster(masterGrid, orderId) |
Sync specific order from master to working grid during fill processing |
buildDelta(masterGrid) |
Build delta actions between master and working grid |
getIndexes() |
Get cached grid indexes |
NORMAL → REBALANCING → BROADCASTING → _commitWorkingGrid() → NORMAL
↓ (on failure)
_clearWorkingGridRef() → NORMAL
(master unchanged)
State transitions:
NORMAL → REBALANCING:_applySafeRebalanceCOW()begins planningREBALANCING → BROADCASTING:_updateOrdersOnChainBatchCOW()starts blockchain opsBROADCASTING → NORMAL:_clearWorkingGridRef()always called on exit (success or failure)- Fill during REBALANCING/BROADCASTING: marks working grid stale, syncs data
1. performSafeRebalance(fills, excludeIds)
└─> _applySafeRebalanceCOW()
├─> Create WorkingGrid (clone master)
├─> Calculate target grid (from strategy)
├─> Reconcile against working copy
├─> Validate working grid funds
└─> Return { workingGrid, actions, ... }
2. updateOrdersOnChainBatch(result)
└─> _updateOrdersOnChainBatchCOW()
├─> Lock order IDs
├─> Build blockchain operations
├─> Execute batch
├─> On success:
│ └─> _commitWorkingGrid() → atomic swap
│ └─> persistGrid() → write to disk
└─> On failure:
└─> workingGrid discarded (master unchanged)
NEW FILL ARRIVES
│
▼
[What type of update?]
│
Individual Fill? ──Yes───────────────────────┐
│ │
No │
│ │
Full Side Update? ──Yes──> [Check State] │
│ │ │
No BROADCASTING? │
│ │ │
│ Yes ──> Abort │
│ No ──> Block │
│ │ │
│ Working grid │
│ discarded │
│ │ │
└──────────────────────────┴──────────────┘
│
▼
Master grid updated
(fills processed)
│
▼
Continue blockchain ops
OR trigger new rebalance
modules/order/utils/order_comparison.js- Epsilon-based order comparison, delta buildingmodules/order/utils/grid_indexes.js- Index building and validation utilitiesmodules/order/working_grid.js- WorkingGrid class (COW wrapper with clone/delta/stale tracking)tests/test_cow_master_plan.js- Core COW teststests/test_cow_commit_guards.js- Commit guard regression teststests/test_cow_concurrent_fills.js- Concurrent fill integration teststests/test_cow_divergence_correction.js- Divergence correction COW teststests/test_working_grid.js- WorkingGrid unit teststests/benchmark_cow.js- Performance benchmarks
modules/constants.js- Added COW_PERFORMANCE thresholdsmodules/order/manager.js- Added COW methods, immutable master (Object.freeze), version trackingmodules/dexbot_class.js- Wired COW broadcast, removed legacy rollbackmodules/order/sync_engine.js- Uses_applyOrderUpdate(lock-free) for all sync pathsmodules/order/startup_reconcile.js- Uses_applySync(lock-free) when inside_gridLockmodules/order/utils/system.js- MigratedapplyGridDivergenceCorrectionsto full COW patternmodules/order/grid.js- MigratedupdateGridFromBlockchainSnapshotto return COW result instead of modifying master directly
Core COW Tests (test_cow_master_plan.js):
✓ COW-001: Master unchanged on failure
✓ COW-002: Master updated only on success
✓ COW-003: Index transfer
✓ COW-004: Fund recalculation
✓ COW-005: Order comparison
✓ COW-006: Delta building
✓ COW-007: Index validation
✓ COW-008: Working grid independence
✓ COW-009: Empty grid handling
✓ COW-010: Memory stats
✓ COW-011: No spurious updates on unchanged grid
Commit Guard Tests (test_cow_commit_guards.js):
✓ COW-COMMIT-001: Version mismatch rejection
✓ COW-COMMIT-002: Empty delta rejection
Concurrent Fill Tests (test_cow_concurrent_fills.js):
✓ COW-FILL-001: Fill during REBALANCING syncs to working grid
✓ COW-FILL-002: Fill during BROADCASTING syncs to working grid
✓ COW-FILL-003: Commit rejected after fill during broadcast
✓ COW-FILL-004: No working grid sync during NORMAL state
✓ COW-FILL-005: _cloneOrder deep-clones rawOnChain
✓ COW-FILL-006: _cloneOrder handles missing rawOnChain
✓ COW-FILL-007: Staleness reason includes phase context
Divergence Correction Tests (test_cow_divergence_correction.js):
✓ Surplus orders are CANCELLED (not UPDATE to size=0)
✓ Working grid preserves order states (ACTIVE, PARTIAL)
✓ Orders within target window get size updates
✓ No duplicate UPDATE/CANCEL overlap for same order
Filled orders are blockchain truth and always processed immediately.
Individual Fills (Single Order):
- Only shift boundary index - filled order becomes SPREAD, next slot gets filled
- Does NOT modify the filled order itself
- Low impact - just keeps grid aligned with market
- Always process immediately, sync to working grid, continue operations
Full Side Updates (All Orders):
- Divergence: Recalculate and update ALL orders on one side
- Cache threshold: Rebalance entire depleted side
- High impact - complex planning across many orders
- BLOCK if fills pending - can't plan with stale state
Key distinction: Individual fills move the boundary. Full updates rebuild everything.
Divergence detection and cache function updates are deferred when fills are pending or when rebalancing is in progress to ensure calculations use stable, committed state rather than speculative working grid state.
Run these tests before promotion:
node tests/test_engine_integration.jsnode tests/test_sequential_multi_fill.jsnode tests/test_sync_logic.jsnode tests/test_ghost_order_fix.jsnode tests/test_working_grid.jsnode tests/test_cow_master_plan.jsnode tests/test_cow_commit_guards.jsnode tests/test_cow_concurrent_fills.jsnode tests/test_cow_divergence_correction.js
- Unchanged grids do not emit global COW
updateactions - Missing on-chain ACTIVE order with
orderIdappears infilledOrdersfrom open-order sync
COW_PERFORMANCE.MAX_REBALANCE_PLANNING_MS- Max time for rebalance planning phaseCOW_PERFORMANCE.MAX_COMMIT_MS- Max time for grid commit operationCOW_PERFORMANCE.MAX_MEMORY_MB- Memory threshold for working grid operationsCOW_PERFORMANCE.INDEX_REBUILD_THRESHOLD- Grid size threshold for index rebuildingCOW_PERFORMANCE.WORKING_GRID_BYTES_PER_ORDER- Estimated memory per order (500 bytes)
PIPELINE_TIMING.MAX_FEE_EVENT_CACHE_SIZE- LRU cache limit for fee dedup (10,000 entries)PIPELINE_TIMING.FEE_EVENT_DEDUP_TTL_MS- Fee event deduplication TTL (6 hours)PIPELINE_TIMING.CACHE_EVICTION_RETENTION_RATIO- LRU eviction retention (0.75)PIPELINE_TIMING.RECOVERY_DECAY_FALLBACK_MS- Recovery decay fallback (180 seconds)
GRID_LIMITS.RELATIVE_ORDER_UPDATE_THRESHOLD_PERCENT- Relative threshold (%) for in-memory COW order equality checksGRID_LIMITS.STATE_CHANGE_HISTORY_MAX- Max state change history entries (100)TIMING.LOCK_REFRESH_MIN_MS- Minimum lock refresh interval (250ms)
- Fee-event dedupe keys now quantize size with
floatToBlockchainInt(size, orderPrecision)(derived from BUY/SELL side precision) - Fixed
1e8satoshi conversion is no longer used
-
Accountant Dry-Run:
Accountant.validateTargetGrid(targetMap)verifies that the entire proposed grid fits withinLiquid + CurrentOrderValuebefore broadcasting. -
Atomic Transaction Semantics: Large boundary shifts (>5 slots) are inherently safe because the COW pattern only commits after successful blockchain confirmation. If market volatility causes rapid shifts during planning, the working grid is simply discarded and replanning occurs on the next cycle.
-
Resync on Error: If any blockchain action fails (e.g., "Insufficient funds"), the bot discards the working grid and triggers
startup_reconcile.jsfor a fresh blockchain sync.
None. COW is the only standard. The old snapshot/rollback pattern has been completely removed.
This architecture makes the "Metadata Reinterpretation" bug impossible by ensuring that memory is only a reflection of verified blockchain state. The master grid is never partially modified - it's either the old state or the new state, with no intermediate "limbo" states.