Exchanges typically don't provide the complete order book directly. Most retail APIs only give you the top few prices or limited market depth. However, professional trading teams need access to every individual order in real time. Reconstructing the full order book locally using raw Market-by-Order (MBO) data ensures your slippage models remain just behind your trading engine, significantly faster than depending on external data streams.
flowchart TD A[Raw ITCH CSV] --> B[Tiny zero-copy parser] B --> C["Order-by-ID hash + bid/ask price maps"] C --> D["Update book state - A M C R logic"] D --> E["Emit MBP-10 snapshot - timestamp + 40 fields"] Technical Details & design decisions
This implementation processes each input line independently and generates a single snapshot line immediately. It avoids buffering, batching, or backfills. This design keeps correctness straightforward and ensures consistent, low latency.
| Requirement | Chosen Data Structure | Reasoning / Explanation |
|---|---|---|
| Order ID Lookup (cancels/modifies) | unordered_map<OrderId, Order> |
Constant-time lookups; minimal memory overhead. |
| Sorted Best Price Access | std::map<Price, AggLevel> |
Balanced tree with efficient log-time operations. Easy best-price access via begin(). |
| Memory Usage | One entry per open order and price level | Lightweight footprint, usually < 100 MB even for busy NASDAQ symbols. |
Why not heaps or vectors?
Maintaining heap order after frequent updates is slower due to reordering overhead. Balanced trees (std::map) are naturally optimized and faster in practice, especially given frequent updates at price levels.
switch (action) {
case 'A': book.add(id, side, px, sz); break;
case 'C': book.cancel(id, sz); break;
case 'M': book.modify(id, px, sz); break;
case 'R': book.clear(); break;
default: /* T, F, N */ break;
}| Action | Behavior |
|---|---|
A |
Insert a new order and update price level. |
C |
Reduce order size, remove price level if empty. |
M |
Execute as cancel + add (preserves priority). |
R |
Clear entire state (e.g., after halts). |
| Others | Ignored; not relevant to current snapshot. |
Snapshot generation involves simply iterating through the top 10 entries of each side (bids and asks) and outputting their prices and sizes. Due to caching, this approach is efficient enough not to require additional optimizations, such as maintaining a precomputed top-10 list, unless profiling specifically indicates otherwise.
ts_event,bid_px_00,bid_sz_00,...,bid_px_09,bid_sz_09,ask_px_00,ask_sz_00,...,ask_px_09,ask_sz_09
Custom-built parser uses a single-pass approach with std::string_view. No allocations or string copies—highly efficient. Assumes input data without quoted fields (valid for ITCH-like market data feeds).
Performance: Achieves roughly 1.5 GB/s parsing throughput on optimized compiler settings (-O3 -march=native), comfortably exceeding typical disk-read speeds.
make # Requires g++-17 with -O3 optimization, optional LTO
./reconstruction mbo.csv > mbp.csv