A high-performance column-store database implementation in C, featuring SIMD-accelerated multi-threaded scans, batched query execution, multiple index structures, and several join algorithms.
- Column-oriented storage: Memory-mapped files with per-section statistics for section pruning. Supports INT, LONG, FLOAT, DOUBLE types. Indexes use a base (sorted) + delta (unsorted) layout for efficient updates.
- Select operator: Range queries producing multiple result formats:
  - Bitmaps for dense results
  - Rowid lists for sparse results
  - RangeSets for contiguous ranges from clustered index hits
- Fetch operator: Gather values from columns at selected positions
- Aggregation functions: MIN, MAX, SUM, AVG and vector ADD/SUB operations
- Join algorithms: Nested-loop join, hash join, and grace hash join
- SIMD optimizations: Select, fetch, aggregation, and join all implement AVX-512 (with AVX2 and scalar fallback) for efficient data processing
- All operators utilize multi-threaded execution.
- Thread pool: Lock-free ring buffer job queue with cache-line-aligned atomics. Supports two modes:
  - Hybrid mode (default): Short spin-wait before blocking
  - Performance mode: Pure spin-wait for lowest latency
- CPU pinning: Threads pinned to unique physical cores, handling hyperthreads and hybrid cores intelligently.
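The per-section statistics mentioned above can be illustrated with a minimal sketch. This is not the project's actual code; the names (`section_stats`, `pruned_scan`, `SECTION_ROWS`) and the fixed section size are assumptions. The idea is that each section of a column records its min/max, so a range scan can skip whole sections that cannot contain matches:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of section pruning: each fixed-size section of a
   column keeps min/max statistics, and a range scan skips sections whose
   [min, max] cannot intersect the query range [lo, hi). */
#define SECTION_ROWS 4096

typedef struct {
    int32_t min;  /* smallest value in the section */
    int32_t max;  /* largest value in the section  */
} section_stats;

/* Count matches in [lo, hi), consulting stats before touching section data. */
static size_t pruned_scan(const int32_t *col, size_t nrows,
                          const section_stats *stats,
                          int32_t lo, int32_t hi)
{
    size_t hits = 0;
    for (size_t s = 0; s * SECTION_ROWS < nrows; s++) {
        if (stats[s].max < lo || stats[s].min >= hi)
            continue;  /* whole section outside the range: skip it */
        size_t start = s * SECTION_ROWS;
        size_t end = start + SECTION_ROWS < nrows ? start + SECTION_ROWS : nrows;
        for (size_t i = start; i < end; i++)
            hits += (col[i] >= lo && col[i] < hi);
    }
    return hits;
}
```

For clustered (sorted) data most sections fall entirely inside or outside the range, which is what makes this check pay off.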
Four index types with a selectivity-based optimizer:
| Index Type | Structure | Best For |
|---|---|---|
| Clustered Sorted | Sorted array with binary search | Range queries (default for selective queries) |
| Clustered B-tree | B-tree with data in leaf nodes | Tiny ranges (~1% selectivity) |
| Unclustered Sorted (USI) | Sorted keys with posting lists | Selective range queries (~5% threshold) |
| Unclustered B-tree (UBT) | B-tree storing (key, rowid) pairs | Point/equality queries (~1% threshold) |
The optimizer estimates selectivity using section statistics and chooses the path accordingly. Full scans are used when selectivity exceeds ~5%.
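The access-path decision described above can be sketched as follows. The thresholds come from the text; the function and type names are hypothetical, and the real optimizer derives its selectivity estimate from section statistics rather than taking it as a parameter:

```c
/* Hypothetical sketch of the selectivity-based access-path choice:
   indexes are only used when the estimated fraction of matching rows
   is small enough to beat a full scan. */
typedef enum {
    PATH_FULL_SCAN,
    PATH_CLUSTERED_INDEX,
    PATH_UNCLUSTERED_INDEX
} access_path;

static access_path choose_path(double est_selectivity,
                               int has_clustered, int has_unclustered)
{
    if (est_selectivity > 0.05)      /* above ~5%: full scan wins */
        return PATH_FULL_SCAN;
    if (has_clustered)               /* contiguous hits: cheapest path */
        return PATH_CLUSTERED_INDEX;
    if (has_unclustered)             /* posting lists / (key, rowid) pairs */
        return PATH_UNCLUSTERED_INDEX;
    return PATH_FULL_SCAN;
}
```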
Performance highlights:
- Up to 33x speedup over full scans for 0.1% selectivity on 100M rows
- Nested-loop join: Cache-aware implementation with L1d-sized inner blocks for cache locality.
- Hash join: Chained hash table (bucket + next arrays) with parallel probe phase.
- Grace hash join: Two-pass partitioning algorithm for memory-constrained scenarios. Configurable memory budget (default 8GB), up to 256 partitions, reuses hash join per-partition.
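The "bucket + next arrays" layout used by the hash join can be sketched like this. The names are illustrative, not the project's actual API: `bucket[h]` holds the head row index of a chain and `next[i]` links build rows that hash to the same bucket, so the build phase needs no per-entry allocation:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of a chained hash table built from two flat arrays. */
typedef struct {
    int32_t *bucket;     /* bucket[h] = first build row in chain, -1 if empty */
    int32_t *next;       /* next[i]  = next build row with the same hash, -1 at end */
    const int32_t *keys; /* build-side join keys */
    uint32_t mask;       /* nbuckets - 1 (nbuckets is a power of two) */
} chained_ht;

static uint32_t hash_key(int32_t k) { return (uint32_t)k * 2654435761u; }

static void ht_build(chained_ht *ht, const int32_t *keys,
                     int32_t n, uint32_t nbuckets)
{
    ht->keys = keys;
    ht->mask = nbuckets - 1;
    ht->bucket = malloc(nbuckets * sizeof *ht->bucket);
    ht->next = malloc((size_t)n * sizeof *ht->next);
    for (uint32_t b = 0; b < nbuckets; b++) ht->bucket[b] = -1;
    for (int32_t i = 0; i < n; i++) {   /* push-front insert into the chain */
        uint32_t h = hash_key(keys[i]) & ht->mask;
        ht->next[i] = ht->bucket[h];
        ht->bucket[h] = i;
    }
}

/* Probe one key; walk the chain and count equal keys (handles collisions). */
static int ht_probe_count(const chained_ht *ht, int32_t key)
{
    int matches = 0;
    for (int32_t i = ht->bucket[hash_key(key) & ht->mask]; i != -1; i = ht->next[i])
        matches += (ht->keys[i] == key);
    return matches;
}
```

Because the table is read-only after build, many threads can run `ht_probe_count` in parallel over disjoint slices of the probe side, which is what the parallel probe phase exploits.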
```
cd src
make clean
make all
```

This produces two executables:

- `server`: The database server process
- `client`: Interactive client for sending queries

Start the server:

```
./server
```

In another terminal, connect with the client:

```
./client
```

The client accepts DSL (Domain Specific Language) queries via stdin.
Example queries:

```
-- Create a database and table
create(db,"mydb")
create(tbl,"mytable",mydb,4)
create(col,"col1",mydb.mytable)
create(col,"col2",mydb.mytable)
create(col,"col3",mydb.mytable)
create(col,"col4",mydb.mytable)

-- Load data from CSV
load("/path/to/data.csv")

-- Select: find positions where col1 values are in range [10, 100)
s1=select(mydb.mytable.col1,10,100)

-- Fetch: retrieve col2 values at selected positions
f1=fetch(mydb.mytable.col2,s1)

-- Aggregations
avg(f1)
sum(f1)
min(f1)
max(f1)

-- Create index on col3 (sorted clustered)
create(idx,mydb.mytable.col3,sorted,clustered)

-- Join: f1,p1 and f2,p2 are value/position pairs from earlier fetch/select results
j1,j2=join(f1,p1,f2,p2,hash)
```
```
src/
├── server.c       # Server main loop and socket handling
├── client.c       # Client main loop
├── parse.c        # DSL parser
├── db_manager.c   # Query execution engine
├── mmap_store.h   # Memory-mapped storage layer
├── index.h        # Index structures (sorted, B-tree)
├── coalescer.h    # Batched query coalescing
├── hashtable.h    # Hash table for joins
└── include/
    ├── db_api.h   # Core data structures
    └── message.h  # Client-server protocol
```
Based on experiments with up to 100M rows:
| Operation | Throughput | Notes |
|---|---|---|
| Single-threaded select | 6.7 G rows/s | Memory-bound, limited by TLB |
| Multi-threaded select | 9.5 G rows/s | ~1.5x speedup with 8 cores |
| Batched select (256 queries) | 15-20 G rows/s | Scan sharing amortizes overhead |
| Bulk load | 12.7 M rows/s | Includes CSV parsing |
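The scan sharing behind the batched-select numbers can be sketched as follows. This is a simplified illustration (names like `range_query` and `shared_scan` are hypothetical): one pass over the column evaluates every queued predicate per row, so the memory traffic is paid once for the whole batch instead of once per query:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of scan sharing for a batch of range selects. */
typedef struct {
    int32_t lo, hi;  /* query range [lo, hi) */
    size_t hits;     /* running match count */
} range_query;

static void shared_scan(const int32_t *col, size_t nrows,
                        range_query *qs, int nq)
{
    for (size_t i = 0; i < nrows; i++) {
        int32_t v = col[i];  /* loaded from memory once, shared by all queries */
        for (int q = 0; q < nq; q++)
            qs[q].hits += (v >= qs[q].lo && v < qs[q].hi);
    }
}
```

Since a single-query scan is memory-bound, the extra per-row predicate work is nearly free, which is why batching raises aggregate throughput well past the single-scan limit.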
Index speedups (100M rows, 0.1% selectivity):
- Clustered sorted: 31x faster than scan
- Clustered B-tree: 33x faster than scan
- Unclustered indices: 19x faster than scan
Key compile-time options in the source:

```c
// mmap_store.h
#define ROOT_DIR "./data"                    // Database storage directory

// coalescer.h
#define COALESCER_FLUSH_SELECT_THRESHOLD 256 // Max batch size
```

Enable debug mode at compile time, which provides detailed logs of query execution:

```
make CFLAGS+="-DDEBUG"
```

There is also a DEBUG_TIMERS mode for timing individual components:

```
make CFLAGS+="-DDEBUG_TIMERS"
```

- GCC 13+ (C17 standard)
- Linux (uses mmap, pthreads)
- Python 3.x (for test data generation)
This project is archived for educational and reference purposes.