Distributed SQL database in Rust, built from scratch as an educational project. Main features:
-
Raft distributed consensus engine for linearizable state machine replication.
-
ACID transaction engine with MVCC-based snapshot isolation.
-
Pluggable storage engine with BitCask and in-memory backends.
-
Iterator-based query engine with heuristic optimization and time-travel support.
-
SQL interface including joins, aggregates, and transactions.
Originally written to learn more about database internals, toyDB is intended to illustrate the basic architecture and concepts of distributed SQL databases. It focuses on simplicity and understandability, and should be functional and correct. Other aspects like performance, scalability, and availability are explicit non-goals -- these are major sources of complexity in production-grade databases, which obscure the basic underlying concepts. Shortcuts have been taken where possible.
-
Architecture guide: overview of toyDB's architecture and implementation.
-
SQL examples: walkthrough of toyDB's SQL features.
-
SQL reference: toyDB SQL reference documentation.
-
References: books and other materials used while building toyDB.
With a Rust compiler installed, a local five-node cluster can be built and started as:
$ ./cluster/run.sh
Starting 5 nodes on ports 9601-9605 with data under cluster/*/data/.
To connect to node 5, run: cargo run --release --bin toysql
toydb4 21:03:55 [INFO] Listening on [::1]:9604 (SQL) and [::1]:9704 (Raft)
toydb1 21:03:55 [INFO] Listening on [::1]:9601 (SQL) and [::1]:9701 (Raft)
toydb2 21:03:55 [INFO] Listening on [::1]:9602 (SQL) and [::1]:9702 (Raft)
toydb3 21:03:55 [INFO] Listening on [::1]:9603 (SQL) and [::1]:9703 (Raft)
toydb5 21:03:55 [INFO] Listening on [::1]:9605 (SQL) and [::1]:9705 (Raft)
toydb2 21:03:56 [INFO] Starting new election for term 1
[...]
toydb2 21:03:56 [INFO] Won election for term 1, becoming leader
A command-line client can be built and used with node 5 on localhost:9605
:
$ cargo run --release --bin toysql
Connected to toyDB node n5. Enter !help for instructions.
toydb> CREATE TABLE movies (id INTEGER PRIMARY KEY, title VARCHAR NOT NULL);
toydb> INSERT INTO movies VALUES (1, 'Sicario'), (2, 'Stalker'), (3, 'Her');
toydb> SELECT * FROM movies;
1, 'Sicario'
2, 'Stalker'
3, 'Her'
toyDB supports most common SQL features, including joins, aggregates, and transactions.
Below is an EXPLAIN
query plan of a more complex query, fetching movies from studios that have
released movies with an IMDb rating of 8 or more:
toydb> EXPLAIN SELECT m.title, g.name AS genre, s.name AS studio, m.rating
FROM movies m JOIN genres g ON m.genre_id = g.id,
studios s JOIN movies good ON good.studio_id = s.id AND good.rating >= 8
WHERE m.studio_id = s.id
GROUP BY m.title, g.name, s.name, m.rating, m.released
ORDER BY m.rating DESC, m.released ASC, m.title ASC;
Remap: m.title, genre, studio, m.rating (dropped: m.released)
└─ Order: m.rating desc, m.released asc, m.title asc
└─ Projection: m.title, g.name as genre, s.name as studio, m.rating, m.released
└─ Aggregate: m.title, g.name, s.name, m.rating, m.released
└─ HashJoin: inner on m.studio_id = s.id
├─ HashJoin: inner on m.genre_id = g.id
│ ├─ Scan: movies as m
│ └─ Scan: genres as g
└─ HashJoin: inner on s.id = good.studio_id
├─ Scan: studios as s
└─ Scan: movies as good (good.rating > 8 OR good.rating = 8)
toyDB's architecture is fairly typical for a distributed SQL database: a transactional key/value store managed by a Raft cluster with a SQL query engine on top. See the architecture guide for more details.
toyDB mainly uses Goldenscripts for tests. These script various scenarios, capture events and output, and later assert that the behavior remains the same. See e.g.:
Run tests with cargo test
, or have a look at the latest
CI run.
toyDB is not optimized for performance, but comes with a workload
benchmark tool that can run
various workloads against a toyDB cluster. For example:
# Start a 5-node toyDB cluster.
$ ./cluster/run.sh
[...]
# Run a read-only benchmark via all 5 nodes.
$ cargo run --release --bin workload read
Preparing initial dataset... done (0.179s)
Spawning 16 workers... done (0.006s)
Running workload read (rows=1000 size=64 batch=1)...
Time Progress Txns Rate p50 p90 p99 pMax
1.0s 13.1% 13085 13020/s 1.3ms 1.5ms 1.9ms 8.4ms
2.0s 27.2% 27183 13524/s 1.3ms 1.5ms 1.8ms 8.4ms
3.0s 41.3% 41301 13702/s 1.2ms 1.5ms 1.8ms 8.4ms
4.0s 55.3% 55340 13769/s 1.2ms 1.5ms 1.8ms 8.4ms
5.0s 70.0% 70015 13936/s 1.2ms 1.5ms 1.8ms 8.4ms
6.0s 84.7% 84663 14047/s 1.2ms 1.4ms 1.8ms 8.4ms
7.0s 99.6% 99571 14166/s 1.2ms 1.4ms 1.7ms 8.4ms
7.1s 100.0% 100000 14163/s 1.2ms 1.4ms 1.7ms 8.4ms
Verifying dataset... done (0.002s)
The available workloads are:
read
: single-row primary key lookups.write
: single-row inserts to sequential primary keys.bank
: bank transfers between various customers and accounts. To make things interesting, this includes joins, secondary indexes, sorting, and conflicts.
For more information about workloads and parameters, run cargo run --bin workload -- --help
.
Example workload results are listed below. Write performance is pretty atrocious, due to fsyncs and a lack of write batching at the Raft level. Disabling fsyncs, or using the in-memory engine, significantly improves write performance.
Workload | BitCask | BitCask w/o fsync | Memory |
---|---|---|---|
read |
14163 txn/s | 13941 txn/s | 13949 txn/s |
write |
35 txn/s | 4719 txn/s | 7781 txn/s |
bank |
21 txn/s | 1120 txn/s | 1346 txn/s |
VSCode provides an intuitive environment for debugging toyDB.
The debug configuration is included under .vscode/launch.json
, to use it:
-
Install the CodeLLDB extension.
-
Go to the "Run and Debug" tab and select e.g. "Debug unit tests in library 'toydb'".
-
To debug the binary, select "Debug executable 'toydb'" under "Run and Debug".
toyDB logo is courtesy of @jonasmerlin.