TPC-H benchmark data generation in pure Rust

GlareDB/tpchgen-rs

 
 

tpchgen-rs

Blazing fast TPCH benchmark data generator, in pure Rust with zero dependencies.

Features

  1. Blazing Speed 🚀
  2. Obsessively Tested 📋
  3. Fully parallel, streaming, reasonable memory usage 🧠
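As a rough illustration of what "fully parallel, streaming" generation can look like, here is a minimal sketch using only the Rust standard library. It is not the code in this repository: `make_row`, `generate_parallel`, the channel capacity of 4, and the row format are all hypothetical names and values chosen for the example. The idea is that each partition covers a disjoint row range and sends fixed-size batches through a bounded channel, so memory use stays roughly proportional to the batch size rather than the table size.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical row generator: the row depends only on its index, so any
/// partition can be produced independently of the others.
fn make_row(index: u64) -> String {
    format!("{}|row-{}", index, index)
}

/// Split `total_rows` into `parts` contiguous ranges, generate each range on
/// its own thread, and stream batches of up to `batch_size` rows to the
/// caller. Returns the number of rows received.
pub fn generate_parallel(total_rows: u64, parts: u64, batch_size: usize) -> u64 {
    // Bounded channel: producers block when the consumer falls behind,
    // which caps the number of batches held in memory at once.
    let (tx, rx) = sync_channel::<Vec<String>>(4);
    let mut handles = Vec::new();

    for part in 0..parts {
        let tx = tx.clone();
        let start = total_rows * part / parts;
        let end = total_rows * (part + 1) / parts;
        handles.push(thread::spawn(move || {
            let mut batch = Vec::with_capacity(batch_size);
            for i in start..end {
                batch.push(make_row(i));
                if batch.len() == batch_size {
                    // Hand the full batch to the consumer and start a new one.
                    tx.send(std::mem::take(&mut batch)).unwrap();
                }
            }
            if !batch.is_empty() {
                tx.send(batch).unwrap();
            }
        }));
    }
    // Drop the original sender so the channel closes when all workers finish.
    drop(tx);

    let mut received = 0u64;
    for batch in rx {
        // A real consumer would encode and write the batch here.
        received += batch.len() as u64;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    received
}
```

Calling `generate_parallel(1_000, 4, 64)` under these assumptions spawns four workers and returns the total number of rows that reached the consumer.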

Try now!

First install Rust and this tool:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo install tpchgen-cli
# create Scale Factor 10 (3.6GB, 8 files, 60M rows in lineitem) in 5 seconds on a modern laptop
tpchgen-cli -s 10 --format=parquet
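The "8 files, 60M rows in lineitem" figures above follow from how TPC-H scales: there are eight tables, and most of them grow linearly with the scale factor. The sketch below works through that arithmetic. The function name `approx_row_counts` is hypothetical and not part of any tpchgen crate; the base cardinalities are those of the TPC-H specification, and the `lineitem` value is approximate because the number of line items per order is randomized.

```rust
/// Approximate row count for each of the eight TPC-H tables at a given
/// scale factor. `nation` and `region` have a fixed size; the other tables
/// scale linearly. `lineitem` is only approximately 6,000,000 rows per unit
/// of scale factor.
pub fn approx_row_counts(scale_factor: u64) -> [(&'static str, u64); 8] {
    [
        ("nation", 25),
        ("region", 5),
        ("supplier", 10_000 * scale_factor),
        ("customer", 150_000 * scale_factor),
        ("part", 200_000 * scale_factor),
        ("partsupp", 800_000 * scale_factor),
        ("orders", 1_500_000 * scale_factor),
        ("lineitem", 6_000_000 * scale_factor),
    ]
}
```

For example, `approx_row_counts(10)` gives roughly 60 million rows for `lineitem` and 15 million for `orders`.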

Performance

Parquet Generation Performance

tpchgen-cli is more than 10x faster than any other TPCH generator we know of. On a 2023 Mac M3 Max laptop, it easily generates data faster than it can be written to SSD. See BENCHMARKS.md for more details on performance and benchmarking.

Testing

This crate has extensive tests to ensure correctness. We compare the output of this crate with the original dbgen implementation as part of every check-in. See TESTING.md for more details.
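To give a sense of what comparing output against a reference can involve, here is a minimal sketch of a line-by-line comparison in Rust. It is an illustration only, not the project's actual test harness (see TESTING.md for that); the function name `first_difference` is hypothetical.

```rust
use std::io::{BufRead, BufReader, Read};

/// Returns the 1-based line number of the first difference between two
/// streams, or `None` if they match line for line. Line terminators are
/// stripped before comparing, so this is not a byte-for-byte check.
pub fn first_difference<A: Read, B: Read>(
    expected: A,
    actual: B,
) -> std::io::Result<Option<usize>> {
    let mut expected_lines = BufReader::new(expected).lines();
    let mut actual_lines = BufReader::new(actual).lines();
    let mut line_no = 1;
    loop {
        match (expected_lines.next(), actual_lines.next()) {
            // Both streams ended at the same point: no difference found.
            (None, None) => return Ok(None),
            (Some(e), Some(a)) => {
                if e? != a? {
                    return Ok(Some(line_no));
                }
            }
            // One stream ended before the other.
            _ => return Ok(Some(line_no)),
        }
        line_no += 1;
    }
}
```

Any `Read` implementation works as input, so the same function can compare two open files or two in-memory byte slices.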

Crates

  • tpchgen is the library that implements the data generation logic for TPCH. It can be used to embed data generation natively in Rust.
  • tpchgen-arrow is a library for generating in-memory Apache Arrow record batches for each of the TPCH tables.
  • tpchgen-cli is a dbgen-compatible CLI tool that generates tables from the TPCH benchmark dataset.

Contributing

Pull requests are welcome. For major changes, please open an issue first for discussion. See our contributors guide for more details.

Architecture

Please see architecture guide for details on how the code is structured.

License

The project is licensed under the Apache 2.0 license.

References

  • The TPC-H Specification, see the specification page.
  • The original dbgen implementation: you must submit an official request on the TPC's official website to access the dbgen software.
