fix(doc): fix load tpcds tpch document (#17438)
* fix(doc): fix load tpcds tpch document

sundy-li authored Feb 11, 2025
1 parent 7b06ae6 commit 17f4491
Showing 6 changed files with 54 additions and 53 deletions.
21 changes: 3 additions & 18 deletions benchmark/tpcds/README.md
@@ -2,27 +2,12 @@

## Preparing the Table and Data

-We use [DuckDB](https://github.com/duckdb/duckdb) to generate TPC-DS data.
+We use [DuckDB](https://duckdb.org/docs/installation/) to generate TPC-DS data.

After installing DuckDB, you can use these commands to generate the data ([more information](https://github.com/duckdb/duckdb/tree/master/extension/tpcds)):

```sql
INSTALL tpcds;
LOAD tpcds;
SELECT * FROM dsdgen(sf=0.01) -- sf can be other values, such as 0.1, 1, 10, ...
EXPORT DATABASE '/tmp/tpcds_0_01/' (FORMAT CSV, DELIMITER '|');
```

Then, move the data to the current directory:

```shell
mv /tmp/tpcds_0_01/ "$(pwd)/data/"
```

After that, you can load the data into Databend:

```shell
-./load_data.sh
+./load_data.sh 0.1
```
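A quick sketch of how the scale-factor argument is used — `load_data.sh` (shown later in this commit) reads it as `$1` and exports the generated CSVs under a matching `/tmp/tpcds_<sf>/` directory. The variable names below are illustrative, not part of the script:

```shell
# Illustrative only: mirrors how load_data.sh derives its export directory
# from the scale-factor argument (factor=$1 in the real script).
factor=0.1
export_dir="/tmp/tpcds_${factor}/"
echo "generating sf=${factor} into ${export_dir}"
```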

## Benchmark
@@ -32,5 +17,5 @@
To run the TPC-DS Benchmark, first build the `databend-sqllogictests` binary.
Then, execute the following command in your shell:

```shell
databend-sqllogictests --handlers mysql --database tpcds --run_dir tpcds --bench
```
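If the binary may be missing from `PATH`, a guarded wrapper avoids a confusing `command not found`. This is a convenience sketch, not part of the repository:

```shell
# Guarded, timed invocation of the benchmark command from the README above.
bench_cmd="databend-sqllogictests --handlers mysql --database tpcds --run_dir tpcds --bench"
if command -v databend-sqllogictests >/dev/null 2>&1; then
  time $bench_cmd
else
  echo "skipping: databend-sqllogictests not found in PATH" >&2
fi
```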
60 changes: 36 additions & 24 deletions benchmark/tpcds/load_data.sh
@@ -3,33 +3,44 @@
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. "$CURDIR"/shell_env.sh

factor=$1

echo """
INSTALL tpcds;
LOAD tpcds;
SELECT * FROM dsdgen(sf=$factor); -- sf can be other values, such as 0.1, 1, 10, ...
EXPORT DATABASE '/tmp/tpcds_$factor/' (FORMAT CSV, DELIMITER '|');
""" | duckdb

mv /tmp/tpcds_$factor/ "$(pwd)/data/"

# Create Database
echo "CREATE DATABASE IF NOT EXISTS ${MYSQL_DATABASE}" | $BENDSQL_CLIENT_CONNECT_DEFAULT

tables=(
call_center
catalog_returns
customer_address
customer_demographics
household_demographics
inventory
promotion
ship_mode
store_returns
time_dim
web_page
web_sales
catalog_page
catalog_sales
customer
date_dim
income_band
item
reason
store
store_sales
warehouse
web_returns
web_site
)

@@ -43,11 +54,12 @@
done
cat "$CURDIR"/tpcds.sql | $BENDSQL_CLIENT_CONNECT

# Load Data
# note: export STORAGE_ALLOW_INSECURE=true to start databend-query
for t in ${tables[@]}
do
echo "$t"
-insert_sql="insert into $MYSQL_DATABASE.$t file_format = (type = CSV skip_header = 0 field_delimiter = '|' record_delimiter = '\n')"
-curl -s -u root: -XPUT "http://localhost:8000/v1/streaming_load" -H "database: tpcds" -H "insert_sql: ${insert_sql}" -F 'upload=@"'${CURDIR}'/data/'$t'.csv"' > /dev/null 2>&1
+fp="`pwd`/data/$t.csv"
+echo "copy into ${MYSQL_DATABASE}.$t from 'fs://${fp}' file_format = (type = CSV skip_header = 1 field_delimiter = '|' record_delimiter = '\n')" | $BENDSQL_CLIENT_CONNECT
done


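The per-table load in the script above boils down to building one `COPY INTO` statement per CSV. A self-contained sketch of that string construction (no server involved; `store_sales` chosen arbitrarily, and the helper function is illustrative):

```shell
# Build the COPY INTO statement the load loop pipes to bendsql, without
# contacting any server. Variable names mirror the script above.
MYSQL_DATABASE=tpcds
build_copy_sql() {
  t=$1
  fp="$(pwd)/data/$t.csv"
  echo "copy into ${MYSQL_DATABASE}.$t from 'fs://${fp}' file_format = (type = CSV skip_header = 1 field_delimiter = '|' record_delimiter = '\n')"
}
stmt=$(build_copy_sql store_sales)
echo "$stmt"
```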
1 change: 1 addition & 0 deletions benchmark/tpch/README.md
@@ -3,6 +3,7 @@

## Preparing the Table and Data

We use [DuckDB](https://duckdb.org/docs/installation/) to generate TPC-H data.
To prepare the table and data for the TPC-H Benchmark, run the following command in your shell:

```shell
5 changes: 0 additions & 5 deletions benchmark/tpch/gen_data.sh

This file was deleted.

@@ -111,9 +111,10 @@
echo "CREATE TABLE IF NOT EXISTS lineitem
) CLUSTER BY(l_shipdate, l_orderkey) ${options}" | $BENDSQL_CLIENT_CONNECT

# insert data to tables
# note: export STORAGE_ALLOW_INSECURE=true to start databend-query
for t in customer lineitem nation orders partsupp part region supplier
do
echo "$t"
-insert_sql="insert into ${MYSQL_DATABASE}.$t file_format = (type = CSV skip_header = 0 field_delimiter = '|' record_delimiter = '\n')"
-curl -s -u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load" -H "database: tpch" -H "insert_sql: ${insert_sql}" -F 'upload=@"./data/'$t'.tbl"'
+fp="`pwd`/data/$t.tbl"
+echo "copy into ${MYSQL_DATABASE}.$t from 'fs://${fp}' file_format = (type = CSV skip_header = 1 field_delimiter = '|' record_delimiter = '\n')" | $BENDSQL_CLIENT_CONNECT
done
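The load scripts stream SQL to `duckdb` over stdin via `echo """..."""` (in bash, `"""` is just an empty string followed by an opening quote, so this works but reads oddly). A heredoc says the same thing more plainly; `cat` stands in for the `duckdb` binary here so the sketch runs anywhere:

```shell
# Heredoc equivalent of the echo-""" pattern used by the load scripts;
# pipe the captured SQL to duckdb instead of printing it in real use.
sql=$(cat <<EOF
INSTALL tpch;
LOAD tpch;
EXPORT DATABASE '/tmp/tpch_1/' (FORMAT CSV, DELIMITER '|');
EOF
)
printf '%s\n' "$sql"
```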
15 changes: 11 additions & 4 deletions benchmark/tpch/tpch.sh
@@ -1,13 +1,20 @@
#!/usr/bin/env bash

-# generate tpch data
-sh ./gen_data.sh $1
+echo """
+INSTALL tpch;
+LOAD tpch;
+SELECT * FROM dbgen(sf=1); -- sf can be other values, such as 0.1, 1, 10, ...
+EXPORT DATABASE '/tmp/tpch_1/' (FORMAT CSV, DELIMITER '|');
+""" | duckdb
+
+mv /tmp/tpch_1/ "$(pwd)/data/"

if [[ $2 == native ]]; then
echo "native"
sh ./prepare_table.sh "storage_format = 'native' compression = 'lz4'"
sh ./load_data.sh "storage_format = 'native' compression = 'lz4'"
else
echo "fuse"
sh ./prepare_table.sh ""
sh ./load_data.sh ""
fi

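The `if/else` above maps the second CLI argument to Databend table options (`native` storage with `lz4` compression, or the default `fuse` engine with none). Refactored into a function purely for illustration (not part of `tpch.sh`):

```shell
# Maps the storage-format argument to the option string passed through to
# prepare_table.sh / load_data.sh in tpch.sh above.
storage_options() {
  if [ "$1" = "native" ]; then
    echo "storage_format = 'native' compression = 'lz4'"
  else
    echo ""
  fi
}
storage_options native
```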