Running nexmark benchmarks. #6661
-
I notice that there is an implementation of the nexmark benchmark in the source tree, along with a data generator for it. Unfortunately I couldn't find a recipe for running the benchmark in the documentation. I am able to run the end-to-end tests, including the nexmark tests, but those only check correctness. I am interested in measuring performance. Specifically, I would like to run nexmark in the dev environment on my laptop. Any pointers would be greatly appreciated.
-
I think @KeXiangWang may be the right person to answer this question.
-
Thanks for your question. Yes, we have an implementation of the nexmark benchmark. You can find the SQL to create the sources here, and the SQL to create the materialized views for the nexmark queries here. Some other tips that may be helpful:
nexmark.min.event.gap.in.ns: this controls the data-generating rate. For benchmarking, just set it to '0' in all three sources. By default, the data generator in the source…
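For a concrete picture of the query side, here is a sketch along the lines of nexmark q1 (the currency-conversion query), assuming a bid source created with the nexmark connector; the exact SQL for every query lives in the linked directory, so treat this only as an illustration:
-- Illustrative sketch only; the canonical definition is in the e2e directory.
-- Assumes a nexmark bid source with auction, bidder, price, date_time columns.
CREATE MATERIALIZED VIEW nexmark_q1 AS
SELECT auction, bidder, 0.908 * price AS price, date_time
FROM bid;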
-
@KeXiangWang, can I ask you a follow-up question? I would like to generate 100M records and measure how long it takes RisingWave to process them to completion. I modified the source definitions as follows:
statement ok
CREATE MATERIALIZED SOURCE person (
"id" BIGINT,
"name" VARCHAR,
"email_address" VARCHAR,
"credit_card" VARCHAR,
"city" VARCHAR,
"state" VARCHAR,
"date_time" TIMESTAMP,
"extra" VARCHAR,
PRIMARY KEY (id)
) with (
connector = 'nexmark',
nexmark.table.type = 'Person',
nexmark.split.num = '8',
nexmark.min.event.gap.in.ns = '0',
nexmark.event.num = '100000000'
) ROW FORMAT JSON;
statement ok
CREATE MATERIALIZED SOURCE auction (
"id" BIGINT,
"item_name" VARCHAR,
"description" VARCHAR,
"initial_bid" BIGINT,
"reserve" BIGINT,
"date_time" TIMESTAMP,
"expires" TIMESTAMP,
"seller" BIGINT,
"category" BIGINT,
"extra" VARCHAR,
PRIMARY KEY (id)
) with (
connector = 'nexmark',
nexmark.table.type = 'Auction',
nexmark.split.num = '8',
nexmark.min.event.gap.in.ns = '0',
nexmark.event.num = '100000000'
) ROW FORMAT JSON;
statement ok
CREATE SOURCE bid (
"auction" BIGINT,
"bidder" BIGINT,
"price" BIGINT,
"channel" VARCHAR,
"url" VARCHAR,
"date_time" TIMESTAMP,
"extra" VARCHAR
) with (
connector = 'nexmark',
nexmark.table.type = 'Bid',
nexmark.split.num = '8',
nexmark.min.event.gap.in.ns = '0',
nexmark.event.num = '100000000'
) ROW FORMAT JSON;
Do you know how to modify …? Thanks!
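For what it's worth, a rough way to watch ingestion progress with definitions like these (a sketch on my part, assuming the materialized person and auction sources above; the plain bid source may not be directly queryable) is to poll the row counts until they stop growing:
-- Sketch: re-run these periodically; once the counts plateau, the generator
-- has finished emitting its configured events for these sources.
SELECT count(*) AS person_rows FROM person;
SELECT count(*) AS auction_rows FROM auction;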
-
Thanks again, @KeXiangWang, @fuyufjh, @lmatz for your help! I was able to run the benchmarks. Below are the results on my laptop with 64GB of memory and 20 CPU threads. I ran these using the vanilla dev environment with a release build of RisingWave and with Grafana enabled (to track the runtime of the benchmark). I ran one nexmark query at a time, with the sources configured to generate 100M records with a 0 ns gap (see the modified source definitions in my earlier comment). Caveats: …
Do these numbers look reasonable? Are there additional ways to tune the query engine in the single-node setup that I should try? Raw numbers: …
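On tuning: one knob that may apply here, assuming it exists in this build of RisingWave (please check the docs for your version), is the streaming parallelism session variable, which affects streaming jobs created after it is set:
-- Assumption: the streaming_parallelism session variable is available in this
-- build; it controls the parallelism of subsequently created streaming jobs.
SET streaming_parallelism = 8;
-- Materialized views created after this point use the requested parallelism.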
-
I use the default config that RisingWave ships with; I haven't changed any of the options in the toml file. In particular, …
I use the unmodified Nexmark queries (…).
I don't think so. Here is the …
-
Thanks for the confirmation!
How high is the CPU usage? My experience when running q10 (a stateless query) is that RisingWave uses all of my cores, so CPU is likely to be the bottleneck. For the stateful queries it's harder to judge, because stateful executors/operators may fetch state from S3 and thus naturally leave the CPU idle.