Running nexmark benchmarks. #6661
-
I notice that there is an implementation of the nexmark benchmark in the source tree, along with a data generator for it. Unfortunately I couldn't find a recipe for running the benchmark in the documentation. I am able to run the end-to-end tests, including the nexmark tests, but those only check correctness. I am interested in measuring performance. Specifically, I would like to run nexmark in the dev environment on my laptop. Any pointers would be greatly appreciated.
-
I think @KeXiangWang may be the right person to answer this question.
-
Thanks for your question. Yes, we have an implementation of the nexmark benchmark. You can find the SQL to create the sources here, and the SQL to create the materialized views for the nexmark queries here. Some other tips that may be helpful:
nexmark.min.event.gap.in.ns: this controls the data-generating rate. For benchmarking, just set it to '0' in all three sources. By default, the data generator in the source…
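For a concrete picture of the query side, here is a sketch along the lines of nexmark q1 (the currency-conversion query), assuming a bid source created with the nexmark connector; the exact SQL for every query lives in the linked directory, so treat this only as an illustration:
-- Illustrative sketch only; the canonical definition is in the e2e directory.
-- Assumes a nexmark bid source with auction, bidder, price, date_time columns.
CREATE MATERIALIZED VIEW nexmark_q1 AS
SELECT auction, bidder, 0.908 * price AS price, date_time
FROM bid;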
-
@KeXiangWang, can I ask you a follow-up question? I would like to generate 100M records and measure how long it takes RisingWave to process them to completion. I modified the source definitions as follows:
statement ok
CREATE MATERIALIZED SOURCE person (
"id" BIGINT,
"name" VARCHAR,
"email_address" VARCHAR,
"credit_card" VARCHAR,
"city" VARCHAR,
"state" VARCHAR,
"date_time" TIMESTAMP,
"extra" VARCHAR,
PRIMARY KEY (id)
) with (
connector = 'nexmark',
nexmark.table.type = 'Person',
nexmark.split.num = '8',
nexmark.min.event.gap.in.ns = '0',
nexmark.event.num = '100000000'
) ROW FORMAT JSON;
statement ok
CREATE MATERIALIZED SOURCE auction (
"id" BIGINT,
"item_name" VARCHAR,
"description" VARCHAR,
"initial_bid" BIGINT,
"reserve" BIGINT,
"date_time" TIMESTAMP,
"expires" TIMESTAMP,
"seller" BIGINT,
"category" BIGINT,
"extra" VARCHAR,
PRIMARY KEY (id)
) with (
connector = 'nexmark',
nexmark.table.type = 'Auction',
nexmark.split.num = '8',
nexmark.min.event.gap.in.ns = '0',
nexmark.event.num = '100000000'
) ROW FORMAT JSON;
statement ok
CREATE SOURCE bid (
"auction" BIGINT,
"bidder" BIGINT,
"price" BIGINT,
"channel" VARCHAR,
"url" VARCHAR,
"date_time" TIMESTAMP,
"extra" VARCHAR
) with (
connector = 'nexmark',
nexmark.table.type = 'Bid',
nexmark.split.num = '8',
nexmark.min.event.gap.in.ns = '0',
nexmark.event.num = '100000000'
) ROW FORMAT JSON;
Do you know how to modify …? Thanks!
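For what it's worth, a rough way to watch ingestion progress with definitions like these (a sketch on my part, assuming the materialized person and auction sources above; the plain bid source may not be directly queryable) is to poll the row counts until they stop growing:
-- Sketch: re-run these periodically; once the counts plateau, the generator
-- has finished emitting its configured events for these sources.
SELECT count(*) AS person_rows FROM person;
SELECT count(*) AS auction_rows FROM auction;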
-
Thanks again, @KeXiangWang, @fuyufjh, @lmatz for your help! I was able to run the benchmarks. Below are the results on my laptop with 64GB of memory and 20 CPU threads. I ran these using the vanilla dev environment with a release build of RisingWave and with Grafana enabled (to track the runtime of the benchmark). I ran one nexmark query at a time, with the sources configured to generate 100M records with a 0 ns gap (see the modified source definitions in my earlier comment). Caveats: …
Do these numbers look reasonable? Are there additional ways to tune the query engine in the single-node setup that I should try? Raw numbers: …
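On tuning: one knob that may apply here, assuming it exists in this build of RisingWave (please check the docs for your version), is the streaming parallelism session variable, which affects streaming jobs created after it is set:
-- Assumption: the streaming_parallelism session variable is available in this
-- build; it controls the parallelism of subsequently created streaming jobs.
SET streaming_parallelism = 8;
-- Materialized views created after this point use the requested parallelism.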
-
I use the default config that RisingWave ships with; I haven't changed any of the options in the toml file. In particular, …
I use the unmodified Nexmark queries (…).
I don't think so. Here is the …
-
Thanks for the confirmation!
How high is the CPU usage? My experience when running q10 (a stateless query) is that RisingWave uses all of my cores, so CPU is likely to be the bottleneck. For the stateful queries it's harder to judge, because stateful executors/operators may fetch state from S3 and thus naturally leave the CPU idle.