-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix run_tpcds data dir #19771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix run_tpcds data dir #19771
Conversation
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gabotechs I think it shouldn't be there. by default the script checks for datafusion-benchmarks repo here https://github.com/apache/datafusion-benchmarks/tree/main/tpcds/data/sf1 and there is no tpcds-sf1.
you can specify your own DATA_DIR like
export DATA_DIR=../../datafusion-benchmarks/tpcds/data/sf1/
and then run tpcds benchmarks
|
🤔 Are you sure? I get the impression that this is why the benchmark run commands are failing Also, note how the https://github.com/apache/datafusion/blob/main/benchmarks/bench.sh#L633 |
|
I just checked for #19635 the TPCDS benchmark the commands provided in |
|
Ok, I see what happened here:
IMO it would be nicer if In fact, I'd bet the intention behind this code here https://github.com/apache/datafusion/blob/main/benchmarks/bench.sh#L644-L646 is that it works that way, as it's explicitly extracting the contents to However happy to follow your lead here, I can survive setting up an extra env variable. |
Which issue does this PR close?
Rationale for this change
Running
./bench.sh run tpcdswith a freshly created./bench.sh data tpcdsfails with the following error:This PR fixes it
What changes are included in this PR?
Fixes the
TPCDS_DIRvariable inrun_tpcdsAre these changes tested?
just benchmark scripts
Are there any user-facing changes?
no need