Read all Parquet files in a folder #17757
Draft
Do not review. This is not to be merged into cudf.

This is actually an application, implemented in place of `PARQUET_TEST`. Given a path to a folder containing Parquet files, the application reads each file using the Parquet chunked reader. We use it to check for memory issues while reading Parquet files under compute sanitizer.

The data folder can be given either through the environment variable `PARQUET_PATH` or hard-coded in the source file.

All of the implementation is in `cpp/tests/io/spark_test.cpp`: https://github.com/rapidsai/cudf/pull/17757/files#diff-712fdde5014f59e26a43b244beb3c000ad1ca5831faeeee8b184d3f2971e5e46