Closed
Description
Describe the enhancement requested
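Writing a dataset with LZ4_RAW compression currently fails. Building the Parquet write options for pyarrow.dataset.write_dataset with ParquetFileFormat().make_write_options(compression='lz4_raw') raises: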
```
Traceback (most recent call last):
  File "/work/projects/lisa/testpyarrow.py", line 3, in <module>
    _reencode_parquet('sched_switch.lz4.parquet', 'updated.parquet', compression='lz4_raw')#, row_group_size=128*1024*1024, compression='LZ4')
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "x.py", line 1, in my_write_parquet
    options = pyarrow.dataset.ParquetFileFormat().make_write_options(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset_parquet.pyx", line 206, in pyarrow._dataset_parquet.ParquetFileFormat.make_write_options
  File "pyarrow/_dataset_parquet.pyx", line 594, in pyarrow._dataset_parquet.ParquetFileWriteOptions.update
  File "pyarrow/_dataset_parquet.pyx", line 599, in pyarrow._dataset_parquet.ParquetFileWriteOptions._set_properties
  File "pyarrow/_parquet.pyx", line 1855, in pyarrow._parquet._create_writer_properties
  File "pyarrow/_parquet.pyx", line 1369, in pyarrow._parquet.check_compression_name
pyarrow.lib.ArrowException: Unsupported compression: lz4_raw
```
Indeed, lz4_raw is not mentioned anywhere in python/pyarrow/_parquet.pyx.
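The traceback ends in check_compression_name, which suggests the codec name is validated against a fixed set of names before any writer is created. The sketch below illustrates that kind of check in plain Python. The function body, the SUPPORTED_CODECS set and the UnsupportedCompression class are assumptions chosen to mirror the traceback; they are not pyarrow's actual code.

```python
# Hypothetical illustration, NOT pyarrow's source: the codec set below is an
# assumption chosen so that "lz4_raw" is rejected, as in the traceback.
SUPPORTED_CODECS = {"NONE", "SNAPPY", "GZIP", "LZO", "BROTLI", "LZ4", "ZSTD"}


class UnsupportedCompression(ValueError):
    """Stand-in for the pyarrow.lib.ArrowException seen in the traceback."""


def check_compression_name(name, supported=SUPPORTED_CODECS):
    # Case-insensitive lookup: "lz4" and "LZ4" name the same codec.
    if name.upper() not in supported:
        raise UnsupportedCompression(f"Unsupported compression: {name}")
    return name.upper()


# With the assumed set, "lz4_raw" is rejected before any writer is built.
try:
    check_compression_name("lz4_raw")
except UnsupportedCompression as exc:
    print(exc)  # Unsupported compression: lz4_raw

# Supporting the codec would at least mean accepting the name here; mapping it
# to the LZ4_RAW Parquet codec further down is assumed to be needed as well.
extended = SUPPORTED_CODECS | {"LZ4_RAW"}
print(check_compression_name("lz4_raw", extended))  # LZ4_RAW
```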
Would it be possible to add support for the LZ4_RAW codec when writing Parquet files, particularly through the dataset API?
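Until the codec name is accepted, a caller could probe for it and fall back to another codec. A minimal sketch, assuming the options factory raises an exception for unknown codec names as in the traceback above. first_supported_codec and fake_make_options are hypothetical helpers, and zstd as the fallback is an arbitrary choice.

```python
def first_supported_codec(make_options, candidates, errors=(Exception,)):
    """Return (codec, options) for the first codec that make_options accepts.

    make_options is any callable taking a compression= keyword argument.
    Only exceptions listed in `errors` are treated as "codec not supported".
    """
    failures = []
    for codec in candidates:
        try:
            return codec, make_options(compression=codec)
        except errors as exc:
            failures.append(f"{codec}: {exc}")
    raise ValueError("no supported codec among candidates; " + "; ".join(failures))


def fake_make_options(compression):
    # Hypothetical stand-in for a build that rejects lz4_raw.
    if compression.upper() == "LZ4_RAW":
        raise RuntimeError(f"Unsupported compression: {compression}")
    return {"compression": compression}


codec, opts = first_supported_codec(fake_make_options, ["lz4_raw", "zstd"])
print(codec, opts)  # zstd {'compression': 'zstd'}
```

With pyarrow, make_options could be pyarrow.dataset.ParquetFileFormat().make_write_options with errors=(pyarrow.lib.ArrowException,), and the returned options passed as file_options to pyarrow.dataset.write_dataset.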
Component(s)
Python