v5.0.0
New Feature
- Supports UDAF falling back.
- Supports native round-robin partitioner.
- Supports native range partitioner.
- Supports native WindowGroupLimitExec, introduced in Spark 3.5.
- Supports SHJ falling back to SMJ when the build side is too big.
- Fully supports the Apache Celeborn shuffle service.
- Initial support for the Apache Uniffle shuffle service.
- Initial support for the Apache Paimon datasource.
Improvement
- Improved memory management in AggExec/SortMergeJoinExec, reducing the number of OOMs.
- Improved metric statistics.
Bug fixes
- Fixed inconsistent string to date casting.
- Fixed inconsistent bloom filter join when bloom filter is generated by Spark.
- Fixed incorrect sort ordering when writing tables with dynamic partitions.
- Fixed inconsistent sha2x functions.
- Fixed many bugs that might lead to query failures; see What's Changed.
What's Changed
- release version v4.0.1 by @richox in #690
- fix incorrect expression conversion: Days should be DayOfMonth by @richox in #691
- fix ci: cache spark binaries by @richox in #696
- Bump smallvec from 2.0.0-alpha.7 to 2.0.0-alpha.8 by @dependabot in #692
- Bump prost from 0.13.3 to 0.13.4 by @dependabot in #688
- Dev repartitioning by @gy11233 in #693
- Add Blaze icon and issue navigation In IDEA by @cxzl25 in #699
- [BLAZE-700] Minor nit fix for hyperlink by @merrily01 in #701
- [BLAZE-706] Fix year/month/day functions data type by @wForget in #703
- [BLAZE-704] Specify name for spark ext function by @wForget in #705
- Fix some incorrect module name mapping in docker compose file by @harveyyue in #709
- fix ci: trigger ci when opening PR by @richox in #711
- Support native scan hive paimon cow table by @harveyyue in #708
- Automatically use the protoc version downloaded by the maven plugin by @cxzl25 in #702
- [BLAZE-707][FOLLOWUP] NativePaimonTableScanExec should use shimed PartitionedFile and min partition number by @SteNicholas in #713
- fix ci: trigger ci when opening/changing PR by @richox in #714
- [BLAZE-287][FOLLOWUP] BlazeCelebornShuffleWriter should use mapped shuffle id for rerunning stage of fetch failure by @SteNicholas in #712
- Bump sonic-rs from 0.3.16 to 0.3.17 by @dependabot in #694
- Bump smallvec from 2.0.0-alpha.8 to 2.0.0-alpha.9 by @dependabot in #698
- Bump foldhash from 0.1.3 to 0.1.4 by @dependabot in #710
- feat(spill): Align with the multi IO compression codec in spill by @zuston in #657
- bug fixes by @richox in #717
- Fix OrcScan reads missing data column by @ASiegeLion in #716
- fix test failures by @richox in #720
- [BLAZE-725] Bump Spark from 3.5.3 to 3.5.4 by @SteNicholas in #726
- Fix MacOS compile by @cxzl25 in #724
- Apply spotless by @cxzl25 in #728
- Remove bug_report unnecessary information by @cxzl25 in #727
- fix mvn build helper by @richox in #735
- Duplicated project schema will cause index out of bounds exception in orc_exec by @harveyyue in #723
- [BLAZE-729] Fix a typo in the Shebang line of the shell script by @merrily01 in #730
- Fix orc map type entries field naming issue by @harveyyue in #732
- Bump itertools from 0.13.0 to 0.14.0 by @dependabot in #733
- close inactive issues by @richox in #738
- [BLAZE-736] Write time should increment for mapperEnd in CelebornPart… by @HYBG-1126 in #739
- fix ci: use huaweicloud mirror to download spark binaries by @richox in #742
- Bump tempfile from 3.14.0 to 3.15.0 by @dependabot in #741
- Bump async-trait from 0.1.83 to 0.1.84 by @dependabot in #740
- fix performance issues by @richox in #743
- [BLAZE-744] Bump Celeborn version from 0.5.2 to 0.5.3 by @SteNicholas in #745
- Dev repartitioning by @gy11233 in #734
- Bump Paimon version from 0.9.0 to 1.0.0 by @harveyyue in #751
- [BLAZE-747] Enhance the ArrowFFIExporter.exportNextBatch method to execute conditionally by @merrily01 in #748
- fix range repartitioning proto issue by @gy11233 in #752
- Bump async-trait from 0.1.84 to 0.1.85 by @dependabot in #746
- Bump uuid from 1.11.0 to 1.11.1 by @dependabot in #754
- Bump tokio from 1.42.0 to 1.43.0 by @dependabot in #755
- fix ci: update to actions/upload-artifact@v4 by @richox in #756
- fix ci: update to actions/upload-artifact@v4 by @richox in #757
- fix ci: add --all-opens for supporting jdk17 by @richox in #758
- fix ci: use cached spark-bin directory to walk around permission denied issue by @richox in #766
- fix-ci: use specified jdk version by @richox in #767
- fix-ci: adjust memory configuration by @richox in #768
- Bump uuid from 1.11.1 to 1.12.0 by @dependabot in #765
- Bump log from 0.4.22 to 0.4.25 by @dependabot in #764
- [BLAZE-747][FOLLOW-UP] Fix user changed in FFI NextBatch by @Flyangz in #769
- Bump smallvec from 2.0.0-alpha.9 to 2.0.0-alpha.10 by @dependabot in #770
- [BLAZE-762] Return null when log function input is negative by @wForget in #763
- [BLAZE-760] Fallback shuffle exchange when range partitioning with unsupported type by @wForget in #761
- supports falling back hash join to sort merge join when hash table is too big by @richox in #753
- fix-ci: remote incorrect cache by @richox in #779
- fix-ci: rust fmt by @richox in #780
- Add comma to line in README file by @xleoken in #778
- bug fixes by @richox in #777
- [BLAZE-775] Support float type for sum function by @wForget in #776
- [BLAZE-773] Support long type for floor function by @wForget in #774
- fix-ci: pull_request_target -> pull_request by @richox in #782
- fix build error and code style by @wForget in #781
- Bump uuid from 1.12.0 to 1.12.1 by @dependabot in #783
- [BLAZE-786] Mark big decimal value convertion as unsupported by @wForget in #787
- use better aggregate OwnedKey construction by @richox in #784
- [BLAZE-790] Support LZ4_RAW compression codec for parquet by @SteNicholas in #791
- Add support of mac aarch64 for tpcds data generator by @zuston in #792
- Automatic cancel previous CI tests when newly commit comes for per PR by @zuston in #794
- Add support of pprof dump for rust execution by @zuston in #793
- Bump tempfile from 3.15.0 to 3.16.0 by @dependabot in #802
- Bump serde from 1.0.216 to 1.0.217 by @dependabot in #800
- Bump poem from 1.3.59 to 3.1.6 by @dependabot in #799
- Bump rand from 0.8.5 to 0.9.0 by @dependabot in #801
- Bump bytes from 1.9.0 to 1.10.0 by @dependabot in #811
- Bump async-trait from 0.1.85 to 0.1.86 by @dependabot in #810
- Add support of Apache Uniffle for remote shuffle service by @zuston in #796
- use separated thread in ffi exporter by @richox in #788
- Fix the rootless-docker action failure when building the jar in github action by @zuston in #813
- Bump uuid from 1.12.1 to 1.13.1 by @dependabot in #814
- Add support of building native with --features by @zuston in #797
- Add support of memory profile by @zuston in #798
- [BLAZE-808] Support statistics of ExecutionPlan for WindowExec by @SteNicholas in #809
- [BLAZE-805] Support statistics of ExecutionPlan for SortExec by @SteNicholas in #807
- [BLAZE-803] Support statistics of ExecutionPlan for LimitExec by @SteNicholas in #804
- Bump once_cell from 1.20.2 to 1.20.3 by @dependabot in #816
- fix some issues causing 137 oom by @richox in #815
- minor fixes of OOM cases by @richox in #817
- clean rss shuffle writer api by @richox in #820
- Bump bytesize from 1.3.0 to 1.3.2 by @dependabot in #819
- support orc scan bytes metric by @Flyangz in #821
- Move scala object HiveClientHelper from java to scala folder by @harveyyue in #822
- Bump prost from 0.13.4 to 0.13.5 by @dependabot in #823
- fix hanging in corner case: ArrowFFIExporter implements AutoCloseable by @richox in #831
- improve MetricNode: fix metric missing in Union children by @richox in #832
- Support cast decimal data type with different precision and sale by @harveyyue in #839
- Support long type for ceil function by @harveyyue in #825
- Bump uuid from 1.13.1 to 1.14.0 by @dependabot in #840
- Bump serde from 1.0.217 to 1.0.218 by @dependabot in #842
- Fallback cast date type to SparkUDFWrapper function by @harveyyue in #838
- fix http server startup by @Flyangz in #834
- add malloc_conf for memory profiling by @Flyangz in #836
- Supports UDAF and other aggregate functions not implemented by @gy11233 in #848
- fix possible panic in spawn_worker_thread_on_stream by @richox in #849
- Bump tempfile from 3.16.0 to 3.17.1 by @dependabot in #828
- Bump zstd from 0.13.2 to 0.13.3 by @dependabot in #841
- Bump log from 0.4.25 to 0.4.26 by @dependabot in #846
- Bump bytesize from 1.3.2 to 2.0.0 by @dependabot in #847
- Fix udaf, add udaf enable conf by @gy11233 in #851
- Bump poem from 3.1.6 to 3.1.7 by @dependabot in #845
- Bump uuid from 1.14.0 to 1.15.1 by @dependabot in #853
- remove eager shuffle reading by @richox in #858
- fix possible hanging in ffi reader by @richox in #860
- Cast function should convert scientific notation to correct decimal value by @harveyyue in #844
- Bump bytesize from 2.0.0 to 2.0.1 by @dependabot in #859
- Update rust toolchain to latest nightly by @wForget in #861
- .gitignore file add target-docker folder ignore by @wsk1314zwr in #862
- Bump jemalloc_pprof from 0.6.0 to 0.7.0 by @dependabot in #863
- Bump bytes from 1.10.0 to 1.10.1 by @dependabot in #864
- Bump tempfile from 3.17.1 to 3.18.0 by @dependabot in #866
- Bump serde from 1.0.218 to 1.0.219 by @dependabot in #867
- Bump async-trait from 0.1.86 to 0.1.87 by @dependabot in #865
- Bump sonic-rs from 0.3.17 to 0.4.0 by @dependabot in #875
- Bump once_cell from 1.20.3 to 1.21.0 by @dependabot in #874
- [BLAZE-877] Bump Celeborn version from 0.5.3 to 0.5.4 by @SteNicholas in #878
- [BLAZE-879] Bump Spark from 3.5.4 to 3.5.5 by @SteNicholas in #881
- Support shuffle read records and total time metrics by @Flyangz in #873
- support orc reading based on index by @Flyangz in #871
- Support more native parquet scan metrics by @harveyyue in #876
- Bump once_cell from 1.21.0 to 1.21.1 by @dependabot in #882
- Bump async-trait from 0.1.87 to 0.1.88 by @dependabot in #887
- Bump foldhash from 0.1.4 to 0.1.5 by @dependabot in #886
- Bump tempfile from 3.18.0 to 3.19.0 by @dependabot in #884
- complete UDAF fallback implementation by @richox in #888
- Bump tokio from 1.43.0 to 1.44.1 by @dependabot in #883
- fix celeborn shuffle writer memory leaking by @richox in #889
- [BLAZE-891] Remove stop interface of RssPartitionWriterBase by @SteNicholas in #892
- Bump tempfile from 3.19.0 to 3.19.1 by @dependabot in #894
- Bump serde from 1.0.217 to 1.0.219 by @dependabot in #893
- [BLAZE-895] Bump Paimon from 1.0.0 to 1.0.1 by @SteNicholas in #896
- fix rss bug: forced spilling an unspillable memory consumer by @richox in #898
- (celeborn shuffle read) force disable decompression because compressi… by @richox in #897
- [BLAZE-905] Bytes written should increment in UnifflePartitionWriter#write by @SteNicholas in #906
- fix agg failure: index out of bounds by @richox in #899
- Bump smallvec from 2.0.0-alpha.10 to 2.0.0-alpha.11 by @dependabot in #900
- fix NPE while getting spill buf metrics by @richox in #904
- refactor rss shuffle writer, fix incorrect map status by @richox in #901
- [BLAZE-902] Fix UnifflePartitionWriter invoke ShuffleWriteMetricsReporter#incWriteTime with nano seconds by @SteNicholas in #903
- introduce spark version control with spark-version-annotation-macros,… by @richox in #908
- Fix result is empty when bloom filter is built by spark side for some situation by @xm0830 in #911
- keep same algorithm between put_long/put_binary and might_contain_long/might_contain_binary by @xm0830 in #913
- fix inconsistent string to date casting by @richox in #912
- fix bloom_filter_might_contain + literal params by @richox in #914
- get_json_object support blank space after '.' in path by @xm0830 in #915
- ProjectExec adds cast automatically when data types not matched by @richox in #916
- Bump log from 0.4.26 to 0.4.27 by @dependabot in #909
- fix spark_xxhash64 + literal error by @richox in #920
- Bump poem from 3.1.7 to 3.1.8 by @dependabot in #918
- Bump tonic-build from 0.12.3 to 0.13.0 by @dependabot in #917
- fix UDAF fallbacking with literal params by @richox in #922
- fix error when bloom filter is null by @richox in #925
- fix error when copying BlazeColumnarArray by @richox in #926
- Bump once_cell from 1.21.1 to 1.21.2 by @dependabot in #924
- rewrite UnionExec and support auto type casting by @richox in #927
- fix error when subquery is not finished by @richox in #928
- NativeConverters adds aggregate function return type by @richox in #930
- Bump sonic-rs from 0.4.0 to 0.4.1 by @dependabot in #932
- Bump bigdecimal from 0.4.7 to 0.4.8 by @dependabot in #931
- Bump once_cell from 1.21.2 to 1.21.3 by @dependabot in #929
- set arrows default struct conflict policy to APPEND by @richox in #933
- fix missing ReturnType in convertMoreAggregateExpr by @richox in #934
- fix native shuffle reader with HeapByteBuffer by @richox in #935
- convert scalar value using arrow ipc by @richox in #938
- feat: Activate symbolize feature for heap profile by @zuston in #937
- refactor aggregate unfreeze_from_rows() and fix UDAF fallbacking error by @richox in #940
- [BLAZE-941] BlazeCelebornShuffleReader should add batch open stream time to fetch wait time by @SteNicholas in #942
- Bump tokio from 1.44.1 to 1.44.2 by @dependabot in #939
- Bump sonic-rs from 0.4.1 to 0.5.0 by @dependabot in #943
- add conf: spark.blaze.enable.scan.parquet/orc by @richox in #944
- Avoid warning log No such type of ValidateSparkPlan by @cxzl25 in #948
- Scan parquet/orc config by @cxzl25 in #949
- Bump poem from 3.1.8 to 3.1.9 by @dependabot in #946
- code refactoring and bug fixes by @richox in #952
- assert_eq key_rows sorted_row_indices by @cxzl25 in #954
- normalize shuffle write time to output io time by @richox in #953
- supports WindowGroupLimitExec by @richox in #957
- fix SortExec error when sort key exprs are empty by @richox in #958
- fix union error with empty inputs by @lihao712 in #959
- Bump rand from 0.9.0 to 0.9.1 by @dependabot in #956
- Apply scalafix removeUnusedImports by @cxzl25 in #960
- style check and reformat by @cxzl25 in #961
- fix imprecise ScalarValue memory size by @richox in #962
- add childOrderingRequired tag to DataWritingCommandExec by @richox in #963
- fix incorrect WindowGroupLimit conversion by @lihao712 in #964
- Expect sha2 function result will be consistent with spark by @harveyyue in #966
- add expr string to SparkUDFWrapper by @lihao712 in #967
- fix get_indexed_field nullable error by @lihao712 in #968
- optimize sort merge join and avoid oom by @lihao712 in #970
- get_array_mem_size() prefers capacity to len by @lihao712 in #969
- remove vcs.xml thirdparty part by @cxzl25 in #972
- release version v5.0.0 by @richox in #973
New Contributors
- @gy11233 made their first contribution in #693
- @cxzl25 made their first contribution in #699
- @merrily01 made their first contribution in #701
- @ASiegeLion made their first contribution in #716
- @HYBG-1126 made their first contribution in #739
- @Flyangz made their first contribution in #769
- @xleoken made their first contribution in #778
- @wsk1314zwr made their first contribution in #862
- @xm0830 made their first contribution in #911
Full Changelog: v4.0.1...v5.0.0