paimon-cpp-v0.2.0

@zjw1111 zjw1111 released this 21 May 09:39
· 6 commits to release-0.2 since this release
4e582cb

Paimon C++ v0.2.0

Paimon C++ v0.2.0 expands the native C++ engine with major improvements across compaction, a spillable write buffer for primary-key (PK) tables, global indexes, and build reliability.

Highlights

  • Compaction support: Added append-table and primary-key-table compaction capabilities, including deletion-vector support, lookup-based compact rewriting, and full-compaction controls.
  • PK table spillable write path: Introduced external sort buffering and writer memory management for primary-key writes under constrained memory.
  • Global index enhancements: B-tree global index support, range bitmap index support, and Lumina dependency updates.
  • Build and dependency improvements (in progress): Improved dependency source resolution and added stronger CMake package validation.
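
For readers unfamiliar with the idea behind a spillable write path, the following is a minimal conceptual sketch of external sort buffering: rows are buffered up to a memory budget, each full buffer is sorted and spilled as a run, and the runs are k-way merged on read. It is not the paimon-cpp implementation. Every name here (SpillableSortBuffer, Cursor, Run) is hypothetical, keys are plain integers, and the "spilled" runs are kept in memory as vectors standing in for spill files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical illustration only; none of these names come from paimon-cpp.
// A Run stands in for one sorted spill file; here it simply lives in memory.
using Run = std::vector<int>;

// Position of the next unread key inside one run, ordered by key.
struct Cursor {
  int key;
  std::size_t run;
  std::size_t pos;
  bool operator>(const Cursor& other) const { return key > other.key; }
};

class SpillableSortBuffer {
 public:
  explicit SpillableSortBuffer(std::size_t max_in_memory_rows)
      : max_in_memory_rows_(max_in_memory_rows) {
    assert(max_in_memory_rows_ > 0);
  }

  // Buffer one key; once the budget is reached, sort and spill a run.
  void Add(int key) {
    buffer_.push_back(key);
    if (buffer_.size() >= max_in_memory_rows_) {
      Spill();
    }
  }

  std::size_t SpilledRunCount() const { return runs_.size(); }

  // Spill any remaining keys, then k-way merge all runs with a min-heap.
  std::vector<int> SortedOutput() {
    if (!buffer_.empty()) {
      Spill();
    }
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;
    for (std::size_t r = 0; r < runs_.size(); ++r) {
      heap.push(Cursor{runs_[r][0], r, 0});  // runs are never empty
    }
    std::vector<int> out;
    while (!heap.empty()) {
      Cursor top = heap.top();
      heap.pop();
      out.push_back(top.key);
      std::size_t next = top.pos + 1;
      if (next < runs_[top.run].size()) {
        heap.push(Cursor{runs_[top.run][next], top.run, next});
      }
    }
    return out;
  }

 private:
  // Sort the in-memory buffer and hand it over as a new run.
  void Spill() {
    std::sort(buffer_.begin(), buffer_.end());
    runs_.push_back(std::move(buffer_));
    buffer_.clear();  // reset the moved-from vector to a known empty state
  }

  std::size_t max_in_memory_rows_;
  std::vector<int> buffer_;
  std::vector<Run> runs_;
};
```

With a budget of three rows, adding the keys 7, 2, 9, 4, 1, 8, 3 spills two runs during the writes and a third when the output is requested; the merge then yields the keys in ascending order. A real write path would also have to handle key-value rows, memory accounting across writers, and on-disk run formats, none of which this sketch attempts.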

📌 NOTE: Downloading Source Code

When downloading the source code, do NOT use the auto-generated Source code (tar.gz) / Source code (zip) links provided by GitHub.
These archives are missing Git LFS files (e.g., third-party binary libraries) and will cause build failures.

✅ Please download paimon-cpp-v0.2.0.tar.gz instead, which includes all necessary LFS files.

What's Changed

  • chore: add maintainership and contributions to README by @lxy-9602 in #145
  • feat: optimize orphan files cleaner by @chongchongxiao in #135
  • fix: fix some disabled ut in FileStoreCommitImplTest by @zjw1111 in #148
  • chore: fix build and packaging process by @Eyizoha in #133
  • feat(core): introduce ColumnarRowRef with shared batch context by @xylaaaaa in #120
  • fix(core): avoid error when manifest entry has value_stats_cols by @SGZW in #150
  • chore: add VERSION for lumina by @lszskye in #153
  • feat: add configs for compaction and default target file size for different table by @lucasfang in #151
  • feat: support LeafFunction of StartsWith, EndsWith, Contains, Like by @SteNicholas in #130
  • feat(compaction): support universal & force level0 compaction strategy for pk table by @lxy-9602 in #152
  • chore: specify zlib include/library in boost by @lszskye in #155
  • feat(core): pk table scan support data manifest value_stats_cols filter by @SGZW in #157
  • refactor: refactor ColumnarBatchContext to reduce ptr overhead by @lxy-9602 in #154
  • feat: support write for deletion vector by @lucasfang in #158
  • feat(compaction): add MergeTreeCompactRewriter for compacting files in MOR by @lxy-9602 in #161
  • feat: support aarch64 architecture for JindoSDK dependency by @SteNicholas in #163
  • fix: mem leak in write when I/O exception occurs by @lxy-9602 in #165
  • fix(ut): fix binary row init under gcc8 by @SGZW in #168
  • feat: support fixed length chunked dictionary for rangebitmap by @fafacao86 in #167
  • refactor(core): apply ColumnarRowRef in KeyValueInMemoryRecordReader by @xylaaaaa in #144
  • feat(compaction): support multiple PersistProcessor in PK compaction by @lxy-9602 in #170
  • feat(catalog): enrich catalog interface with more methods by @ChaomingZhangCN in #102
  • feat(metrics): add histogram impl && add table scan metric by @SGZW in #171
  • feat(compression): add MemorySlice comparator and support LookupStoreFactory for SST file by @lxy-9602 in #172
  • feat(compaction): support compaction for append table by @lucasfang in #169
  • fix: Add CMAKE_POLICY_VERSION_MINIMUM for CMake 3.30+ compatibility by @mrdrivingduck in #175
  • fix(ut): fix histogram flaky ut by @SGZW in #176
  • fix(cmake): fix factory registry in example by @lucasfang in #178
  • feat: support bitslice for rangebitmap by @fafacao86 in #174
  • fix(cmake): fix LOWERCASE_BUILD_TYPE definition and usage by @mrdrivingduck in #180
  • feat(compaction): support multi-level lookup in LSM tree by @lszskye in #179
  • refactor(predicate): move predicate_utils.h to public include by @mrdrivingduck in #181
  • docs(readme): add license and deepwiki badges by @zjw1111 in #182
  • feat(compaction): support append table compaction with dv by @lucasfang in #177
  • chore: update PR template to add description of generative AI tools by @zjw1111 in #183
  • fix(cmake): patch Arrow for CMAKE_POLICY_VERSION_MINIMUM by @mrdrivingduck in #184
  • chore: adjust comments for consistency by @lxy-9602 in #187
  • refactor: refactor sst and add io exception test by @lxy-9602 in #188
  • chore: disable lumina and lucene by default by @zjw1111 in #190
  • fix: LookupLevels support key fields at any position in schema & little refactor by @lxy-9602 in #192
  • feat: support LookupMergeTreeCompactRewriter by @lszskye in #186
  • fix(ut): fix more binary row init under gcc8 by @SGZW in #193
  • feat: support rangebitmap read and write by @fafacao86 in #185
  • feat(compaction): support compaction for key table in framework by @lucasfang in #195
  • refactor: unify BinarySection classes to single MemorySegment model and use string_view to avoid copies by @lxy-9602 in #196
  • fix: fix compaction crash when PK fields are not at the beginning of table schema by @lxy-9602 in #202
  • feat(build): add gcc8 ci to avoid some test failure by @SGZW in #194
  • feat: implement RangeBitmapGlobalIndex for global range-bitmap index support by @lxy-9602 in #199
  • fix(compaction): make sure that only one task is running at a time, refactor compaction manager creation and add test by @lucasfang in #201
  • test: add pk compaction inte test by @lxy-9602 in #203
  • test(compaction): add inte test for pk table compaction by @lszskye in #204
  • fix: fix std::string_view cast for clang-tidy-check by @lucasfang in #205
  • fix: compaction & lookup performance optimization and SST fixes by @lxy-9602 in #207
  • refactor: extract WriteBuffer from MergeTreeWriter by @zjw1111 in #206
  • refactor: move arrow stream adapters into common utils by @zjw1111 in #209
  • feat: integrate ccache to accelerate compilation in local and CI environments by @zjw1111 in #211
  • feat: add RE2 as a third-party dependency for Arrow build by @zjw1111 in #213
  • feat(compaction): support lru cache by @lszskye in #210
  • feat: update lumina lib to v0.2.1 by @lxy-9602 in #208
  • feat(compact): support remote lookup file & add DropFileCallback in Levels by @lxy-9602 in #214
  • fix: Fix date type not supported in LiteralConverter::ConvertLiteralsFromString by @lxy-9602 in #217
  • refactor(lookup): Decouple RemoteLookupFileManager from LookupLevels and refactor Levels callback lifecycle by @lxy-9602 in #216
  • fix(compaction): fix TestHash function in BloomFilter by @lszskye in #221
  • chore: add code style by @lxy-9602 in #222
  • fix: reject nullable map keys in schema parsing instead of silently overriding by @lxy-9602 in #226
  • feat: support load table by table location directly by @Smith-Cruise in #223
  • feat(compact): Support global LookupFileCache for compact lookup mode by @lxy-9602 in #220
  • fix: unstable ut for LookupLevelsTest by @lxy-9602 in #229
  • chore(compaction): add docs for append/pk compaction by @lszskye in #230
  • feat(core): Add bucket function implementation by @ChaomingZhangCN in #218
  • feat(compact): add FieldListaggAgg aggregate function by @Zouxxyy in #224
  • feat(agg): support paimon Java config aggregation.remove-record-on-delete by @duanyyyyyyy in #225
  • feat(btree): Intro b-tree global index and add tests for java compatibility. by @ChaomingZhangCN in #212
  • test: add unittest by @lxy-9602 in #231
  • test(compaction): add ut by @lszskye in #235
  • fix: Align PartialUpdateMergeFunction with Java: add initRow/meetInsert logic by @lxy-9602 in #233
  • fix: fix compile error on macOS with clang 21 by @wgtmac in #237
  • chore(log): add logging for scanner and compact manager by @ChaomingZhangCN in #236
  • feat(spill): add SpillWriter, SpillReader and SpillChannelManager for… by @dalingmeng in #219
  • feat(comapct): Implement AppendCompactCoordinator for append-only unaware-bucket table compaction by @lxy-9602 in #238
  • feat: Add merge method for DeletionVector by @ChaomingZhangCN in #241
  • feat(scan): add BucketSelectConverter for predicate-based bucket pruning by @liangjie3138 in #234
  • refactor(btree): refactor btree global index and add tests by @lxy-9602 in #243
  • feat(catalog): add ListSnapshots() to public Catalog API by @mrdrivingduck in #244
  • fix(global_index): compatibility for legacy lumina index type by @lszskye in #247
  • fix: Fix missing StatusCode switch cases and correct Literal hash qualifier by @Smith-Cruise in #248
  • feat: support file type classify by @lucasfang in #254
  • feat: support BTreeFileMetaSelector & LazyFilteredBTreeReader for btree index by @lszskye in #250
  • feat: Use IOManager channel for lookup file creation and make Lucene tmp dir configurable by @lxy-9602 in #251
  • feat: support RowRangeIndex for DataEvolutionBatchScan by @lxy-9602 in #255
  • refactor(btree): eliminate hot-path overhead from allocs and deserialization by @lxy-9602 in #252
  • refactor(mergetree): introduce InMemorySortBuffer and MergedKeyValueRecordReader by @zjw1111 in #253
  • feat(scan): support timestamp-based snapshot lookup by @mrdrivingduck in #246
  • fix: Use fully qualified namespace for Literal in std::hash specialization by @Smith-Cruise in #256
  • feat(spill): introduce external sort buffer and writer memory manager by @zjw1111 in #257
  • feat(btree): refactor GlobalIndexScan & add inte test for btree global index by @lszskye in #262
  • fix: support WithIgnoreNumBucketCheck in write context by @lucasfang in #264
  • feat: optimize BTree index read with io buffering and boundary skip by @lxy-9602 in #263
  • feat: add system table framework with options table by @suxiaogang223 in #261
  • test(btree): add unit test for btree by @lxy-9602 in #266
  • feat(spill): support to create ExternalSortBuffer in WriteBuffer by @zjw1111 in #265
  • feat(core): support null merge results in key-value readers by @zjw1111 in #269
  • fix: fix uuid.h to work on macOS by @wgtmac in #271
  • feat: add flexible dependency source resolution by @suxiaogang223 in #259
  • feat(spill): use memory pool for arrow operations by @zjw1111 in #270
  • feat(parquet): add metrics for parquet reader observability by @duanyyyyyyy in #258
  • fix: field name mismatch when alter table rename caused by FieldMappingReader by @duanyyyyyyy in #274
  • test: add ut for prefetch file batch reader by @lucasfang in #277
  • test: add ut for release/0.2 by @lszskye in #275
  • refactor(global_index): adapt global index id in pre-filter by @lszskye in #278
  • test: add unit test for release-0.2 by @lxy-9602 in #273
  • feat: remove global index external path while drop table by @lszskye in #276
  • refactor(btree): optimize BTreeFileMetaSelector with zero-copy comparator by @lxy-9602 in #280
  • chore: add rst description for global index by @lszskye in #281
  • fix: Restore direct Arrow thread pool control inside Parquet format library by @lxy-9602 in #284
  • test(spill): add unit tests and integration tests for spill-to-disk by @dalingmeng in #272
  • feat: add audit_log and binlog system tables by @suxiaogang223 in #268
  • feat: update lumina lib to v0.2.2 by @lxy-9602 in #287
  • feat(blob): Support null blob and multiple blob fields write and commit by @lxy-9602 in #286
  • chore: adjust code style and formatting by @lxy-9602 in #289
  • enhance: improve dependency source resolution by @suxiaogang223 in #282
  • chore: fix global index rst by @lszskye in #294
  • feat: support BlobDescriptor with version 2 by @lszskye in #290
  • refactor: let CastNonPartitionArrayIfNeed only handle casting, leave name mapping to MappingFields in FieldMapping by @lxy-9602 in #293
  • feat(spill): optimize external sort with SpillFileMerger to reduce write amplification by @zjw1111 in #288
  • feat(blob): support blob-desc/blob-view config parsing, schema validation and BlobFileContext by @lxy-9602 in #291
  • chore: Update README.md and version by @lxy-9602 in #295
  • chore: update README and api docs by @zjw1111 in #296

Full Changelog: v0.1.0...v0.2.0