feat: MCOL-4889 analyze and vacuum bloat #3665
Open
New Functionality
This PR introduces three new User-Defined Functions (UDFs) to expose new capabilities to database users:
mcs_analyze_partition_bloat('schema', 'table', partition_number): Analyzes a single specified partition within a table and returns information about the percentage of empty records (bloat).
mcs_analyze_table_bloat('schema', 'table'): Extends the analysis to all partitions within a given table, providing a comprehensive view of bloat across the entire table.
mcs_vacuum_partition('schema', 'table', partition_number): Serves as the entry point for manually triggering the partition cleanup process.
This PR implements the initial framework required for this operation, including creating hidden partitions and enabling targeted data loading.
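To make the bloat figure reported by the two analysis functions concrete, here is a minimal sketch of the calculation, assuming bloat is the share of empty rows among all rows in a partition. The struct and function names (PartitionStats, bloatPercent, tableBloat) are hypothetical and are not taken from the PR; the actual engine may count rows differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-partition row counts; illustrative only, not the PR's types.
struct PartitionStats {
    std::uint32_t partitionNumber;
    std::uint64_t totalRows;  // rows allocated in the partition
    std::uint64_t emptyRows;  // rows marked empty (for example, deleted)
};

// Bloat as the percentage of empty rows. A partition with no rows reports 0.
inline double bloatPercent(const PartitionStats& p) {
    assert(p.emptyRows <= p.totalRows);  // sanity check on the inputs
    if (p.totalRows == 0) return 0.0;
    return 100.0 * static_cast<double>(p.emptyRows) /
           static_cast<double>(p.totalRows);
}

// Table-level view: one percentage per partition, in input order.
inline std::vector<double> tableBloat(const std::vector<PartitionStats>& parts) {
    std::vector<double> out;
    out.reserve(parts.size());
    for (const auto& p : parts) out.push_back(bloatPercent(p));
    return out;
}
```

Under these assumptions, a partition with 1000 rows of which 250 are empty would report 25% bloat.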
Code Changes and Implementation Details
The implementation spans several components of the ColumnStore engine:
1. Bloat Analysis Engine
dbcon/execplan/commandpackageprocessor.cpp with new methods: analyzePartitionBloat and analyzeTableBloat.
dbcon/execplan/CalpontSystemCatalog.cpp: getQueryData, setupQueryTxnCtx
2. Vacuuming Framework
The foundation for the vacuuming process is built upon two key changes:
a. Hidden Partition Management
SELECT queries.
createHiddenStripeColumnExtents → creates hidden partitions.
makePartitionVisible → atomic swap to make compacted partition visible.
storage-manager/dbrm/utils/slavecomm/utils/slavecomm/slavedbrmnode.cpp
Ensures functionality in both single-node and distributed environments.
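The hidden-partition flow above (create a hidden partition, fill it, then swap it in atomically) can be sketched with a toy in-memory catalog. This is only an illustration under stated assumptions: the class PartitionCatalog and its methods are hypothetical, it uses a single mutex rather than the DBRM extent map, and it assumes the bloated partition is hidden as part of the same swap, which the PR description does not confirm.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <stdexcept>

// Toy visibility catalog; hypothetical, not the ColumnStore DBRM API.
class PartitionCatalog {
public:
    // Register an existing partition that readers can see.
    void addVisible(std::uint32_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        visible_[id] = true;
    }

    // Create a partition that readers skip until it is swapped in.
    void createHidden(std::uint32_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        visible_[id] = false;
    }

    // True only for known partitions that are currently visible.
    bool isVisible(std::uint32_t id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = visible_.find(id);
        return it != visible_.end() && it->second;
    }

    // Flip both flags under one lock so no reader observes both
    // partitions visible, or neither, at the same time.
    void swapIn(std::uint32_t compactedId, std::uint32_t bloatedId) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto c = visible_.find(compactedId);
        auto b = visible_.find(bloatedId);
        if (c == visible_.end() || b == visible_.end() || c->second || !b->second)
            throw std::invalid_argument(
                "swapIn: expected a hidden compacted and a visible bloated partition");
        c->second = true;
        b->second = false;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::uint32_t, bool> visible_;
};
```

A caller would register the bloated partition with addVisible, call createHidden for the compacted copy, load data into it, and then call swapIn once the load has finished. In a distributed deployment the same state change would also have to reach every node, which this single-process sketch does not model.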
b. Targeted Bulk Loading
cpimport utility enhanced to support targeted partition loading.
writeengine/bulk/we_cmdargs.cpp → new CLI argument for target partition (Directory.Segment.DBRoot).
writeengine/bulk/we_bulkload.cpp (BulkLoad class) → processes new argument.
SELECT data from bloated partition and pipe it directly into the hidden partition via cpimport.
3. UDF Interface
dbcon/mysql/ha_mcs_client_udfs.cpp
dbcon/mysql/ha_mcs_impl.cpp
Future Work
This PR provides the foundational components for bloat analysis and vacuuming.
Upcoming work will address: