@amrdb amrdb commented Jul 22, 2025

New Functionality

This PR introduces three new User-Defined Functions (UDFs) that expose bloat analysis and partition cleanup to database users:

  • mcs_analyze_partition_bloat('schema', 'table', partition_number)
    Analyzes a single specified partition within a table and returns information about the percentage of empty records (bloat).

  • mcs_analyze_table_bloat('schema', 'table')
    Extends the analysis to all partitions within a given table, providing a comprehensive view of bloat across the entire table.

  • mcs_vacuum_partition('schema', 'table', partition_number)
    Serves as the entry point for manually triggering the partition cleanup process.
    This PR implements the initial framework required for this operation, including creating hidden partitions and enabling targeted data loading.


Code Changes and Implementation Details

The implementation spans several components of the ColumnStore engine:

1. Bloat Analysis Engine

  • Core logic for bloat analysis is implemented in:
    dbcon/execplan/commandpackageprocessor.cpp
    with new methods: analyzePartitionBloat and analyzeTableBloat.
  • The analysis works by executing SQL queries against the system catalog to count empty values and determine the bloat factor.
  • Enhancements in:
    dbcon/execplan/CalpontSystemCatalog.cpp
    • New functions: getQueryData, setupQueryTxnCtx
    • These allow querying arbitrary data through the system catalog; the analyze functions rely on this, and it should also be useful for future work.
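
As an illustration of the bloat factor described above, here is a minimal sketch of the arithmetic, assuming the analysis yields a total row count and an empty row count per partition. The PartitionCounts struct and both function names are hypothetical and are not taken from commandpackageprocessor.cpp.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical per-partition counts; not a ColumnStore type.
struct PartitionCounts
{
  std::uint64_t totalRows;  // every row slot in the partition
  std::uint64_t emptyRows;  // slots that hold the "empty" marker
};

// Rejects counts that cannot both be true.
void checkCounts(const PartitionCounts& c)
{
  if (c.emptyRows > c.totalRows)
    throw std::invalid_argument("emptyRows exceeds totalRows");
}

// Percentage of empty rows in one partition (0.0 when it has no rows).
double partitionBloatPercent(const PartitionCounts& c)
{
  checkCounts(c);
  if (c.totalRows == 0)
    return 0.0;
  return 100.0 * static_cast<double>(c.emptyRows) /
         static_cast<double>(c.totalRows);
}

// Table-level figure: sum the counts first and take a single ratio,
// so large partitions weigh more than small ones.
double tableBloatPercent(const std::vector<PartitionCounts>& partitions)
{
  PartitionCounts sum{0, 0};
  for (const PartitionCounts& p : partitions)
  {
    checkCounts(p);
    sum.totalRows += p.totalRows;
    sum.emptyRows += p.emptyRows;
  }
  return partitionBloatPercent(sum);
}
```

Whether the table-level UDF reports one aggregated ratio or one row per partition is not stated here; the aggregation above is only one plausible reading.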

2. Vacuuming Framework

The foundation for the vacuuming process is built upon two key changes:

a. Hidden Partition Management

  • DBRM and SlaveComm updated to support "hidden" partitions.
    • Enables background data copy without affecting concurrent SELECT queries.
  • New functions:
    • createHiddenStripeColumnExtents → creates hidden partitions.
    • makePartitionVisible → atomic swap to make compacted partition visible.
  • Changes implemented across:
    • storage-manager/dbrm/
    • utils/slavecomm/
    • utils/slavecomm/slavedbrmnode.cpp
      Ensures functionality in both single-node and distributed environments.
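
The hidden-partition idea can be sketched with a toy in-memory model: a compacted copy is built out of sight of readers and then swapped in under a single lock. ToyExtentMap and all of its members are hypothetical stand-ins; they do not reflect the DBRM or SlaveComm interfaces, nor the real signatures of createHiddenStripeColumnExtents or makePartitionVisible.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy model only: partitions are plain vectors of values.
class ToyExtentMap
{
 public:
  using Rows = std::vector<std::int64_t>;

  // Registers a visible partition (setup helper for the example).
  void addVisible(std::uint32_t partition, Rows rows)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    visible_[partition] = std::move(rows);
  }

  // Creates an empty hidden partition that readers cannot see.
  void createHidden(std::uint32_t partition)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    hidden_[partition] = Rows();
  }

  // Appends one value to the hidden copy (stands in for the bulk load).
  void loadHidden(std::uint32_t partition, std::int64_t value)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    hidden_.at(partition).push_back(value);
  }

  // Readers only ever see the visible version of a partition.
  Rows read(std::uint32_t partition) const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return visible_.at(partition);
  }

  // Replaces the visible partition with the hidden one in one locked step,
  // so a reader observes either the old data or the new data, never a mix.
  void makeVisible(std::uint32_t partition)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    std::map<std::uint32_t, Rows>::iterator it = hidden_.find(partition);
    if (it == hidden_.end())
      throw std::logic_error("no hidden partition to publish");
    visible_[partition] = std::move(it->second);
    hidden_.erase(it);
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::uint32_t, Rows> visible_;
  std::map<std::uint32_t, Rows> hidden_;
};
```

In this model a reader that calls read() while the hidden copy is being filled still gets the old rows; only makeVisible() changes what is returned.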

b. Targeted Bulk Loading

  • cpimport utility enhanced to support targeted partition loading.
  • Changes in:
    • writeengine/bulk/we_cmdargs.cpp → new CLI argument for target partition (Directory.Segment.DBRoot).
    • writeengine/bulk/we_bulkload.cpp (BulkLoad class) → processes new argument.
  • Enables vacuum process to SELECT data from bloated partition and pipe it directly into the hidden partition via cpimport.
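
A minimal sketch of how a target given as Directory.Segment.DBRoot might be parsed, assuming the three parts are dot-separated unsigned decimal integers. The TargetPartition struct and parseTargetPartition function are hypothetical; the actual flag name and parsing logic in we_cmdargs.cpp are not shown in this description.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical holder for the three parts of the target.
struct TargetPartition
{
  std::uint32_t directory;
  std::uint32_t segment;
  std::uint32_t dbroot;
};

// Parses one numeric field; accepts 1 to 9 decimal digits so the value
// always fits in 32 bits.
static std::uint32_t parseField(const std::string& field)
{
  if (field.empty() || field.size() > 9)
    throw std::invalid_argument("bad field length: '" + field + "'");
  for (char ch : field)
  {
    if (ch < '0' || ch > '9')
      throw std::invalid_argument("non-digit in field: '" + field + "'");
  }
  return static_cast<std::uint32_t>(std::stoul(field));
}

// Splits "D.S.R" on its two dots and parses each part.
TargetPartition parseTargetPartition(const std::string& text)
{
  const std::string::size_type first = text.find('.');
  const std::string::size_type second =
      (first == std::string::npos) ? std::string::npos : text.find('.', first + 1);
  if (second == std::string::npos)
    throw std::invalid_argument("expected Directory.Segment.DBRoot");

  TargetPartition p;
  p.directory = parseField(text.substr(0, first));
  p.segment = parseField(text.substr(first + 1, second - first - 1));
  // A third dot would land in this field and be rejected as a non-digit.
  p.dbroot = parseField(text.substr(second + 1));
  return p;
}
```

Rejecting malformed input before any load starts matters here, because the import writes into a specific hidden partition rather than letting the engine choose one.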

3. UDF Interface

  • New UDFs registered and implemented in:
    • dbcon/mysql/ha_mcs_client_udfs.cpp
    • dbcon/mysql/ha_mcs_impl.cpp
  • These act as the bridge between MariaDB server and ColumnStore engine, handling:
    • Input validation
    • Invoking backend implementation

Future Work

This PR provides the foundational components for bloat analysis and vacuuming.
Upcoming work will address:

  • Finishing vacuum partition.
  • Implementing vacuum table.
  • Using AUX columns in the analyze UDFs once they are added as actual queryable columns.

@amrdb amrdb force-pushed the feat/MCOL-4889-analyze-bloat branch from 50d613a to d38ad14 Compare August 7, 2025 11:36
@amrdb amrdb changed the title feat: MCOL-4889 analyze bloat feat: MCOL-4889 analyze and vacuum bloat Sep 27, 2025
@drrtuy drrtuy force-pushed the feat/MCOL-4889-analyze-bloat branch 2 times, most recently from 3d2e61a to c633123 Compare October 14, 2025 15:35
@drrtuy drrtuy self-requested a review October 14, 2025 15:37
@drrtuy drrtuy marked this pull request as ready for review October 14, 2025 15:39

@drrtuy drrtuy left a comment

untested ACK

@drrtuy drrtuy force-pushed the feat/MCOL-4889-analyze-bloat branch from c633123 to 54d5ac8 Compare October 17, 2025 16:15