
refactor: simplify auto compaction planning#126

Merged
Li0k merged 7 commits into main from li0k/improve_full_compaction
Mar 30, 2026

Conversation

Collaborator

Li0k commented Jan 30, 2026

Summary

This PR simplifies Auto compaction planning so that it stays safe under repeated invocation without relying on Full as a fallback.

Auto now only plans localized rewrites from two strategy families:

  • FilesWithDeletes
  • SmallFiles

It scans the current snapshot once, evaluates those candidates in a fixed order, and applies a planner-level plan budget before returning executable plans.

This PR also intentionally updates the pre-release Auto API/config surface to match the new planner contract.

Design Changes

1. Remove Auto fallback to Full

Auto no longer falls back to Full when specialized strategies do not match.

This avoids repeatedly rewriting healthy parquet files that are already near target_file_size under high-frequency calls.

Full remains available as an explicit/manual full-table rewrite strategy.

2. Simplify Auto planning

Auto now follows a fixed planning flow:

  1. Scan the current snapshot once
  2. Build a FilesWithDeletes candidate report
  3. If it produces non-empty plans, select it
  4. Otherwise build a SmallFiles candidate report
  5. If it produces non-empty plans, select it
  6. Apply max_auto_plans_per_run to the selected plan set

This replaces the previous candidate-chain / anti-starvation / full-like branching logic.
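The six steps above can be sketched as a small decision function. This is an illustrative sketch, not the crate's code: `Report`, `Reason`, and the `Vec<u32>` stand-in plans are all assumptions (the `NoCandidate` reason is omitted for brevity).

```rust
#[derive(Debug, PartialEq)]
enum Reason {
    Recommended,
    BudgetCapped,
    NoSnapshot,
    NoPlansProduced,
}

#[derive(Debug)]
struct Report {
    plans: Vec<u32>, // stand-in for real CompactionPlan values
    reason: Reason,
}

fn plan_auto(
    snapshot: Option<&[u32]>,
    delete_plans: Vec<u32>, // plans a FilesWithDeletes pass would produce
    small_plans: Vec<u32>,  // plans a SmallFiles pass would produce
    max_plans_per_run: usize,
) -> Report {
    // Step 1: scan the snapshot once; bail out if there is none.
    if snapshot.is_none() {
        return Report { plans: vec![], reason: Reason::NoSnapshot };
    }

    // Steps 2-5: evaluate candidates in fixed order, deletes first.
    let selected = if !delete_plans.is_empty() {
        delete_plans
    } else if !small_plans.is_empty() {
        small_plans
    } else {
        return Report { plans: vec![], reason: Reason::NoPlansProduced };
    };

    // Step 6: apply the planner-level plan budget.
    if selected.len() > max_plans_per_run {
        Report {
            plans: selected.into_iter().take(max_plans_per_run).collect(),
            reason: Reason::BudgetCapped,
        }
    } else {
        Report { plans: selected, reason: Reason::Recommended }
    }
}
```

The key property of the fixed order is that SmallFiles is only consulted when FilesWithDeletes produced nothing, so repeated invocations never oscillate between strategies on an unchanged snapshot.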

3. Keep planner-level observability

AutoPlanReport is introduced as the planner output model and reflects the final selected or capped plan set:

  • selected_strategy
  • plans
  • planned_input_bytes
  • planned_input_files
  • rewrite_ratio
  • reason

The reason now distinguishes:

  • Recommended
  • BudgetCapped
  • NoSnapshot
  • NoCandidate
  • NoPlansProduced
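Based on the field and reason lists above, the report shape might look like the following sketch; the concrete types and the `rewrite_ratio` definition are assumptions rather than the crate's actual definitions.

```rust
#[derive(Debug, PartialEq)]
enum AutoPlanReason {
    Recommended,
    BudgetCapped,
    NoSnapshot,
    NoCandidate,
    NoPlansProduced,
}

#[derive(Debug)]
struct CompactionPlan; // placeholder for the real plan type

#[derive(Debug)]
struct AutoPlanReport {
    selected_strategy: Option<String>, // e.g. "FilesWithDeletes"
    plans: Vec<CompactionPlan>,        // final selected (possibly capped) plans
    planned_input_bytes: u64,          // bytes the plans would rewrite
    planned_input_files: u64,          // files the plans would rewrite
    rewrite_ratio: f64,                // fraction of table data being rewritten
    reason: AutoPlanReason,
}

// One plausible definition of rewrite_ratio, guarded against empty tables.
fn rewrite_ratio(planned_input_bytes: u64, total_data_bytes: u64) -> f64 {
    if total_data_bytes == 0 {
        0.0
    } else {
        planned_input_bytes as f64 / total_data_bytes as f64
    }
}
```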

4. Preserve caller-owned group gating

For Auto-selected FilesWithDeletes, caller-provided group_filters are propagated unchanged.

Auto is responsible for choosing between localized strategies, but it does not override caller policy such as min_group_size_bytes.

5. Clarify zero-threshold behavior

Under Auto, min_delete_file_count_threshold == 0 disables delete-heavy detection and therefore disables the FilesWithDeletes candidate.

Likewise, a zero-valued auto threshold disables the corresponding candidate instead of forcing it to match every snapshot.
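A minimal sketch of that rule, with an illustrative function name and return type (the real candidate helpers live in the config module and return planning configs, not strings):

```rust
fn files_with_deletes_candidate(
    min_delete_file_count_threshold: u64,
    delete_heavy_files_count: u64,
) -> Option<&'static str> {
    // 0 means "delete-heavy detection disabled", not "every file qualifies".
    if min_delete_file_count_threshold == 0 {
        return None;
    }
    (delete_heavy_files_count >= min_delete_file_count_threshold)
        .then_some("FilesWithDeletes")
}
```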

6. Update the pre-release Auto config/API surface

This PR intentionally changes the not-yet-stable Auto config model:

  • remove enable_full_fallback
  • remove min_impact_ratio
  • rename min_files_with_deletes_count to min_delete_heavy_files_count
  • add max_auto_plans_per_run as a planner-level budget
  • add plan_compaction_report_with_branch() for callers that need structured planning reasons
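A self-contained sketch of how the two entry points could relate, using string stand-ins for the real plan and reason types; the wrapper relationship mirrors the PR description, but the bodies here are invented:

```rust
struct AutoPlanReport {
    plans: Vec<String>, // stand-in for real CompactionPlan values
    reason: String,     // stand-in for AutoPlanReason
}

fn plan_compaction_report_with_branch(branch: &str) -> AutoPlanReport {
    // Stand-in planner: pretend one plan was produced for the branch.
    AutoPlanReport {
        plans: vec![format!("plan-for-{branch}")],
        reason: "Recommended".to_string(),
    }
}

fn plan_compaction_with_branch(branch: &str) -> Vec<String> {
    // Callers that only need executable plans keep the old shape;
    // callers that need structured reasons use the report API directly.
    plan_compaction_report_with_branch(branch).plans
}
```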

7. Add a design document

Added:

  • docs/compaction-strategy-contract.md

The document describes the current strategy model, planner budget semantics, return semantics, and the responsibility split between the library and external systems.

Reviewer Focus

Please focus on:

  • whether the simplified Auto model is clear enough
  • whether max_auto_plans_per_run is the right planner-level budget boundary
  • whether keeping caller-owned group gating is the right default behavior for Auto
  • whether the pre-release Auto API/config changes feel coherent with the new planner contract

Validation

  • cargo test -p iceberg-compaction-core --lib
  • cargo clippy -p iceberg-compaction-core --lib -- -D warnings

Li0k changed the title from "perf: improve full compaction to reduce write-amp" to "perf: improve Auto compaction fallback to Full to reduce write-amp" on Feb 2, 2026
Collaborator

chenzl25 commented Feb 11, 2026

I think we should consider the table's total size to determine how often we should fully compact/eliminate the equality delete files, since the ultimate goal of compaction is to improve query performance. There is a trade-off between query performance and compaction cost/resources, so we can categorize Iceberg tables by size:

  • 0G to 16G: compact equality delete files fully every 1 hour
  • 16G to 256G: compact equality delete files fully every 6 hours
  • 256G to 4T: compact equality delete files fully every 24 hours
  • 4T to 64T: compact equality delete files fully every week
  • 64T and up: compact equality delete files gradually instead of fully, because the relative performance gain from compaction shrinks as the table size grows
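One way to encode the tiering above as a lookup; the function and its `Option<Duration>` signature are purely illustrative, with the thresholds and intervals taken from the comment:

```rust
use std::time::Duration;

const GIB: u64 = 1 << 30;
const TIB: u64 = 1 << 40;

/// Suggested interval for a full equality-delete compaction, or `None`
/// above 64T, where gradual compaction is suggested instead.
fn full_compaction_interval(table_bytes: u64) -> Option<Duration> {
    const HOUR: u64 = 60 * 60;
    match table_bytes {
        b if b < 16 * GIB => Some(Duration::from_secs(HOUR)),          // 0G..16G
        b if b < 256 * GIB => Some(Duration::from_secs(6 * HOUR)),     // 16G..256G
        b if b < 4 * TIB => Some(Duration::from_secs(24 * HOUR)),      // 256G..4T
        b if b < 64 * TIB => Some(Duration::from_secs(7 * 24 * HOUR)), // 4T..64T
        _ => None, // 64T+: compact gradually instead of fully
    }
}
```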

Li0k changed the title from "perf: improve Auto compaction fallback to Full to reduce write-amp" to "refactor: simplify auto compaction planning" on Mar 18, 2026
Li0k force-pushed the li0k/improve_full_compaction branch from 1be6603 to d18e313 on March 18, 2026 09:46
Li0k requested a review from Copilot on March 18, 2026 10:23
Li0k marked this pull request as ready for review on March 18, 2026 10:23
Li0k requested review from chenzl25 and xxhZs on March 18, 2026 10:23

Copilot AI left a comment


Pull request overview

This PR refactors Auto compaction planning to be snapshot-local and repeat-safe by removing the previous Full fallback and limiting Auto to selecting between localized FilesWithDeletes and SmallFiles plan sets, with a planner-level plan-count budget.

Changes:

  • Add AutoPlanReport/AutoPlanReason/AutoSelectedStrategy and a new report-returning planning API for Auto.
  • Simplify auto thresholds and snapshot stats (rename to delete_heavy_files_count, drop impact-ratio + full-fallback config, add max_auto_plans_per_run).
  • Add a design/contract document describing compaction strategy boundaries and Auto planner semantics.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

  • docs/compaction-strategy-contract.md: New contract doc describing strategy model, Auto selection order, and budget/return semantics.
  • core/src/file_selection/mod.rs: Rename snapshot stat field to delete_heavy_files_count to match updated semantics.
  • core/src/config/mod.rs: Remove full fallback + impact ratio from auto thresholds; add max_auto_plans_per_run; split auto "resolve" into candidate helpers and update tests.
  • core/src/compaction/mod.rs: Re-export new Auto report/enum types from the compaction module.
  • core/src/compaction/auto.rs: Implement new Auto planning flow that produces a report, selects delete-first then small-files, and caps by max plans.

Comment on lines +151 to +159

    let delete_report = if let Some(planning_config) = delete_candidate {
        Some(Self::build_report(
            tasks.clone(),
            planning_config,
            to_branch,
            snapshot_id,
            total_data_bytes,
            AutoPlanReason::Recommended,
        )?)
Comment on lines +15 to +17
- `data file`: a `FileScanTask` where `data_file_content == Data`
- `delete-heavy`: `deletes.len() >= min_delete_file_count_threshold`
- `candidate set`: the set of data files that a strategy is allowed to include in compaction
Comment on lines +40 to +44
- Intended use: timely cleanup of delete-heavy files
- Candidate set: `deletes.len() >= min_delete_file_count_threshold`
- May use `group_filters` for group gating
- `Auto` does not rewrite or override caller-provided group gating for this strategy
- Must be fixed-point: rewritten delete-heavy input files should leave the candidate set in the newly committed snapshot
Comment on lines 582 to 596

      #[test]
    - fn test_resolve_strategy_priority() {
    + fn test_files_with_deletes_candidate_priority() {
          let config = AutoCompactionConfigBuilder::default()
              .thresholds(AutoThresholds {
    -             min_files_with_deletes_count: 3,
    +             min_delete_heavy_files_count: 3,
                  min_small_files_count: 5,
    -             min_impact_ratio: None,
              })
              .build()
              .unwrap();

          // Priority 1: FilesWithDeletes wins when both thresholds met
          let stats = create_test_stats(10, 6, 4);
          assert!(matches!(
    -         config.resolve(&stats).unwrap(),
    +         config.files_with_deletes_candidate(&stats).unwrap(),
              CompactionPlanningConfig::FilesWithDeletes(_)

Copilot AI left a comment


Pull request overview

This PR refactors auto-compaction planning to be snapshot-local and fixed-order, removing the previous “fallback to Full” behavior and documenting the strategy contract.

Changes:

  • Add a compaction strategy contract design doc describing Auto selection semantics, fixed-point expectations, and budget behavior.
  • Update AutoCompactionConfig/thresholds to remove impact-ratio + full fallback, add max_auto_plans_per_run, and rename delete-related stats to “delete-heavy”.
  • Extend the AutoCompactionPlanner to return an AutoPlanReport (strategy, costs, reason) and apply the plan-count budget inside the planner.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

  • docs/compaction-strategy-contract.md: New design doc specifying the simplified strategy model and planner budget semantics.
  • core/src/file_selection/mod.rs: Renames snapshot delete stat to delete_heavy_files_count.
  • core/src/config/mod.rs: Removes full fallback + impact ratio; introduces candidate helpers and max_auto_plans_per_run.
  • core/src/compaction/mod.rs: Re-exports new auto-planning report types.
  • core/src/compaction/auto.rs: Implements report-based planning, fixed-order strategy selection, and budget capping.

Comment on lines 110 to +123

    /// Plans compaction for a table branch.
    ///
    /// Returns empty vector if no files need compaction.
    pub async fn plan_compaction_with_branch(
        &self,
        table: &Table,
        to_branch: &str,
    ) -> Result<Vec<CompactionPlan>> {
        let report = self
            .plan_compaction_report_with_branch(table, to_branch)
            .await?;

        Ok(report.plans)
    }
Comment on lines +151 to +168

    let delete_report = if let Some(planning_config) = delete_candidate {
        let report = Self::build_report(
            tasks.clone(),
            planning_config,
            to_branch,
            snapshot_id,
            total_data_bytes,
            AutoPlanReason::Recommended,
        )?;
        if report.plans.is_empty() {
            Some(report)
        } else {
            return Ok(Self::cap_report_plans(
                report,
                total_data_bytes,
                self.config.max_auto_plans_per_run,
            ));
        }
Comment on lines +233 to +239

    return AutoPlanReport {
        selected_strategy: report.selected_strategy,
        plans: vec![],
        planned_input_bytes: 0,
        planned_input_files: 0,
        rewrite_ratio: 0.0,
        reason: AutoPlanReason::BudgetCapped,
Comment on lines +545 to +556

      if stats.small_files_count >= self.thresholds.min_small_files_count {
          Some(CompactionPlanningConfig::SmallFiles(SmallFilesConfig {
              target_file_size_bytes: self.target_file_size_bytes,
              min_size_per_partition: self.min_size_per_partition,
              max_file_count_per_partition: self.max_file_count_per_partition,
              max_input_parallelism: self.max_input_parallelism,
              max_output_parallelism: self.max_output_parallelism,
              enable_heuristic_output_parallelism: self.enable_heuristic_output_parallelism,
              small_file_threshold_bytes: self.small_file_threshold_bytes,
              grouping_strategy: self.grouping_strategy.clone(),
    -     }));
    +         group_filters: self.group_filters.clone(),
    +     }))
Li0k requested a review from wcy-fdu on March 18, 2026 12:48
Collaborator Author

Li0k commented Mar 19, 2026

> I think we should consider the table's total size to determine how often we should fully compact/eliminate the equality delete files, since the ultimate goal of compaction is to improve query performance. There is a trade-off between query performance and compaction cost/resources, so we can categorize Iceberg tables by size:
>
>   • 0G to 16G: compact equality delete files fully every 1 hour
>   • 16G to 256G: compact equality delete files fully every 6 hours
>   • 256G to 4T: compact equality delete files fully every 24 hours
>   • 4T to 64T: compact equality delete files fully every week
>   • 64T and up: compact equality delete files gradually instead of fully, because the relative performance gain from compaction shrinks as the table size grows

Hi @chenzl25, your suggestion is great. I'm wondering: is this also a kind of budget concept? Once we set a daily usable budget, would the full compaction cycle become longer for tables of different sizes?


chenzl25 left a comment


Rest LGTM

Li0k added this pull request to the merge queue on Mar 30, 2026
Merged via the queue into main with commit 70237cc Mar 30, 2026
5 checks passed