DLO: Multi-objective optimization for auto-compaction #201

sumedhsakdeo · 2024-09-18T19:18:30Z

Summary

We plan to develop Auto Compaction for high-ROI tables. We treat this as a multi-objective optimization problem with two objectives: maximize file count reduction and minimize compute cost. We score and rank the tables, then choose the top-K tables for each compaction iteration so that it stays under the allocated compute budget.

This will be used by the job scheduler for candidate selection.
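
To make the score-then-rank idea concrete, here is a minimal sketch of a weighted-sum score followed by top-K selection. It is not the code in this PR: the `Candidate` class, its field names, the min-max normalization, and the `gainWeight` parameter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopKSelectionSketch {

  /** Hypothetical per-table traits; the field names are illustrative only. */
  static final class Candidate {
    final String table;
    final double fileCountReduction; // estimated number of files removed
    final double computeCost;        // estimated cost of the compaction job

    Candidate(String table, double fileCountReduction, double computeCost) {
      this.table = table;
      this.fileCountReduction = fileCountReduction;
      this.computeCost = computeCost;
    }
  }

  /** Min-max normalizes a value into [0, 1]; returns 0 when min equals max. */
  static double normalize(double value, double min, double max) {
    return max > min ? (value - min) / (max - min) : 0.0;
  }

  /**
   * Assumed scoring rule: gainWeight * normalizedGain - (1 - gainWeight) * normalizedCost.
   * A higher score means a larger file count reduction, a lower cost, or both.
   */
  static double[] score(List<Candidate> candidates, double gainWeight) {
    double minGain = Double.MAX_VALUE, maxGain = -Double.MAX_VALUE;
    double minCost = Double.MAX_VALUE, maxCost = -Double.MAX_VALUE;
    for (Candidate c : candidates) {
      minGain = Math.min(minGain, c.fileCountReduction);
      maxGain = Math.max(maxGain, c.fileCountReduction);
      minCost = Math.min(minCost, c.computeCost);
      maxCost = Math.max(maxCost, c.computeCost);
    }
    double[] scores = new double[candidates.size()];
    for (int i = 0; i < scores.length; i++) {
      Candidate c = candidates.get(i);
      scores[i] = gainWeight * normalize(c.fileCountReduction, minGain, maxGain)
          - (1 - gainWeight) * normalize(c.computeCost, minCost, maxCost);
    }
    return scores;
  }

  /** Returns the indexes of the k highest-scoring candidates, best first. */
  static List<Integer> topK(double[] scores, int k) {
    List<Integer> indexes = new ArrayList<>();
    for (int i = 0; i < scores.length; i++) {
      indexes.add(i);
    }
    indexes.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
    return new ArrayList<>(indexes.subList(0, Math.min(k, indexes.size())));
  }

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<>();
    candidates.add(new Candidate("db.a", 1000, 10));
    candidates.add(new Candidate("db.b", 200, 50));
    candidates.add(new Candidate("db.c", 600, 20));
    double[] scores = score(candidates, 0.7);
    System.out.println(topK(scores, 2)); // prints [0, 2]
  }
}
```

Normalizing both traits before combining them keeps the weight meaningful when file counts and compute cost are on very different scales. Whether the scorer in this PR normalizes the same way is not something this sketch asserts.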

Changes

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

Manually tested on local Docker setup. Please include the commands run and their output.
Added new tests for the changes made.
Updated existing tests to reflect the changes made.
No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

Breaking Changes
Deprecations
Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

...yout/src/main/java/com/linkedin/openhouse/datalayout/ranker/DataLayoutCandidateSelector.java

...alayout/src/main/java/com/linkedin/openhouse/datalayout/ranker/DataLayoutStrategyScorer.java

...java/com/linkedin/openhouse/datalayout/ranker/SimpleWeightedSumDataLayoutStrategyScorer.java

.../src/main/java/com/linkedin/openhouse/datalayout/ranker/BaseDataLayoutCandidateSelector.java

.../src/main/java/com/linkedin/openhouse/datalayout/ranker/TopKDataLayoutCandidateSelector.java

...src/main/java/com/linkedin/openhouse/datalayout/ranker/GreedyMaxBudgetCandidateSelector.java

teamurko

lgtm, the only question is why returning indexes instead of strategy objects is important

sumedhsakdeo · 2024-09-19T21:39:13Z

lgtm, the only question is why returning indexes instead of strategy objects is important

I feel returning indexes is more intuitive. If, when you incorporate the module into the scheduler, you find it is not a good idea, I am fine with switching to a list of strategy objects.
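
For reference, a rough sketch of the two return shapes being discussed. The `ScoredStrategy` class, the method names, and the greedy skip-if-too-expensive rule are hypothetical and chosen only to illustrate the trade-off. They are not claimed to match `GreedyMaxBudgetCandidateSelector` or any other class in this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class IndexSelectionSketch {

  /** Hypothetical scored strategy; names are illustrative, not the module's API. */
  static final class ScoredStrategy {
    final String table;
    final double score;
    final double cost;

    ScoredStrategy(String table, double score, double cost) {
      this.table = table;
      this.score = score;
      this.cost = cost;
    }
  }

  /**
   * Assumed greedy rule: visit candidates from highest to lowest score and take
   * each one whose cost still fits in the remaining budget.
   * Returns positions into the input list rather than the objects themselves.
   */
  static List<Integer> selectIndexes(List<ScoredStrategy> strategies, double budget) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < strategies.size(); i++) {
      order.add(i);
    }
    order.sort(
        Comparator.comparingDouble((Integer i) -> strategies.get(i).score).reversed());
    List<Integer> selected = new ArrayList<>();
    double remaining = budget;
    for (int i : order) {
      double cost = strategies.get(i).cost;
      if (cost <= remaining) {
        selected.add(i);
        remaining -= cost;
      }
    }
    return selected;
  }

  /** Object-returning variant, written as a thin wrapper over the index-based one. */
  static List<ScoredStrategy> selectStrategies(List<ScoredStrategy> strategies, double budget) {
    return selectIndexes(strategies, budget).stream()
        .map(strategies::get)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<ScoredStrategy> strategies = new ArrayList<>();
    strategies.add(new ScoredStrategy("db.a", 0.9, 40));
    strategies.add(new ScoredStrategy("db.b", 0.5, 70));
    strategies.add(new ScoredStrategy("db.c", 0.3, 50));
    System.out.println(selectIndexes(strategies, 100)); // prints [0, 2]
  }
}
```

With indexes, the caller can join the selection back to any parallel list it holds (for example, table metadata). With objects, the caller does not need to keep the input list around. The wrapper shows that either shape can be derived from the other at little cost.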

DLO: Solving multiple objective optimization problem for file count reduction

4545eb7

sumedhsakdeo requested review from jiang95-dev, teamurko, anjagruenheid and maluchari September 18, 2024 19:30

teamurko requested changes Sep 19, 2024

teamurko reviewed Sep 19, 2024

...java/com/linkedin/openhouse/datalayout/ranker/SimpleWeightedSumDataLayoutStrategyScorer.java

anjagruenheid reviewed Sep 19, 2024

...java/com/linkedin/openhouse/datalayout/ranker/SimpleWeightedSumDataLayoutStrategyScorer.java

...src/main/java/com/linkedin/openhouse/datalayout/ranker/GreedyMaxBudgetCandidateSelector.java

Separate ScoredDataLayoutStrategy from raw traits in DataLayoutStrategy

deee98a

teamurko reviewed Sep 19, 2024

.../src/main/java/com/linkedin/openhouse/datalayout/ranker/BaseDataLayoutCandidateSelector.java

teamurko approved these changes Sep 19, 2024

sumedhsakdeo merged commit 43275a8 into linkedin:main Sep 19, 2024
1 check passed
