
In the prototype, merge the union level into regular levels #709


Open: wants to merge 1 commit into main


mheinzel
Collaborator

@mheinzel mheinzel commented May 7, 2025

Description

This leads to the union level being merged with runs from the regular levels at some point.

Comment on lines 798 to 799
-- Before adding the run to the regular levels, we check if we can get
-- rid of the union level (by moving it into the regular ones).
Collaborator

Okay, so this is the approach where we try to put the completed union run into the levels whenever we flush a write buffer. What were the other alternatives? Could this not be implemented as part of supplyUnionCredits?

Collaborator Author

@mheinzel mheinzel May 7, 2025

Doing it when flushing the buffer seems good, because this is where it really matters that the union level can be moved to the regular ones (when creating new last-level merges). Due to sharing, a union could get completed by an operation on another table. It then wouldn't get moved until you call supplyUnionCredits on this table, which might never happen again.

I'll explain the alternatives and my reasoning in the comment.
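To illustrate the sharing argument, here is a minimal sketch under a hypothetical, heavily simplified model: Table, UnionState, flushWriteBuffer and this version of supplyUnionCredits are illustrative assumptions, not the prototype's definitions. Two duplicated tables share one mutable merge state, so a union completed through one of them is migrated by the other on its next flush, even if no credits are ever supplied through it. The size check that decides where the run may go is left out.

```haskell
import Data.IORef

-- Hypothetical model (not the prototype's types): a run is just its size.
type Run = Int

-- The union's merge state. Duplicated tables share one IORef to it, so
-- progress made through one table is visible to all of them.
data UnionState
  = Ongoing Int Run  -- remaining credits, size of the eventual result
  | Completed Run

data Table = Table
  { tableLevels :: IORef [Run]
  , tableUnion  :: IORef (Maybe (IORef UnionState))
  }

-- Supplying credits through any table sharing the state can complete it.
supplyUnionCredits :: Table -> Int -> IO ()
supplyUnionCredits t credits = do
  mu <- readIORef (tableUnion t)
  case mu of
    Nothing  -> pure ()
    Just ref -> modifyIORef ref $ \s -> case s of
      Ongoing n r
        | n <= credits -> Completed r
        | otherwise    -> Ongoing (n - credits) r
      done -> done

-- On every flush, first check whether the union has completed (no matter
-- which table completed it) and, if so, move it into the regular levels.
flushWriteBuffer :: Table -> Run -> IO ()
flushWriteBuffer t run = do
  mu <- readIORef (tableUnion t)
  case mu of
    Nothing  -> pure ()
    Just ref -> do
      s <- readIORef ref
      case s of
        Completed u -> do
          -- Append it as a new last level (size check omitted).
          modifyIORef (tableLevels t) (++ [u])
          writeIORef (tableUnion t) Nothing
        Ongoing _ _ -> pure ()
  -- Then add the flushed run as the newest (first) entry; real level
  -- merging is not modelled here.
  modifyIORef (tableLevels t) (run :)
```

For example, if a duplicate completes the union by supplying the last credits, the original table's next flush moves the completed run into its levels; without the check on flush, that would only happen on a later call to supplyUnionCredits on the original table.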

Collaborator

Ah, yes, now I remember. So one such situation would be where we create an incremental union table, duplicate it, and then only supply credits to one of them. Would it work if the other duplicate (which did not supply credits) always had 1 credit remaining for the "moving back into the levels"?

Collaborator Author

Yes, it could be due to a duplicate, or to using the table as an input to another union.

Would it work if the other duplicate (which did not supply credits) always had 1 credit remaining for the "moving back into the levels"?

That would make it clearer that supplyUnionCredits should still be called, but it doesn't guarantee that it will be.

Collaborator

If we rely on updates for the union level to be migrated, it feels like migration is not fully guaranteed either, or is it?

Comment on lines +1182 to +1222
-- Our representation doesn't allow for empty levels, so we can only put the
-- run directly after the pre-existing regular levels. If it is too large for
-- that, we don't want to move it yet to avoid violating run size invariants
-- and doing inefficient merges of runs with very different sizes.
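The placement rule described in this comment can be sketched as follows, mirroring the guard levellingRunSizeToLevel r <= length ls + 1 used in this PR. The sizing (size ratio 4, level n holding runs of up to 4^n entries), the list-of-run-sizes representation and the names maxRunSize, runSizeToLevel and migrateIfFits are hypothetical, chosen only for illustration.

```haskell
-- Hypothetical sizing: level n (1-based) holds runs of up to 4^n entries.
maxRunSize :: Int -> Int
maxRunSize n = 4 ^ n

-- The smallest level whose size bound admits a run of the given size.
runSizeToLevel :: Int -> Int
runSizeToLevel s = until (\n -> s <= maxRunSize n) (+ 1) 1

-- Levels cannot be empty, so a migrated run can only become a new level
-- directly after the existing ones (one run size per level here). If it
-- is too large even for that position, it stays in the union level.
migrateIfFits :: [Int] -> Int -> ([Int], Maybe Int)
migrateIfFits levels r
  | runSizeToLevel r <= length levels + 1 = (levels ++ [r], Nothing)
  | otherwise                             = (levels, Just r)
```

With levels of sizes 4 and 16, a completed union run of 64 entries becomes a new third level, while one of 65 entries would belong in level 4 and therefore stays in the union level for now.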
Collaborator

Maybe we should make a TODO to allow empty levels? Or maybe having the Single vs. MigratedUnion distinction would help with this?

Collaborator Author

I just realised I originally wanted to do something even simpler: when the union is completed, always move it to a new level. If it is much larger than that level should be, the existing code already handles that: instead of creating a merge with it, it just pushes the oversized run down the levels over time, until it fits in and becomes part of a new last-level merge. I think combined with the MigratedUnion constructor (to avoid watering down the invariant too much), that could be a decent solution. It's a bit like allowing empty levels just before a MigratedUnion, but without explicitly representing them.
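A minimal sketch of this simpler alternative, under a hypothetical model where each level holds a single run, given by its number of entries and a tag recording whether it is a migrated union. MigratedUnion echoes the constructor discussed here; the types, the sizing (level n holds runs of up to 4^n entries) and the helper names are illustrative assumptions. The completed union always becomes a new last level, only MigratedUnion runs may exceed their level's bound, and such a run is pushed down unmerged instead of being merged.

```haskell
-- Hypothetical model: one run per level, tagged with where it came from.
data Origin = Regular | MigratedUnion
  deriving (Eq, Show)

data Level = Level { origin :: Origin, entries :: Int }
  deriving (Eq, Show)

-- Hypothetical sizing: level n (1-based) holds runs of up to 4^n entries.
maxRunSize :: Int -> Int
maxRunSize n = 4 ^ n

-- Always move the completed union into a new last level, whatever its size.
migrateAlways :: [Level] -> Int -> [Level]
migrateAlways ls r = ls ++ [Level MigratedUnion r]

-- The weakened invariant: only a migrated union run may be oversized.
invariantHolds :: [Level] -> Bool
invariantHolds ls =
  and [ origin l == MigratedUnion
      | (n, l) <- zip [1 ..] ls
      , entries l > maxRunSize n ]

-- What happens to the run at level n when that level would start a merge:
-- an oversized run is moved down one level unmerged instead.
data Action = MergeAtLevel | PushDownUnmerged
  deriving (Eq, Show)

actionAt :: Int -> Level -> Action
actionAt n l
  | entries l > maxRunSize n = PushDownUnmerged
  | otherwise                = MergeAtLevel
```

For instance, a 1000-entry union migrated behind a single 4-entry level starts out at level 2 (bound 16) and, under this sizing, is pushed down unmerged through levels 3 and 4 until level 5 (bound 1024) admits it.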

Collaborator

If so, maybe we should have a dedicated test that triggers this particular behaviour, so that we can check that it works correctly.

-- nothing to do
return (ls, NoUnion)
migrateUnionLevel _tr _sc ls ul@(Union t _) =
-- TODO: tracing
Collaborator

Is this something you still want to do in this PR?

@@ -176,29 +177,72 @@ test_merge_again_with_incoming =
-- properties
--

-- TODO: also generate nested unions?
Collaborator

👍 Nesting at least once could reveal some edge-case behaviour.

-- merge is completed and sufficient new entries have been inserted.
prop_union_merge_into_levels :: [[(LSM.Key, LSM.Op)]] -> Property
prop_union_merge_into_levels kopss = length (filter (not . null) kopss) > 1 QC.==>
QC.forAll arbitrary $ \firstPay ->
Collaborator

It may be intentional, but QC.forAll does not shrink; you'd have to use QC.forAllShrink for that.
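For reference, a sketch of the suggested change using QuickCheck's forAllShrink, which takes a generator, a shrinking function and the property. The property body and the use of NonNegative Int for firstPay are placeholders, since the real test body is not shown here.

```haskell
import qualified Test.QuickCheck as QC

-- QC.forAll only generates values, so a failing firstPay is reported
-- exactly as generated. QC.forAllShrink additionally takes a shrinking
-- function, so QuickCheck first searches for a smaller counterexample.
prop_shrinking :: QC.Property
prop_shrinking =
  QC.forAllShrink QC.arbitrary QC.shrink $ \(QC.NonNegative firstPay) ->
    -- Hypothetical stand-in for the real test body.
    length (replicate firstPay ()) == (firstPay :: Int)
```

Alternatively, taking firstPay as an ordinary argument of the property function (as is already done for kopss) also gets shrinking, via its Arbitrary instance.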

@mheinzel mheinzel force-pushed the mheinzel/prototype-union-merge-into-levels branch from ebaad0b to adf3720 May 12, 2025 12:59
-- level can be factor (5/4) too large, and there the same holding
-- back can lead to factor (6/4) etc., until at level 12 a run is two
-- levels too large.
assertST $ all (\r -> runSize r > 0) rs
Collaborator

@jorisdral jorisdral May 12, 2025

This is a new assertion. Is it related to unions, or is it just there because it's useful?

Comment on lines +389 to +390
-- of the expected size range already, but it could also be smaller
-- if it comes from a union level.
Collaborator

And could it not also be larger if it came from a union level?

Comment on lines +1236 to +1247
| levellingRunSizeToLevel r <= length ls + 1 ->
-- If it fits into a hypothetical new last level, put it there.
--
-- TODO: In some cases it seems desirable to even add it to the
-- existing last regular level (so it becomes part of a merge
-- sooner), but that would lead to additional merging work that was
-- not accounted for. We'd need to be careful to ensure the merge
-- completes in time, without doing a lot of work in a short time.
(ls ++ [Level (Single MigratedUnion r) []], NoUnion)
_ ->
-- Otherwise, just leave it for now.
(ls, ul)
Collaborator

As we discussed elsewhere, it might still be useful to migrate the union level even if it doesn't fit.


2 participants