In the prototype, merge the union level into regular levels #709
base: main
Conversation
prototypes/ScheduledMerges.hs
Outdated
-- Before adding the run to the regular levels, we check if we can get
-- rid of the union level (by moving it into the regular ones).
Okay, so this is the approach where we try to put the completed union run into the levels whenever we flush a write buffer. What were the other alternatives? Could this not be implemented as part of `supplyUnionCredits`?
Doing it when flushing the buffer seems good, because this is where it really matters that the union level can be moved to the regular ones (when creating new last level merges). A union could get completed by an operation on another table due to sharing. Then it wouldn't get moved until you call `supplyUnionCredits`, which might never happen again.
I'll explain the alternatives and my reasoning in the comment.
Ah, yes, now I remember. So one such situation would be where we create an incremental union table, duplicate it, and then only supply credits to one of them. Would it work if the other duplicate (which did not supply credits) always has 1 credit remaining for the "moving back into the levels"?
Yes, could be due to a duplicate or also using the table as an input to another union.
> Would it work if the other duplicate (which did not supply credits) always has 1 credit remaining for the "moving back into the levels"?

That would help make it clear that `supplyUnionCredits` still should be called, but it still doesn't guarantee it.
It feels like if we rely on `updates` for the union level to be migrated, it is also not fully guaranteed that it will migrate, or is it?
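The sharing scenario discussed in this thread can be sketched with a toy model. Everything below is a hypothetical simplification and not the prototype's code: a union merge is reduced to a shared debt counter, and the names `UnionMerge`, `Table`, `supplyCredits`, `flush` and `demo` are made up for illustration. It shows a duplicate that never supplies credits itself still migrating the completed union on its next flush.

```haskell
import Control.Monad (when)
import Data.IORef
import Data.Maybe (isJust)

-- Hypothetical model: a union merge is only a counter of remaining debt,
-- shared between tables through an IORef.
newtype UnionMerge = UnionMerge (IORef Int)

data Table = Table
  { tableLevels :: IORef [String]            -- names of runs in the regular levels
  , tableUnion  :: IORef (Maybe UnionMerge)  -- Nothing once the union was migrated
  }

-- Pay off debt on the (possibly shared) union merge. In this toy model,
-- supplying credits does not migrate anything by itself.
supplyCredits :: Int -> Table -> IO ()
supplyCredits n t = do
  mu <- readIORef (tableUnion t)
  case mu of
    Nothing                -> pure ()
    Just (UnionMerge debt) -> modifyIORef' debt (\d -> max 0 (d - n))

-- Flush a write buffer: first migrate the union if its merge is completed,
-- then add the flushed run in front of the regular levels.
flush :: String -> Table -> IO ()
flush run t = do
  mu <- readIORef (tableUnion t)
  case mu of
    Nothing                -> pure ()
    Just (UnionMerge debt) -> do
      d <- readIORef debt
      when (d == 0) $ do
        modifyIORef' (tableLevels t) (++ ["union-run"])
        writeIORef (tableUnion t) Nothing
  modifyIORef' (tableLevels t) (run :)

-- Two tables share one merge; only the first one supplies credits.
demo :: IO ([String], Bool)
demo = do
  debt <- newIORef 10
  let mkTable = Table <$> newIORef [] <*> newIORef (Just (UnionMerge debt))
  t1 <- mkTable
  t2 <- mkTable          -- stands in for a duplicate sharing the merge
  supplyCredits 10 t1    -- completes the shared merge
  flush "run-1" t2       -- t2 never called supplyCredits itself
  ls <- readIORef (tableLevels t2)
  stillUnion <- isJust <$> readIORef (tableUnion t2)
  pure (ls, stillUnion)
```

In this model the migration of `t2` still depends on a flush happening, which matches the concern above that relying on updates does not fully guarantee migration either.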
-- Our representation doesn't allow for empty levels, so we can only put the
-- run directly after the pre-existing regular levels. If it is too large for
-- that, we don't want to move it yet to avoid violating run size invariants
-- and doing inefficient merges of runs with very different sizes.
Maybe we should make a TODO to allow empty levels? Or maybe having the `Single` vs. `MigratedUnion` distinction would help with this?
I just realised I originally wanted to do something even simpler: when the union is completed, always move it to a new level. If it is much larger than that level should be, the existing code will already handle it: it does not create a merge with the run, but just pushes the oversized run down the levels over time, until it fits in and becomes part of a new last level merge. I think combined with the `MigratedUnion` constructor (to avoid watering down the invariant too much), that could be a decent solution. Kind of like allowing empty levels just before a `MigratedUnion`, but not explicitly representing them.
If so, maybe we should have a dedicated test that triggers this particular behaviour so that we can check that it works correctly
    -- nothing to do
    return (ls, NoUnion)
migrateUnionLevel _tr _sc ls ul@(Union t _) =
    -- TODO: tracing
Is this something you still want to do in this PR?
@@ -176,29 +177,72 @@ test_merge_again_with_incoming =
-- properties
--

-- TODO: also generate nested unions?
👍 nesting at least once would potentially show some edge case behaviour
-- merge is completed and sufficient new entries have been inserted.
prop_union_merge_into_levels :: [[(LSM.Key, LSM.Op)]] -> Property
prop_union_merge_into_levels kopss = length (filter (not . null) kopss) > 1 QC.==>
    QC.forAll arbitrary $ \firstPay ->
It may be intentional, but `QC.forAll` does not shrink; you'd have to use `QC.forAllShrink`.
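As a minimal illustration of the difference, here are two toy properties that fail on purpose so that shrinking becomes observable. The property names and the list-length predicate are made up for this sketch and are unrelated to the PR's `firstPay` value, whose type is not shown here.

```haskell
import Test.QuickCheck

-- Fails for any list of length >= 5. With forAll, the counterexample is
-- reported exactly as generated, because forAll performs no shrinking.
prop_noShrink :: Property
prop_noShrink =
  forAll (arbitrary :: Gen [Int]) $ \xs -> length xs < 5

-- Same property, but forAllShrink is given the type's shrink function,
-- so QuickCheck can reduce the counterexample before reporting it.
prop_withShrink :: Property
prop_withShrink =
  forAllShrink (arbitrary :: Gen [Int]) shrink $ \xs -> length xs < 5
```

Applied to the property in the diff, the corresponding change would presumably be `QC.forAllShrink arbitrary shrink $ \firstPay -> ...`, assuming the type of `firstPay` has a useful `shrink`.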
ebaad0b to adf3720
-- level can be factor (5/4) too large, and there the same holding
-- back can lead to factor (6/4) etc., until at level 12 a run is two
-- levels too large.
assertST $ all (\r -> runSize r > 0) rs
This is a new assertion. Is it related to unions, or is it just there because it's useful?
-- of the expected size range already, but it could also be smaller
-- if it comes from a union level.
And could it not also be larger if it came from a union level?
  | levellingRunSizeToLevel r <= length ls + 1 ->
    -- If it fits into a hypothetical new last level, put it there.
    --
    -- TODO: In some cases it seems desirable to even add it to the
    -- existing last regular level (so it becomes part of a merge
    -- sooner), but that would lead to additional merging work that was
    -- not accounted for. We'd need to be careful to ensure the merge
    -- completes in time, without doing a lot of work in a short time.
    (ls ++ [Level (Single MigratedUnion r) []], NoUnion)
  _ ->
    -- Otherwise, just leave it for now.
    (ls, ul)
Like we discussed elsewhere, it might still be useful to migrate the union level even if it doesn't fit
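To make the two options concrete, here is a small self-contained sketch comparing the policy in this hunk (migrate only if the run fits into a new last level) with the alternative of always migrating a completed union. The types are simplified stand-ins, and `sizeToLevel` with its factor of 4 is an assumption made for illustration; it is not the prototype's `levellingRunSizeToLevel`.

```haskell
-- Simplified stand-ins for the prototype's types (assumptions).
type RunSize = Int

-- One resident run per level, represented only by its size.
newtype Level = Level RunSize
  deriving (Eq, Show)

data UnionLevel = NoUnion | CompletedUnion RunSize
  deriving (Eq, Show)

-- Hypothetical size-to-level mapping: level n holds runs of up to 4^n entries.
sizeToLevel :: RunSize -> Int
sizeToLevel size = length (takeWhile (< size) (iterate (* 4) 4)) + 1

-- Policy as in the hunk above: migrate only if the run fits into a
-- hypothetical new last level, otherwise leave the union level in place.
migrateIfFits :: [Level] -> UnionLevel -> ([Level], UnionLevel)
migrateIfFits ls (CompletedUnion r)
  | sizeToLevel r <= length ls + 1 = (ls ++ [Level r], NoUnion)
migrateIfFits ls ul = (ls, ul)

-- Alternative from the discussion: always migrate a completed union, even
-- if the run is oversized for the level it lands on.
migrateAlways :: [Level] -> UnionLevel -> ([Level], UnionLevel)
migrateAlways ls (CompletedUnion r) = (ls ++ [Level r], NoUnion)
migrateAlways ls NoUnion            = (ls, NoUnion)
```

With two existing levels, a completed union run of size 50 maps to level 3 under this assumed mapping and is migrated by both policies, while a run of size 1000 maps to level 5 and is only migrated by `migrateAlways`. The sketch does not model how an oversized run would then be pushed down the levels over time.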
Description
This leads to the union level being merged with runs from the regular levels at some point.