Commit e491590

Address review feedback
TODO: split and rebase into earlier commits
1 parent b9d3e1f commit e491590

7 files changed: +70 -14 lines changed

bench/macro/lsm-tree-bench-wp8.hs

Lines changed: 14 additions & 10 deletions
@@ -180,19 +180,23 @@ mkTableConfigSetup GlobalOpts{diskCachePolicy} SetupOpts{bloomFilterAlloc} conf
     , LSM.confBloomFilterAlloc = bloomFilterAlloc
     }
 
-mkTableConfigRun :: GlobalOpts -> LSM.TableConfig -> LSM.TableConfig
-mkTableConfigRun GlobalOpts{diskCachePolicy} conf = conf {
-      LSM.confDiskCachePolicy = diskCachePolicy
+mkTableConfigRun :: GlobalOpts -> RunOpts -> LSM.TableConfig -> LSM.TableConfig
+mkTableConfigRun GlobalOpts{diskCachePolicy} RunOpts {pipelined} conf =
+    conf {
+      LSM.confDiskCachePolicy = diskCachePolicy,
+      LSM.confMergeBatchSize = if pipelined
+                                 then LSM.MergeBatchSize 1
+                                 else LSM.confMergeBatchSize conf
     }
 
 mkTableConfigOverride :: GlobalOpts -> RunOpts -> LSM.TableConfigOverride
 mkTableConfigOverride GlobalOpts{diskCachePolicy} RunOpts {pipelined} =
-  LSM.noTableConfigOverride {
-    LSM.overrideDiskCachePolicy = Just diskCachePolicy,
-    LSM.overrideMergeBatchSize = if pipelined
-                                   then Just (LSM.MergeBatchSize 1)
-                                   else Nothing
-  }
+    LSM.noTableConfigOverride {
+      LSM.overrideDiskCachePolicy = Just diskCachePolicy,
+      LSM.overrideMergeBatchSize = if pipelined
+                                     then Just (LSM.MergeBatchSize 1)
+                                     else Nothing
+    }
 
 mkTracer :: GlobalOpts -> Tracer IO LSM.LSMTreeTrace
 mkTracer gopts
@@ -588,7 +592,7 @@ doRun gopts opts = do
     -- reference version starts with empty (as it's not practical or
     -- necessary for testing to load the whole snapshot).
     tbl <- if check opts
-             then let conf = mkTableConfigRun gopts benchTableConfig
+             then let conf = mkTableConfigRun gopts opts benchTableConfig
                   in LSM.newTableWith @IO @K @V @B conf session
              else let conf = mkOverrideDiskCachePolicy gopts opts
                   in LSM.openTableFromSnapshotWith @IO @K @V @B conf session name label
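
The record update in mkTableConfigRun can be tried in isolation with a minimal sketch. The types below are hypothetical, cut-down stand-ins for the benchmark's RunOpts and LSM.TableConfig (the real ones have more fields), and applyRunOpts is an illustrative name, not a function from the commit:

```haskell
-- Hypothetical, simplified stand-ins for the benchmark's types; the real
-- RunOpts and LSM.TableConfig have more fields.
newtype MergeBatchSize = MergeBatchSize Int
  deriving (Eq, Show)

data TableConfig = TableConfig
  { confDiskCachePolicy :: String
  , confMergeBatchSize  :: MergeBatchSize
  } deriving (Eq, Show)

newtype RunOpts = RunOpts { pipelined :: Bool }

-- | Pipelined runs force a merge batch size of 1; other runs keep whatever
-- the incoming configuration already specifies.
applyRunOpts :: RunOpts -> TableConfig -> TableConfig
applyRunOpts opts conf =
  conf
    { confMergeBatchSize =
        if pipelined opts
          then MergeBatchSize 1
          else confMergeBatchSize conf
    }
```

Falling back to `confMergeBatchSize conf` in the non-pipelined branch leaves the configuration untouched, which mirrors the `Nothing` branch of mkTableConfigOverride.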

lsm-tree.cabal

Lines changed: 31 additions & 0 deletions
@@ -183,6 +183,12 @@ description:
 The /disk cache policy/ determines if lookup operations use the OS page cache.
 Caching may improve the performance of lookups and updates if database access follows certain patterns.
 
+[@confMergeBatchSize@]
+The merge batch size balances the maximum latency of individual update
+operations, versus the latency of a sequence of update operations. Bigger
+batches improve overall performance but some updates will take a lot
+longer than others. The default is to use a large batch size.
+
 ==== Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size #fine_tuning_data_layout#
 
 The configuration parameters @confMergePolicy@, @confSizeRatio@, and @confWriteBufferAlloc@ affect how the table organises its data.
@@ -429,6 +435,31 @@ description:
 * Use the @DiskCacheNone@ policy if the database's access pattern does not have good spatial or temporal locality.
 For instance, if the access pattern is uniformly random.
 
+==== Fine-tuning: Merge Batch Size #fine_tuning_merge_batch_size#
+
+The /merge batch size/ is a micro-tuning parameter, and in most cases you do
+not need to think about it and can leave it at its default.
+
+When using the 'Incremental' merge schedule, merging is done in batches. This
+is a trade-off: larger batches tend to mean better overall performance but the
+downside is that while most updates (inserts, deletes, upserts) are fast, some
+are slower (when a batch of merging work has to be done).
+
+If you care most about the maximum latency of updates, then use a small batch
+size. If you don't care about latency of individual operations, just the
+latency of the overall sequence of operations then use a large batch size. The
+default is to use a large batch size, the same size as the write buffer itself.
+The minimum batch size is 1. The maximum batch size is the size of the write
+buffer 'confWriteBufferAlloc'.
+
+Note that the actual batch size is the minimum of this configuration
+parameter and the size of the batch of operations performed (e.g. 'inserts').
+So if you consistently use large batches, you can use a batch size of 1 and
+the merge batch size will always be determined by the operation batch size.
+
+A further reason why it may be preferable to use minimal batch sizes is to get
+good parallel work balance, when using parallelism.
+
 == References
 
 The implementation of LSM-trees in this package draws inspiration from:
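
The latency trade-off described in the new "Fine-tuning: Merge Batch Size" section can be illustrated with a toy model. This is a sketch under a stated assumption, not the lsm-tree implementation: each update supplies one merge credit, and merge work only runs once the unspent credits reach the batch size. The name mergeWorkPerUpdate is hypothetical.

```haskell
-- Toy model, not the lsm-tree implementation: every update supplies one
-- merge credit, and merge work is performed only when the unspent credits
-- reach the batch size (clamped to at least 1).

-- | Units of merge work performed by each update in a sequence.
mergeWorkPerUpdate :: Int -> Int -> [Int]
mergeWorkPerUpdate batchSize numUpdates = go 0 numUpdates
  where
    threshold = max 1 batchSize

    go :: Int -> Int -> [Int]
    go unspent remaining
      | remaining <= 0       = []
      | credits >= threshold = credits : go 0 (remaining - 1)
      | otherwise            = 0 : go credits (remaining - 1)
      where
        credits = unspent + 1
```

In this model `mergeWorkPerUpdate 4 8` gives `[0,0,0,4,0,0,0,4]` while `mergeWorkPerUpdate 1 8` gives eight 1s: the total work is the same, but the worst single update does 4 units instead of 1. The model deliberately ignores the throughput benefit of larger batches that the documentation mentions.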

src/Database/LSMTree.hs

Lines changed: 2 additions & 1 deletion
@@ -109,7 +109,8 @@ module Database.LSMTree (
     confBloomFilterAlloc,
     confFencePointerIndex,
     confDiskCachePolicy,
-    confMergeSchedule
+    confMergeSchedule,
+    confMergeBatchSize
   ),
   defaultTableConfig,
   MergePolicy (LazyLevelling),

src/Database/LSMTree/Internal/Config.hs

Lines changed: 20 additions & 3 deletions
@@ -94,6 +94,12 @@ For a detailed discussion of fine-tuning the table configuration, see [Fine-tuni
 [@confDiskCachePolicy :: t'DiskCachePolicy'@]
 The /disk cache policy/ supports caching lookup operations using the OS page cache.
 Caching may improve the performance of lookups and updates if database access follows certain patterns.
+
+[@confMergeBatchSize :: t'MergeBatchSize'@]
+The merge batch size balances the maximum latency of individual update
+operations, versus the latency of a sequence of update operations. Bigger
+batches improve overall performance but some updates will take a lot
+longer than others. The default is to use a large batch size.
 -}
 data TableConfig = TableConfig {
 confMergePolicy :: !MergePolicy
@@ -128,6 +134,8 @@ instance NFData TableConfig where
 -- OrdinaryIndex
 -- >>> confDiskCachePolicy defaultTableConfig
 -- DiskCacheAll
+-- >>> confMergeBatchSize defaultTableConfig
+-- MergeBatchSize 20000
 --
 defaultTableConfig :: TableConfig
 defaultTableConfig =
@@ -412,7 +420,8 @@ If you care most about the maximum latency of updates, then use a small batch
 size. If you don't care about latency of individual operations, just the
 latency of the overall sequence of operations then use a large batch size. The
 default is to use a large batch size, the same size as the write buffer itself.
-The minimum batch size is 1.
+The minimum batch size is 1. The maximum batch size is the size of the write
+buffer 'confWriteBufferAlloc'.
 
 Note that the actual batch size is the minimum of this configuration
 parameter and the size of the batch of operations performed (e.g. 'inserts').
@@ -429,6 +438,14 @@ newtype MergeBatchSize = MergeBatchSize Int
 -- TODO: the thresholds for doing merge work should be different for each level,
 -- and ideally all-pairs co-prime.
 creditThresholdForLevel :: TableConfig -> LevelNo -> MR.CreditThreshold
-creditThresholdForLevel TableConfig { confMergeBatchSize = MergeBatchSize n }
+creditThresholdForLevel TableConfig {
+                          confMergeBatchSize = MergeBatchSize mergeBatchSz,
+                          confWriteBufferAlloc = AllocNumEntries writeBufferSz
+                        }
                         (LevelNo _i) =
-    MR.CreditThreshold (MR.UnspentCredits (MR.MergeCredits (max 1 n)))
+    MR.CreditThreshold
+  . MR.UnspentCredits
+  . MR.MergeCredits
+  . max 1
+  . min writeBufferSz
+  $ mergeBatchSz
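
The clamping at the end of creditThresholdForLevel can be checked on plain integers. The sketch below copies only the `max 1 . min writeBufferSz` part of the composition; clampMergeBatchSize is a hypothetical name, and the MR.* newtype wrappers are left out so the example is self-contained:

```haskell
-- | Clamp a configured merge batch size to the range [1, writeBufferSz].
-- `max 1` is applied last, so the result is at least 1 even when the
-- write buffer size itself is below 1.
clampMergeBatchSize :: Int -> Int -> Int
clampMergeBatchSize writeBufferSz = max 1 . min writeBufferSz
```

With a write buffer of 20000 entries, a configured size of 50000 is capped to 20000, a size of 0 is raised to 1, and values in between pass through unchanged.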

test/Test/Database/LSMTree/Internal/Snapshot/Codec/Golden.hs

Lines changed: 1 addition & 0 deletions
@@ -153,6 +153,7 @@ forallSnapshotTypes f = [
     , f (Proxy @FencePointerIndexType)
     , f (Proxy @DiskCachePolicy)
     , f (Proxy @MergeSchedule)
+    , f (Proxy @MergeBatchSize)
       -- SnapLevels
     , f (Proxy @(SnapLevels SnapshotRun))
     , f (Proxy @(SnapLevel SnapshotRun))
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
�
