memtable/skiplist: add a purpose-built skiplist #131


Draft: wants to merge 5 commits into main

Conversation

ajwerner

Fixes #95.


marvin-j97 commented May 18, 2025

With the Borrow<Q> trait bound, it's not possible to use an InternalKeyRef, which would let us avoid building an InternalKey (and the heap allocation that entails).
We need to use equivalent::Comparable instead.

See: #98
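To make the trade-off concrete, here is a minimal, self-contained sketch of the pattern: a local trait mirroring the `equivalent` crate's `Comparable<K>` lets a borrowed probe key be compared against an owned key without allocating. The `InternalKey`/`InternalKeyRef` types and the exact ordering are illustrative assumptions, not the PR's actual definitions.

```rust
use std::cmp::Ordering;

// Local stand-in for `equivalent::Comparable<K>` (same shape as the crate's trait).
trait Comparable<K: ?Sized> {
    fn compare(&self, key: &K) -> Ordering;
}

// Owned key: building one requires heap-allocating the user key.
struct InternalKey {
    user_key: Vec<u8>,
    seqno: u64,
}

// Borrowed view: a lookup can be performed with no allocation at all.
struct InternalKeyRef<'a> {
    user_key: &'a [u8],
    seqno: u64,
}

impl<'a> Comparable<InternalKey> for InternalKeyRef<'a> {
    fn compare(&self, key: &InternalKey) -> Ordering {
        // Order by user key ascending, then seqno descending, so the newest
        // version of a key sorts first (a typical LSM internal-key ordering).
        self.user_key
            .cmp(key.user_key.as_slice())
            .then(key.seqno.cmp(&self.seqno))
    }
}

fn main() {
    let owned = InternalKey { user_key: b"abc".to_vec(), seqno: 5 };
    let probe = InternalKeyRef { user_key: b"abc", seqno: 9 };
    // Same user key, higher seqno on the probe => probe sorts first.
    assert_eq!(probe.compare(&owned), Ordering::Less);
    println!("ok");
}
```

A `Borrow<Q>`-based API cannot express this, because no `&InternalKey` exists to borrow from; `Comparable<K>` only requires that the probe type knows how to order itself against `K`.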

// for the crate to work correctly. Anything larger than that will work.
//
// TODO: Justify this size.
const DEFAULT_BUFFER_SIZE: usize = (32 << 10) - size_of::<AtomicUsize>();
Contributor

Need to play with this a bit - but should probably be much higher by default: 1 MB or so?

Author

Yeah, it should be bigger than 32k, but 1MiB might be too big. The keys and values are not inline; the arena only holds the metadata. The questions I’d have are: how expensive is allocating a new block, and how expensive is inserting into the skip map? My guess is that the alloc is no worse than 10us (probably way less) and the inserts are ~100ns. If you can fit 1000 nodes in a block (say the average node has 32 links and the key and value handles are 32 bytes each), then you’ll have spent at least 10x as long doing the inserts as the allocation. In practice I think the mallocs, even with zeroing, are a lot cheaper. The benchmarks I was playing with don’t show much win above 256KiB.
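The cost argument can be put as a back-of-envelope calculation. Every constant below is an assumption taken from the discussion (pessimistic 10us allocation, ~100ns insert, ~32 links per node, 32-byte key/value handles), not a measurement.

```rust
// Back-of-envelope sketch of the amortized-allocation argument.
fn main() {
    let block_size: usize = 256 * 1024; // candidate arena block size (256 KiB)
    // Per-node metadata: ~32 tower links of 8 bytes each, plus 32-byte key
    // and value handles (the keys/values themselves live out of line).
    let node_size: usize = 32 * 8 + 32 + 32;
    let nodes_per_block = block_size / node_size;

    let alloc_cost_ns = 10_000.0; // pessimistic 10us to allocate + zero a block
    let insert_cost_ns = 100.0; // ~100ns per skiplist insert
    let amortized_alloc_ns = alloc_cost_ns / nodes_per_block as f64;

    println!("{nodes_per_block} nodes/block; amortized alloc {amortized_alloc_ns:.1}ns vs {insert_cost_ns}ns insert");
    // Even with the pessimistic allocation cost, the per-node share of the
    // block allocation is a small fraction of a single insert's cost.
    assert!(amortized_alloc_ns < insert_cost_ns);
}
```

Under these assumptions a 256KiB block holds ~800 nodes, so the block allocation amortizes to roughly 12ns per insert, which is consistent with the benchmarks showing little win from going larger.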

unsafe impl<const N: usize> Send for Arenas<N> {}
unsafe impl<const N: usize> Sync for Arenas<N> {}

pub(crate) struct Arenas<const BUFFER_SIZE: usize = DEFAULT_BUFFER_SIZE> {
Contributor

Eventually, for write transactions, the size should be much smaller (so that small transactions don't over-allocate), so this needs to be a runtime parameter rather than a const generic.

Author

Okay, I can do that.
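A minimal sketch of what the suggested change might look like: the block size becomes a runtime field instead of a const generic, so memtables and small write transactions can pick different sizes from the same type. The names and fields here are illustrative, not the PR's actual definitions.

```rust
use std::sync::atomic::AtomicUsize;

// Runtime-configurable arena block size (instead of `Arenas<const N: usize>`).
pub struct Arenas {
    buffer_size: usize,  // chosen per use case at construction time
    bump: AtomicUsize,   // next free offset in the current block (placeholder)
}

impl Arenas {
    pub fn with_buffer_size(buffer_size: usize) -> Self {
        Self {
            buffer_size,
            bump: AtomicUsize::new(0),
        }
    }

    pub fn buffer_size(&self) -> usize {
        self.buffer_size
    }
}

fn main() {
    // A memtable can use a large block; a write transaction a small one.
    let memtable_arena = Arenas::with_buffer_size(256 * 1024);
    let txn_arena = Arenas::with_buffer_size(4 * 1024);
    assert_eq!(memtable_arena.buffer_size(), 262_144);
    assert_eq!(txn_arena.buffer_size(), 4096);
}
```

The cost of the change is one extra `usize` per arena and the loss of compile-time-known block sizes; the benefit is that small transactions stop paying for memtable-sized blocks.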

Comment on lines +137 to +138
// TODO(ajwerner): Decide what we want to do here. The panic is sort of
// extreme, but also seems right given the invariants.
Contributor

Write transactions write with a fixed sequence number initially, so we actually need to overwrite values to keep the same behaviour as crossbeam's skiplist

https://github.com/fjall-rs/fjall/blob/5001d4db6430808df4c8ba6db12c2dbaaf7a91ec/src/tx/write/mod.rs#L379-L381

Author

I see. This is doable if we’re okay leaking memory for each write. If we’re not, then we’ll need some sort of free-list structure. I’ll take a stab. I’ve been thinking I want to change the memory layout so that drop doesn’t have to bounce around through iteration.

@ajwerner left a comment

I’ll play with updating. It’s pretty hard not to leak the node from the previous update without hooking up a free list, but it’s also not so hard to add one.
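For reference, a minimal single-threaded sketch of the free-list idea: when an overwrite at the same key supersedes an existing node, the old node's slot is pushed onto a free list and reused by a later allocation instead of being leaked. This index-based version is purely illustrative; the real arena would need atomic links and per-size-class lists.

```rust
// Intrusive free list over arena slots, identified here by index.
struct FreeList {
    head: Option<usize>,      // first free slot, if any
    next: Vec<Option<usize>>, // next[i] = slot after i in the free list
}

impl FreeList {
    fn new(capacity: usize) -> Self {
        Self {
            head: None,
            next: vec![None; capacity],
        }
    }

    // Called when a node is superseded by an overwrite (instead of leaking it).
    fn push(&mut self, slot: usize) {
        self.next[slot] = self.head;
        self.head = Some(slot);
    }

    // Called before bump-allocating a fresh slot from the arena.
    fn pop(&mut self) -> Option<usize> {
        let slot = self.head?;
        self.head = self.next[slot].take();
        Some(slot)
    }
}

fn main() {
    let mut fl = FreeList::new(8);
    assert_eq!(fl.pop(), None); // nothing freed yet: fall back to bump alloc
    fl.push(3);
    fl.push(5);
    assert_eq!(fl.pop(), Some(5)); // LIFO reuse of freed slots
    assert_eq!(fl.pop(), Some(3));
    assert_eq!(fl.pop(), None);
    println!("ok");
}
```

Reuse only works if replaced nodes are all the same size (or bucketed by tower height), which is one reason to revisit the node memory layout at the same time.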





Successfully merging this pull request may close these issues:

Use skiplist tailored for LSM-tree
2 participants