memtable/skiplist: add a purpose-built skiplist #131


Draft: wants to merge 5 commits into main

Conversation

ajwerner

Fixes #95.


marvin-j97 commented May 18, 2025

With the Borrow<Q> trait bound, it's not possible to use an InternalKeyRef, which would let us avoid building an InternalKey (and the heap allocation that entails).
We need to use equivalent::Comparable instead.

See: #98
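To make the trade-off concrete, here is a minimal, self-contained sketch of the pattern: a local trait mirroring the `equivalent` crate's `Comparable<K>` lets a borrowed probe key be compared against an owned key without allocating. The `InternalKey`/`InternalKeyRef` types and the exact ordering are illustrative assumptions, not the PR's actual definitions.

```rust
use std::cmp::Ordering;

// Local stand-in for `equivalent::Comparable<K>` (same shape as the crate's trait).
trait Comparable<K: ?Sized> {
    fn compare(&self, key: &K) -> Ordering;
}

// Owned key: building one requires heap-allocating the user key.
struct InternalKey {
    user_key: Vec<u8>,
    seqno: u64,
}

// Borrowed view: a lookup can be performed with no allocation at all.
struct InternalKeyRef<'a> {
    user_key: &'a [u8],
    seqno: u64,
}

impl<'a> Comparable<InternalKey> for InternalKeyRef<'a> {
    fn compare(&self, key: &InternalKey) -> Ordering {
        // Order by user key ascending, then seqno descending, so the newest
        // version of a key sorts first (a typical LSM internal-key ordering).
        self.user_key
            .cmp(key.user_key.as_slice())
            .then(key.seqno.cmp(&self.seqno))
    }
}

fn main() {
    let owned = InternalKey { user_key: b"abc".to_vec(), seqno: 5 };
    let probe = InternalKeyRef { user_key: b"abc", seqno: 9 };
    // Same user key, higher seqno on the probe => probe sorts first.
    assert_eq!(probe.compare(&owned), Ordering::Less);
    println!("ok");
}
```

A `Borrow<Q>`-based API cannot express this, because no `&InternalKey` exists to borrow from; `Comparable<K>` only requires that the probe type knows how to order itself against `K`.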

// for the crate to work correctly. Anything larger than that will work.
//
// TODO: Justify this size.
const DEFAULT_BUFFER_SIZE: usize = (32 << 10) - size_of::<AtomicUsize>();
Contributor

Need to play with this a bit - but should probably be much higher by default: 1 MB or so?

Author

Yeah, it should be bigger than 32k, but 1MiB might be too big. The keys and values are not inline; the arena only holds the metadata. The questions I’d have are: how expensive is allocating a new block, and how expensive is inserting into the skip map? My guess is that the alloc is no worse than 10us (probably way less) and the inserts are ~100ns. If you can fit 1000 nodes in a block (say the average node has 32 links and the key and value handles are 32 bytes each), then you’ll have spent at least 10x as long doing the inserts as the allocation. In practice I think the mallocs, even with zeroing, are a lot cheaper. The benchmarks I was playing with don’t show much win above 256KiB.
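The cost argument can be put as a back-of-envelope calculation. Every constant below is an assumption taken from the discussion (pessimistic 10us allocation, ~100ns insert, ~32 links per node, 32-byte key/value handles), not a measurement.

```rust
// Back-of-envelope sketch of the amortized-allocation argument.
fn main() {
    let block_size: usize = 256 * 1024; // candidate arena block size (256 KiB)
    // Per-node metadata: ~32 tower links of 8 bytes each, plus 32-byte key
    // and value handles (the keys/values themselves live out of line).
    let node_size: usize = 32 * 8 + 32 + 32;
    let nodes_per_block = block_size / node_size;

    let alloc_cost_ns = 10_000.0; // pessimistic 10us to allocate + zero a block
    let insert_cost_ns = 100.0; // ~100ns per skiplist insert
    let amortized_alloc_ns = alloc_cost_ns / nodes_per_block as f64;

    println!("{nodes_per_block} nodes/block; amortized alloc {amortized_alloc_ns:.1}ns vs {insert_cost_ns}ns insert");
    // Even with the pessimistic allocation cost, the per-node share of the
    // block allocation is a small fraction of a single insert's cost.
    assert!(amortized_alloc_ns < insert_cost_ns);
}
```

Under these assumptions a 256KiB block holds ~800 nodes, so the block allocation amortizes to roughly 12ns per insert, which is consistent with the benchmarks showing little win from going larger.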

unsafe impl<const N: usize> Send for Arenas<N> {}
unsafe impl<const N: usize> Sync for Arenas<N> {}

pub(crate) struct Arenas<const BUFFER_SIZE: usize = DEFAULT_BUFFER_SIZE> {
Contributor

Eventually, for write transactions, the size should be much smaller (so that small transactions don't over-allocate), so this needs to be a runtime parameter rather than a const generic.

Author

Okay, I can do that.
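A minimal sketch of what the suggested change might look like: the block size becomes a runtime field instead of a const generic, so memtables and small write transactions can pick different sizes from the same type. The names and fields here are illustrative, not the PR's actual definitions.

```rust
use std::sync::atomic::AtomicUsize;

// Runtime-configurable arena block size (instead of `Arenas<const N: usize>`).
pub struct Arenas {
    buffer_size: usize,  // chosen per use case at construction time
    bump: AtomicUsize,   // next free offset in the current block (placeholder)
}

impl Arenas {
    pub fn with_buffer_size(buffer_size: usize) -> Self {
        Self {
            buffer_size,
            bump: AtomicUsize::new(0),
        }
    }

    pub fn buffer_size(&self) -> usize {
        self.buffer_size
    }
}

fn main() {
    // A memtable can use a large block; a write transaction a small one.
    let memtable_arena = Arenas::with_buffer_size(256 * 1024);
    let txn_arena = Arenas::with_buffer_size(4 * 1024);
    assert_eq!(memtable_arena.buffer_size(), 262_144);
    assert_eq!(txn_arena.buffer_size(), 4096);
}
```

The cost of the change is one extra `usize` per arena and the loss of compile-time-known block sizes; the benefit is that small transactions stop paying for memtable-sized blocks.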

Comment on lines +137 to +138
// TODO(ajwerner): Decide what we want to do here. The panic is sort of
// extreme, but also seems right given the invariants.
Contributor

Write transactions write with a fixed sequence number initially, so we actually need to overwrite values to keep the same behaviour as crossbeam's skiplist

https://github.com/fjall-rs/fjall/blob/5001d4db6430808df4c8ba6db12c2dbaaf7a91ec/src/tx/write/mod.rs#L379-L381

Author

I see. This is doable if we’re okay leaking memory for each write. If we’re not, then we’ll need some sort of free-list structure. I’ll take a stab. I’ve been thinking I want to change the memory layout so that drop doesn’t have to bounce around through iteration.

@ajwerner left a comment

I’ll play with updating. It’s pretty hard not to leak the node from the previous update without hooking up a free list, but it’s also not so hard to add one.
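For reference, a minimal single-threaded sketch of the free-list idea: when an overwrite at the same key supersedes an existing node, the old node's slot is pushed onto a free list and reused by a later allocation instead of being leaked. This index-based version is purely illustrative; the real arena would need atomic links and per-size-class lists.

```rust
// Intrusive free list over arena slots, identified here by index.
struct FreeList {
    head: Option<usize>,      // first free slot, if any
    next: Vec<Option<usize>>, // next[i] = slot after i in the free list
}

impl FreeList {
    fn new(capacity: usize) -> Self {
        Self {
            head: None,
            next: vec![None; capacity],
        }
    }

    // Called when a node is superseded by an overwrite (instead of leaking it).
    fn push(&mut self, slot: usize) {
        self.next[slot] = self.head;
        self.head = Some(slot);
    }

    // Called before bump-allocating a fresh slot from the arena.
    fn pop(&mut self) -> Option<usize> {
        let slot = self.head?;
        self.head = self.next[slot].take();
        Some(slot)
    }
}

fn main() {
    let mut fl = FreeList::new(8);
    assert_eq!(fl.pop(), None); // nothing freed yet: fall back to bump alloc
    fl.push(3);
    fl.push(5);
    assert_eq!(fl.pop(), Some(5)); // LIFO reuse of freed slots
    assert_eq!(fl.pop(), Some(3));
    assert_eq!(fl.pop(), None);
    println!("ok");
}
```

Reuse only works if replaced nodes are all the same size (or bucketed by tower height), which is one reason to revisit the node memory layout at the same time.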





Successfully merging this pull request may close these issues:

Use skiplist tailored for LSM-tree
2 participants