Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add consistent hashing to the client #19

Closed
wants to merge 14 commits into from

Conversation

CodingAnarchy
Copy link

@CodingAnarchy CodingAnarchy commented Jul 17, 2024

This is a significant refactor to add a consistent ring hashing algorithm to the Client, so instead of a single connection to a Memcached node, it instead creates connections to a vector of node DSNs and manages those in a Ring struct.

I bumped the version of the the crate to 0.3.0 as this behavior is significantly incompatible with the existing Client implementation in how it is instantiated, as it expects a vector of DSNs instead of a single one. The existing Client methods have all been replicated in the new structure, though, so this will be relatively transparent except on creating the struct.

The Node class is simply the old Client class with a rename and an additional field to track the DSN in the struct.

I changed the tests to work against a mocked out TCP server, with a transparent fake of the Memcached protocol. This made it easier to test in the unit test whether the expected calls were being made, but comes with the downside of not validating against actual Memcached nodes. An integration test to do this additional testing will likely be necessary to add as a follow up.

Testing in a staging environment (or more than one, if possible) is highly recommended here.

@CodingAnarchy CodingAnarchy requested a review from a team as a code owner July 17, 2024 12:37
@CodingAnarchy CodingAnarchy requested review from makrisoft and mkcny and removed request for a team July 17, 2024 12:37
@makrisoft makrisoft requested a review from adampetro July 17, 2024 13:16
Copy link
Contributor

@adampetro adampetro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered leaving the existing Client and allowing the Ring to be used as an alternative that composes multiple Clients?

Cargo.toml Outdated Show resolved Hide resolved
src/lib.rs Outdated
Comment on lines 21 to 23
struct GetManyResult(Result<Vec<Value>, Error>);
struct KeyDumpResult<'a>(Result<Vec<MetadumpIter<'a>>, Error>);
struct StatsResult(Result<Vec<HashMap<String, String>>, Error>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning behind using the newtype pattern instead of aliases? Most of the cases I can think of crates offering result types, they are typically aliases of std::result::Result, for example anyhow::Result and std::io::Result

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of newtype first. I'm not sure if a type alias will work for the implementation of Extend, etc. but I can try to see if it does here. I agree it would probably be cleaner than the newtype pattern if I can get it working. I'll take a look tomorrow when I have some time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might be able to do the equivalent of what you're doing already with collect through this FromIterator implementation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adampetro I looked into this briefly, and the issue with a type alias works well for something like anyhow::Result, where the Error type is used to make the result less generic, it doesn't work here where we need to implement Extend for the stream::iter to be collected. When I swapped this newtype out for a type alias, the compiler (correctly) errors on applying a trait from outside my crate to a type from outside my crate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do you need to implement Extend? Does this FromIterator implementation do what you need?

Copy link
Author

@CodingAnarchy CodingAnarchy Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extend<Self::Item> is a trait bound required by futures::stream::StreamExt#collect, but the standard Result iterator doesn't provide it (or Default, the other requirement) because of the generics on the type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went this way with futures::stream in order to do multiple network calls in an async concurrent fashion. I understand the newtype pattern here does feel pretty clunky and strange, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now, StreamExt::collect requires Extend.

Can we use a single generic type instead of the three? Something like:

struct CollectibleError<T>(Result<Vec<T>, Error>);

impl<T> Default for CollectibleError<T> {
    fn default() -> Self {
        Self(Ok(Vec::new()))
    }
}

impl<T> Extend<Result<T, Error>> for CollectibleError<T> {
    fn extend<I: IntoIterator<Item = Result<T, Error>>>(&mut self, iter: I) {
        if self.0.is_err() {
            // If an error has already occurred, don't bother continuing
            return;
        }

        for item in iter {
            match item {
                Ok(value) => self.0.as_mut().unwrap().push(value),
                Err(e) => {
                    self.0 = Err(e);
                    return;
                }
            }
        }
    }
}

impl<T> Extend<Result<Vec<T>, Error>> for CollectibleError<T> {
    fn extend<I: IntoIterator<Item = Result<Vec<T>, Error>>>(&mut self, iter: I) {
        if self.0.is_err() {
            // If an error has already occurred, don't bother continuing
            return;
        }

        for mut item in iter {
            match item {
                Ok(ref mut value) => self.0.as_mut().unwrap().append(value),
                Err(e) => {
                    self.0 = Err(e);
                    return;
                }
            }
        }
    }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it a generic type in e1a1403, and put it in its own src/collectible_result.rs file.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated
Comment on lines 253 to 268
#[ctor::ctor]
fn init() {
SERVER_ADDRESSES.iter().for_each(|dsn| {
let listener = TcpListener::bind(dsn).expect("Failed to bind listener");

// accept connections and process them serially in a background thread
std::thread::spawn(move || {
for stream in listener.incoming() {
let stream = stream.expect("Failed to accept connection");
std::thread::spawn(move || {
handle_connection(stream);
});
}
});
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels very integration-test-like. Should we move to tests/integration_test.rs or something like that?

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/ring.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@nickamorim nickamorim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use maglev for consistent hashing instead? We already have a bulk of this implemented in a closed source client implementation that I can point you to in DM

@CodingAnarchy
Copy link
Author

@nickamorim I agree maglev would be an improvement over the more traditional hashing algorithm here, but I think it would require too much of a change for me to implement currently. The main goal here was to unblock us on our current service to be able to scale up past one node, and I personally don't have the time to do that refactor currently. Does it make sense as a follow-on that you or someone else could do that after this PR is merged/released?

@CodingAnarchy
Copy link
Author

CodingAnarchy commented Jul 18, 2024

Have we considered leaving the existing Client and allowing the Ring to be used as an alternative that composes multiple Clients?

I went with this rename approach because the Client in other programming languages (e.g. dalli in Ruby, memcached in NodeJS) uses this same abstraction and so that makes this more predictable for developers coming from other languages. I think we can still expose the Node struct for those who want a direct 1:1 connection to a server themselves.

src/lib.rs Outdated Show resolved Hide resolved
@nickamorim
Copy link
Contributor

Does it make sense as a follow-on that you or someone else could do that after this PR is merged/released?

Certainly

Copy link
Contributor

@dwdking dwdking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@dwdking dwdking self-requested a review July 19, 2024 15:53
@mkcny mkcny removed their request for review August 6, 2024 13:45
@CodingAnarchy CodingAnarchy deleted the add-consistent-hashing branch October 21, 2024 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants