Use arc swap for credentials cache #542
base: main
Conversation
Instead of a new dependency a …
alamb
left a comment
Thank you @ankrgyl
  #[derive(Debug)]
  pub(crate) struct TokenCache<T> {
-     cache: Mutex<Option<(TemporaryToken<T>, Instant)>>,
+     cache: ArcSwapOption<CacheEntry<T>>,
So in general, unless we have benchmark results that show arc-swap is necessary, I am opposed to adding a new dependency
Did you try a RwLock before reaching for a new crate? I always worry about adding new crates like arc-swap, as I don't want to have to deal with a RUSTSEC report if/when it becomes abandoned.
I do see there are many other users
RwLocks allow multiple concurrent readers, but with a lot of writers you might still have contention. If you find update contention is too much, you could change to RwLock<Arc<..>> so that the lock only needs to be held long enough to clone an Arc.
I understand the arc-swap docs (https://docs.rs/arc-swap/latest/arc_swap/) claim:
Better option would be to have RwLock<Arc>. Then one would lock, clone the Arc and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrary long time by a steady inflow of readers.
I would imagine the overhead of actually using the token (making an HTTP request) is pretty huge compared to getting a lock.
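For illustration, here is a minimal sketch of the RwLock<Arc<..>> pattern described above. The names (CachedToken, TokenCache) are hypothetical, not the crate's actual types; the point is that readers hold the shared lock only long enough to clone the Arc.

```rust
use std::sync::{Arc, RwLock};
use std::time::{Duration, Instant};

// Hypothetical types for illustration; not the crate's actual definitions.
struct CachedToken {
    token: String,
    expiry: Instant,
}

struct TokenCache {
    // Readers hold the shared lock only long enough to clone the Arc.
    inner: RwLock<Option<Arc<CachedToken>>>,
}

impl TokenCache {
    fn get(&self) -> Option<Arc<CachedToken>> {
        // Cheap read path: clone the Arc under a shared lock, then release it.
        (*self.inner.read().unwrap()).clone()
    }

    fn set(&self, token: String, expiry: Instant) {
        // Writers take the exclusive lock only to swap in a new Arc.
        *self.inner.write().unwrap() = Some(Arc::new(CachedToken { token, expiry }));
    }
}

fn main() {
    let cache = TokenCache { inner: RwLock::new(None) };
    cache.set("abc123".to_string(), Instant::now() + Duration::from_secs(300));
    if let Some(tok) = cache.get() {
        println!("cached token: {} (expires {:?})", tok.token, tok.expiry);
    }
}
```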
Sorry I just saw this after writing my other comment.
I would imagine the overhead of actually using the token (making an HTTP request) is pretty huge compared to getting a lock.
The problem with the previous design, which may not apply to an RwLock (and sure, I will benchmark it and report back), is that "waiting in line" for the mutex became so expensive with a high number of concurrent requests (e.g. HEAD requests with 8ms p50 latencies) that it overwhelmed tokio's worker threads and dominated execution time (we saw p50 HEAD operation latency spike to 700ms and realized the mutex was the root cause).
Let me run a benchmark with arc swap vs. RwLock and report back
Let me run a benchmark with arc swap vs. RwLock and report back
👍
It's true, but I figured the read:write ratio is SO high (basically 100% reads within the TTL window) that I'd like to avoid the (relatively) higher cost of an RwLock for the majority case. Would it be useful to illustrate this with a benchmark, or are you categorically against adding new dependencies?
I think a super read-heavy workload will work well with …
In my mind, the burden of evidence is much higher for adding a new dependency. I know it sounds somewhat curmudgeonly, but each new dependency adds some small (but real) maintenance overhead (and downstream work). Unless there is compelling, demonstrated value, we try to avoid adding one. Also, there are very few contributions removing dependencies for some reason 😆 so once we add one we are typically stuck with it.
That's totally fine with me! I am only writing in to contribute in a helpful way. If I staunchly feel that arc_swap is the right solution, I can easily run a fork, so I'm aligned with doing whatever you feel is the right thing for the repo. Here are the benchmark numbers, run on my MacBook Pro. What I saw in production was dramatically more pronounced (mutex vs. arc swap, not arc swap vs. rwlock), but I don't have much more bandwidth to investigate that in depth. I'm happy to update the code however you'd like based on these findings.
What did you benchmark? Is this a micro benchmark for locking, or is it actually a workload running object_store requests? Given the total request time reported is in ms, I am guessing the actual workload you have …
Benchmark is here: https://github.com/apache/arrow-rs-object-store/pull/542/files#diff-4e18aa7d15e47cfe7440ad519403de83caed450f907bc533b9aa414ca7f9c7de. I simulated object store requests via a 1-20ms sleep, which obviously is not perfect... You can run it with …
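For readers following along, a rough illustration (not the PR's actual benchmark, which is linked above) of what simulating requests with a 1-20 ms sleep can look like, here using tokio's RwLock and the rand crate; task count and the cached value are arbitrary:

```rust
use std::sync::Arc;
use std::time::{Duration, Instant};

use rand::Rng;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    // Shared "credential" guarded by an async RwLock; readers clone the Arc.
    let cache = Arc::new(RwLock::new(Arc::new(String::from("token"))));
    let start = Instant::now();

    let mut handles = Vec::new();
    for _ in 0..10_000 {
        let cache = Arc::clone(&cache);
        handles.push(tokio::spawn(async move {
            // Read path under test: clone the Arc, then drop the guard.
            let _token = Arc::clone(&*cache.read().await);
            // Simulated object store request: sleep 1-20 ms.
            let ms = rand::thread_rng().gen_range(1..=20u64);
            tokio::time::sleep(Duration::from_millis(ms)).await;
        }));
    }
    for h in handles {
        h.await.unwrap();
    }
    println!("10_000 simulated requests in {:?}", start.elapsed());
}
```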
These are the results on my machine: [benchmark output not captured here]. So here …
Also looking at the code, you roughly have something like this: [code snippet not captured here]. I don't see how an …
Correct. I explicitly did not test any credential refreshes and just wanted to illustrate the difference with read contention. It's not very hard to fill that in. And yes, as I stated above, this benchmark does not illustrate the difference as extremely as what I saw in production. Perhaps that's because in a real workload we're doing a lot more on the runtime than just GET operations, and that accentuates the impact of additional polls.
It might make sense to look into using a separate threadpool for CPU and IO work. For example, you can move all your object store work to a different threadpool (tokio runtime) using … Something we spent quite a long time on at InfluxData was that io/network latencies increased substantially with highly concurrent workloads. We eventually tracked this down to using the same threadpool (tokio pool) for both CPU and IO work -- doing so basically starves the IO of the CPU it needs to make progress in the TCP state machine, and it seems the TCP stack then treats the system as being congested and slows down traffic.
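As a sketch of the general pattern being suggested (nothing object_store-specific; the names, thread counts, and placeholder work are illustrative assumptions): IO-heavy futures can be handed to a dedicated tokio runtime so CPU-bound work on the main runtime cannot starve them.

```rust
use std::sync::Arc;
use tokio::runtime::Builder;

fn main() {
    // Dedicated runtime reserved for network / object store IO.
    let io_runtime = Arc::new(
        Builder::new_multi_thread()
            .worker_threads(2)
            .thread_name("io")
            .enable_all()
            .build()
            .unwrap(),
    );

    // Main runtime where CPU-heavy work (e.g. query processing) runs.
    let cpu_runtime = Builder::new_multi_thread().enable_all().build().unwrap();

    cpu_runtime.block_on(async {
        let io = Arc::clone(&io_runtime);
        // Hand the IO future to the dedicated runtime and await its result
        // from the CPU runtime; the IO workers stay free to drive the network.
        let result = io
            .spawn(async {
                // ... perform the object store request here (placeholder) ...
                "response"
            })
            .await
            .unwrap();
        println!("{result}");
    });
}
```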
I appreciate it! We do a lot of this and are constantly optimizing it. I deployed the change on our end using … I don't mind at all if you are uninterested in this contribution. I was mostly submitting it to pay it back and say thank you for this library. I'm very happy to just run our fork. Feel free to let me know what you'd like from this point onwards.
FWIW: I don't question that …
I don't think there's CPU contention here, and IMHO using a threadpool would be totally overkill. The code in …
Thank you -- we very much appreciate it. I think we are trying to get to the bottom of what is going on and figure out the best solution. In my opinion, switching from … I feel like we are still trying to figure out how much benefit, if any, the new arc-swap library really brings, so we can make a final judgement call about whether the new dependency is worth it.
I personally have no issues with using an RwLock instead of arc swap (I agree, the evidence is not glaringly obvious from these micro benchmarks).
Do you see a difference between …
I have not tested it yet. I can give it a go and report back |
Which issue does this PR close?
Closes #541.
What changes are included in this PR?
Use an arc swap to hold the cached credential, and acquire a mutex only if the credential's TTL has expired.
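For illustration, a hedged sketch of the pattern described above (simplified, non-generic types; not the PR's exact code, whose CacheEntry is generic): reads go through ArcSwapOption lock-free, and the mutex is only taken when the cached credential is missing or past its TTL.

```rust
use std::sync::Arc;
use std::time::{Duration, Instant};

use arc_swap::ArcSwapOption;
use tokio::sync::Mutex;

// Hypothetical, simplified cache entry for illustration only.
struct CacheEntry {
    credential: String,
    expires_at: Instant,
}

struct TokenCache {
    cached: ArcSwapOption<CacheEntry>,
    refresh_lock: Mutex<()>,
}

// Placeholder for whatever actually fetches a fresh credential (hypothetical).
async fn fetch_credential() -> (String, Duration) {
    ("fresh-token".to_string(), Duration::from_secs(300))
}

impl TokenCache {
    fn new() -> Self {
        Self {
            cached: ArcSwapOption::new(None),
            refresh_lock: Mutex::new(()),
        }
    }

    async fn get(&self) -> Arc<CacheEntry> {
        // Fast path: lock-free load, valid while within the TTL.
        if let Some(entry) = self.cached.load_full() {
            if entry.expires_at > Instant::now() {
                return entry;
            }
        }
        // Slow path: only one task refreshes; the rest re-check after the lock.
        let _guard = self.refresh_lock.lock().await;
        if let Some(entry) = self.cached.load_full() {
            if entry.expires_at > Instant::now() {
                return entry;
            }
        }
        let (credential, ttl) = fetch_credential().await;
        let entry = Arc::new(CacheEntry {
            credential,
            expires_at: Instant::now() + ttl,
        });
        self.cached.store(Some(entry.clone()));
        entry
    }
}

#[tokio::main]
async fn main() {
    let cache = TokenCache::new();
    let entry = cache.get().await;
    println!("credential: {}", entry.credential);
}
```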
Are there any user-facing changes?
No