-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding storage-throttle module to address "over capacity" issues #2502
Conversation
zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/internal/HttpBulkIndexer.java
Outdated
Show resolved
Hide resolved
zipkin-storage/elasticsearch/src/main/java/zipkin2/elasticsearch/internal/client/HttpCall.java
Outdated
Show resolved
Hide resolved
zipkin-storage/throttle/src/main/java/zipkin2/storage/throttle/ActuateThrottleMetrics.java
Outdated
Show resolved
Hide resolved
Thanks for the hard work here! |
NOTE: in memory storage needs to be changed to not perform work at assembly of the call. |
zipkin-collector/core/src/main/java/zipkin2/collector/Collector.java
Outdated
Show resolved
Hide resolved
|
||
// Make sure we throttle | ||
Future<V> future = executor.submit(() -> { | ||
try (AutoCloseable nameReverter = updateThreadName(call.toString())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and added naming to the Threads here (and in enqueue()) based on discussion from the workshop. Let me know if this doesn't work for some reason.
Rebased off remote/master and updated my licenses. Things should be good now unless I missed something. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first time I reviewed this top to bottom. Thanks and I learned some things!
There is a fair amount of code with experiential notes in. There aren't enough tests for some of this, especially things like pool resizing we should have tests or notes on how people are going to resize with spring boot for example (in the readme). I made some comments about some code to remove or downsize a bit.
Functionality wise, this looks ok.. we need to up the coverage a bit and also do a multithreaded case .like pool of 10 consumers to prove throttling works under contension.. this could help smoke out any thread safety bugs.
Most importantly, I think the time polishing test/filling is worthwhile as the impl looks like something we'd want and can accept! Thanks!
zipkin-collector/core/src/test/java/zipkin2/collector/CollectorTest.java
Outdated
Show resolved
Hide resolved
zipkin-collector/core/src/test/java/zipkin2/collector/CollectorTest.java
Outdated
Show resolved
Hide resolved
zipkin-collector/core/src/main/java/zipkin2/collector/Collector.java
Outdated
Show resolved
Hide resolved
zipkin-server/src/main/java/zipkin2/server/internal/ConditionalOnThrottledStorage.java
Outdated
Show resolved
Hide resolved
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ThrottledSpanConsumer.java
Outdated
Show resolved
Hide resolved
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ThrottledStorageComponent.java
Outdated
Show resolved
Hide resolved
zipkin-server/src/test/java/zipkin2/server/internal/throttle/ThrottledCallTest.java
Outdated
Show resolved
Hide resolved
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ThrottledCall.java
Outdated
Show resolved
Hide resolved
I will try to address these items early next week! The testing will be the hardest part because concurrency. But I'll see what I can do :) |
the size part is more beneficial to the main jar than this, but when folks
get used to writing in one repo it is easy to carry whichever habits from
one module to another.
the beginning of the story was about android and also JVM agents using our
jar, both sensitive to size. we have an issue somewhere where we shrunk our
size considerably just by not doing the private thing. in time this will be
moot as the server code will likely go kotlin.
|
if this needs to be done with a try finally helper then you can instead of
using direct autocloseable, make a type like we have in brave (Scope) which
extends and removes the exception type.
however, usually the way people do this is a wrapping callable or runnable
that does normal try/finally depending on what is going on.
if that isnt appropriate.. there are so few call sites to thread naming..
and it is already explicit.. a utility method that is just normally called
as opposed to allocating a closeable is another option.
ex String oldName = setCurrentThreadName(foo)
try {
...
} finally {
setCurrentThreadName(old)
}
…On Fri, Apr 26, 2019, 11:17 PM Logic-32 ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ThrottledCall.java
<#2502 (comment)>
:
> + }
+
+ call = delegate.get();
+
+ // Not using try-with-resources to avoid catching exceptions that occur anywhere outside of close()
+ AutoCloseable nameReverter = updateThreadName(call.toString());
+ try {
+ ThrottledCallback<V> throttleCallback = new ThrottledCallback<>(callback, limitListener);
+ call.enqueue(throttleCallback);
+
+ // Need to wait here since the delegate call will run asynchronously also.
+ // This ensures we don't exceed our throttle/queue limits.
+ throttleCallback.await();
+ } finally {
+ try {
+ nameReverter.close();
Unfortunately, AutoCloseable throws a checked exception so I kind of have
to do this unless I want to make a new interface/class to remove the
checked exception. Would you prefer I got rid of the AutoCloseable or
suppressed?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#2502 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAPVV5Z5F3UWUCQA352ISLPSMMIJANCNFSM4HGYQFGQ>
.
|
A majority of the tests currently reside in ThrottledCall. Testing ThrottledStorageComponent is proving to be quite difficult due to how Netflix's concurrency-limit works. Reviewing some of their tests, it looks like a certain amount of "latency" is required for the limiter to function as expected. So I can't simply submit tasks that either pass/fail and expect the limit to go up/down. If the additional tests in ThrottledCall are not sufficient then I'd have to request we do some brainstorming of some kind. The only route I see to testing ThrottledStorageComponent more would involve some significant refactoring to inject mocks/etc. which I'm not sure I can dedicate the time to :( |
there is a release that will be cut today. if you are ok with it, I can take the polish stuff after the fact and get this in? otherwise it will wait until the next release which could be probably a few weeks from now |
Not a good idea to rush this, or refactoring in I think. Let's proceed with the release @zeagord and we can polish this up for the next one. This will also give folks a time to test it manually (with snapshot or jitpack) |
I agree on not rushing. This has drifted from what we're using in production ourselves so some additional test deployments would be appreciated :) Good news is that our queue size is at 8000 messages and we have seen zero heap issues! Drop rate is still consistently 2% or lower per day (previously was peaking at about 15%). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, some minor points
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ActuateThrottleMetrics.java
Show resolved
Hide resolved
} catch (RuntimeException e) { | ||
limitListener.onIgnore(); | ||
throw e; // E.g. RejectedExecutionException | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
roger. I guess the fact this is catching Exception is what threw me... did you want to catch (RuntimeException|Error)
here or is something actually declaring exception throws.
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ThrottledCall.java
Outdated
Show resolved
Hide resolved
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ThrottledSpanConsumer.java
Outdated
Show resolved
Hide resolved
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ThrottledStorageComponent.java
Show resolved
Hide resolved
zipkin-server/src/main/java/zipkin2/server/internal/throttle/ThrottledStorageComponent.java
Outdated
Show resolved
Hide resolved
zipkin-server/src/test/java/zipkin2/server/internal/throttle/FakeCall.java
Outdated
Show resolved
Hide resolved
PS sorry about the CI meltdown. As of now, if you rebase, it should pass tests and not die on timeouts anymore. |
10cc503
to
53837ab
Compare
rebased and force pushed to get a green build so that folks can test this (especially with elasticsearch 7). Will post instructions shortly |
zipkin-server/src/test/java/zipkin2/server/internal/throttle/ThrottledStorageComponentTest.java
Outdated
Show resolved
Hide resolved
updated the description with how to test this. |
Thanks! If I can't get to the additional feedback tomorrow I'll get to it early next week. Sorry for being sluggish but at least it gives people time to test :) |
hi. I dont think delaying more is good because it is distracting and
subject to drift which effects us more than the minor points mentioned.
since this is disabled by default and mentioned as experimental I think it
is better to not let best be the enemy of good.
I will merge this so folks can test it in 2.14. thanks for the help
championing this Craig!
…On Fri, May 10, 2019, 3:35 AM Logic-32 ***@***.***> wrote:
Thanks! If I can't get to the additional feedback tomorrow I'll get to it
early next week. Sorry for being sluggish but at least it gives people time
to test :)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#2502 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAPVV4OKRJOEKWXULY5BPTPUR4GNANCNFSM4HGYQFGQ>
.
|
Adding ThrottledStorageComponent/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time. Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is enabled. Inspired by work done on openzipkin#2169.
53837ab
to
cfdccf0
Compare
Before this, there was some extra code in the throttle package handling a bug in our in memory storage. This fixes that and removes the extra code. See #2502
Before this, there was some extra code in the throttle package handling a bug in our in memory storage. This fixes that and removes the extra code. See #2502
@Value("${zipkin.storage.throttle.maxConcurrency:200}") int throttleMaxConcurrency) { | ||
ZipkinElasticsearchStorageProperties( | ||
@Value("${zipkin.storage.throttle.enabled:false}") boolean throttleEnabled, | ||
@Value("${zipkin.storage.throttle.maxConcurrency:200}") int throttleMaxConcurrency) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does maxConcurrency
need updated to max-concurrency
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't be required, but it is nice to do this. good catch
(ps I'm knackered so will deal with the other PR and any comments in the morning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kk. Thank you for finishing this up and sorry again for the lag on my end! Work got in the way :(
What you did was great, and in fact I found some problems in how many of
our implementations implement call as a result of this, so double win!
cheers
…On Sat, May 11, 2019 at 12:02 AM Logic-32 ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
zipkin-server/src/main/java/zipkin2/server/internal/elasticsearch/ZipkinElasticsearchStorageProperties.java
<#2502 (comment)>
:
> @@ -64,8 +64,9 @@
*/
private int timeout = 10_000;
- public ***@***.***("${zipkin.storage.throttle.enabled:false}") boolean throttleEnabled,
- @value("${zipkin.storage.throttle.maxConcurrency:200}") int throttleMaxConcurrency) {
+ ZipkinElasticsearchStorageProperties(
+ @value("${zipkin.storage.throttle.enabled:false}") boolean throttleEnabled,
+ @value("${zipkin.storage.throttle.maxConcurrency:200}") int throttleMaxConcurrency) {
Kk. Thank you for finishing this up and sorry again for the lag on my end!
Work got in the way :(
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#2502 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAPVV5JF2LL2N455DWF2XDPUWL75ANCNFSM4HGYQFGQ>
.
|
Before this, there was some extra code in the throttle package handling a bug in our in memory storage. This fixes that and removes the extra code. See #2502
…nzipkin#2502) Adding ThrottledStorageComponent/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time. Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is enabled. Inspired by work done on openzipkin#2169.
Before this, there was some extra code in the throttle package handling a bug in our in memory storage. This fixes that and removes the extra code. See openzipkin#2502
This feature allows the server to automatically queue and back off when resource-related exceptions occur.
To test this, take the most recent build from jitpack like so and add the throttle variable.
$ curl -sSL https://jitpack.io/com/github/Logic-32/zipkin/zipkin-server/issue-2481-SNAPSHOT/zipkin-server-issue-2481-SNAPSHOT-exec.jar > zipkin.jar $ STORAGE_THROTTLE_ENABLED=true STORAGE_TYPE=elasticsearch java -jar zipkin.jar
Configuration
These settings can be used to help tune the rate at which Zipkin flushes data to another, underlying
StorageComponent
(such as Elasticsearch):Change Details:
Adding storage-throttle module/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time.
Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is enabled.
Making sure RejectedExecutionExceptions are "first class" citizens since they are used to reduce the throttle.
Removing HttpCall's Semaphore in favor of the throttle (same purpose, different implementations).
Inspired by work done on #2169.
Fixes #2481