Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency limiter from Netflix to avoid throttling storage #2169

Closed

Conversation

malonso1976
Copy link

No description provided.

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for giving this a go.

first question: have you tried this successfully?

second comment: I think the real constraint is storage so this should be a property at global level not nested for each collector. otherwise your constraints won't apply if both http and Kafka spans are accepted.

we can work through other design things after consideration of above 2 things.

@malonso1976
Copy link
Author

Hi Adrian,
I just made a initial unit test (removed from code now) to see if ConcurrencyLimiter worked as expected. I want to test it in one of our environments under load to see if it works for us.
For your second comment, I did not take into consideration http collector... my fault. In our setup we do not received spans directly through http endpoint.

Do you have any design concern how shall concurrency limiter applied on http collector? The http collector is a special case, because http client knows synchronously whether collection succeeded or not, and current ConcurrentLimiter works as a pool of threads, processing requests asynchronously.

@malonso1976
Copy link
Author

Hi again,
just tried to move up the configuration and shared the concurrency limiter with the http collector. Please see if it is closer to what you suggested and I will try to test it in a real environment as soon as possible.

Regards,

@codefromthecrypt
Copy link
Member

yeap on this being where I was hoping it would be (will apply to all collectors, not just http but good start)

once you have your results in, we can chat further. thanks so far.

@shakuzen
Copy link
Member

@malonso1976 were you able to try this out in a real environment?

@malonso1976
Copy link
Author

malonso1976 commented Oct 24, 2018 via email

@shakuzen
Copy link
Member

@malonso1976 No problem at all. Let us know when you have a chance to test it out.

@@ -64,6 +64,7 @@
<!-- Be careful to set this as a provided dep, so that it doesn't interfere with dependencies
from other projects. For example, cassandra and spring boot set guava versions -->
<guava.version>19.0</guava.version>
<netflix.concurrency.limits.version>0.0.49</netflix.concurrency.limits.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriancole, concurrency-limits v0.2.0 is the latest tag but uses Java 8 features (such as Optional). Do you know of any current plans to get io.zipkin.zipkin2:zipkin off of 1.6 and up to at least 1.8? I saw #777 but it doesn't appear to have actually upped the main.signature.artifact.

It also looks like retrolambda does not support recompiling dependencies or I'd explore that option further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, made a new artifact outside of zipkin-core so it's not subject to this limitation. Affords me some other luxuries around not-shading dependencies as well :)

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Apr 9, 2019 via email

Logic-32 added a commit to Logic-32/zipkin that referenced this pull request Apr 17, 2019
Adding storage-throttle module/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time.
Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is enabled.
Making sure RejectedExecutionExceptions are "first class" citizens since they are used to reduce the throttle.
Removing HttpCall's Semaphore in favor of the throttle (same purpose, different implementations).
Inspired by work done on openzipkin#2169.
@codefromthecrypt
Copy link
Member

#2502 will address this use case. Thanks

Logic-32 added a commit to Logic-32/zipkin that referenced this pull request Apr 24, 2019
Adding storage-throttle module/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time.
Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is enabled.
Inspired by work done on openzipkin#2169.
Logic-32 added a commit to Logic-32/zipkin that referenced this pull request May 3, 2019
Adding ThrottledStorageComponent/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time.
Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is enabled.
Inspired by work done on openzipkin#2169.
codefromthecrypt pushed a commit to Logic-32/zipkin that referenced this pull request May 8, 2019
Adding ThrottledStorageComponent/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time.
Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is enabled.
Inspired by work done on openzipkin#2169.
@codefromthecrypt
Copy link
Member

#2502 now includes test instructions. please give a try!

codefromthecrypt pushed a commit to Logic-32/zipkin that referenced this pull request May 10, 2019
Adding ThrottledStorageComponent/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time.
Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is enabled.
Inspired by work done on openzipkin#2169.
codefromthecrypt pushed a commit that referenced this pull request May 10, 2019
Adding ThrottledStorageComponent/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time.

Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is 
enabled.

Inspired by work done on #2169.
abesto pushed a commit to abesto/zipkin that referenced this pull request Sep 10, 2019
…nzipkin#2502)

Adding ThrottledStorageComponent/etc. to contain logic for wrapping other storage implementations and limiting the number of requests that can go through to them at a given time.

Elasticsearch storage's maxRequests can be override by throttle properties if the throttle is 
enabled.

Inspired by work done on openzipkin#2169.
@codefromthecrypt
Copy link
Member

note: concurrency-limits-core has been abandoned by Netflix by over a year at this point. We may have to revisit this

@codefromthecrypt
Copy link
Member

I opened this noticing no release since last july even if there were a couple more months of changes, nothing was released Netflix/concurrency-limits#157

@anuraaga
Copy link
Contributor

I've heard the word resilience4j lately - never used it but just a reference

@Logic-32
Copy link
Contributor

Logic-32 commented Aug 16, 2020

Reslience4j and Failsafe are interesting options but, when evaluating them, make sure they solve the problem this is looking to address.

Namely, storage gets backed up so we need to slow down how much we're writing to it but do so in a way that doesn't cause the server to exhaust memory. In this particular case, a limited-size queue is used to buffer some data while we have a small bottleneck we're working with. Some simple retries with an appropriate backoff policy could address the same basic problem of slow storage but may be harder to tame with respect to pending actions. Meaning, I can't say how hard it is to prevent an "infinite queue" with the retry handlers.

FWIW, the ExecutorService that backs this implementation is pretty universal and supported by Java. The only part that is not is the one that adjusts how many threads are allowed to be active in it. To a certain extent, that logic could be directly incorporated into Zipkin instead of pulled in via Netflix.

@anuraaga
Copy link
Contributor

anuraaga commented Aug 17, 2020

@Logic-32 Thanks for summarizing the problem, it'll help if we look into any migration! I was indeed incorrectly thinking the RateLimiter in resilience4j would be relevant to us, but based on your explanation, my reading is that their bulkhead may provide what we need.

https://github.com/resilience4j/resilience4j#632-threadpoolbulkhead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants