Gem updates and scaling features #11

Open
danielevans wants to merge 9 commits into malomalo:master from danielevans:feature/scaling

Conversation

@danielevans

Hi! 👋

We are using this gem in several large applications to prevent the exhaustion of third-party rate limits. We have encountered a few issues due to the scale of our queues.

First, with a sufficiently high rate limit (our main example is at: 4000, per: 60), the amount of time spent garbage collecting the queue grows; for us it has reached 5s per attempt.

Second, given the size of the rate limits, the number of commands per check, and a large number of Resque boxes (the same example has 25 boxes with 8 workers each), the pressure on our Redis CPU became extreme.

Third, when a Resque worker exits uncleanly in any way that prevents an ensure block from executing, the uuid is left in the set and the rate limit is effectively permanently reduced by 1. This can happen during an out-of-memory kill or any other abrupt termination of the worker.

To fix this we are:

  1. Adding the capability to centralize garbage collection, eliminating the worker slowdown and Redis CPU issues.
  2. Adding an optional max_duration option which causes garbage collection of tasks that have gone on so long that they are considered dead.
  3. Logging and resolving any situation where the Redis hash entry is missing but the set still contains the uuid.
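A hypothetical sketch of GC rules 2 and 3 above, using plain Ruby structures in place of the gem's Redis set and hash; the function name, data layout, and max_duration keyword are illustrative assumptions, not the PR's actual implementation:

```ruby
require "set"

# Sketch of the two GC rules: reclaim tasks running longer than max_duration,
# and reclaim uuids left in the set with no matching hash entry (e.g. after an
# unclean worker exit). running_set stands in for the Redis set of in-flight
# uuids; started_at_by_uuid stands in for the Redis hash of start times.
def gc_stale_entries(running_set, started_at_by_uuid, max_duration:, now: Time.now)
  reclaimed = []
  running_set.each do |uuid|
    started_at = started_at_by_uuid[uuid]
    if started_at.nil?
      # Rule 3: orphaned uuid -- log it and reclaim the rate-limit slot.
      warn "reclaiming orphaned uuid #{uuid}"
      reclaimed << uuid
    elsif now - started_at > max_duration
      # Rule 2: the task has exceeded max_duration; consider it dead.
      reclaimed << uuid
    end
  end
  reclaimed.each do |uuid|
    running_set.delete(uuid)
    started_at_by_uuid.delete(uuid)
  end
  reclaimed
end
```

For example, with max_duration: 60, a task started 900 seconds ago is reclaimed as dead, and a uuid with no start-time entry is reclaimed as orphaned, while a task started 10 seconds ago is left running.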

In addition, this change unpins the Resque version and updates the tests to work with Resque 2.0, and removes and .gitignores the Gemfile.lock, which is conventional for Ruby gems.

@malomalo
Owner

I like where this is going, but I don't like having to bring up another process unless it's necessary.

Might a better option be to use a mutex with Redis using set(key, nx: true, ex: ?)?

If it gets the mutex it does the GC and then removes the mutex key; if not, it continues, assuming the queue has hit its rate limit. This would allow only 1 GC per job queue, and only when it needs to be run.

I'm not sure what to set the expiration of the mutex to. It would probably be some function of the at option; my first guess is at/500 based on your results, though it should be faster when only 1 client is GCing.

It wouldn't be proactive GC, which I can see as a benefit of your approach, but for that situation we could have another Resque job that triggers on a schedule.
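The lock-gated GC suggested above could look roughly like this. FakeRedis is a minimal in-memory stand-in for the Redis SET NX EX pattern so the sketch is self-contained; real code would call redis-rb's set(key, value, nx: true, ex: seconds). The lock key name and the ttl default are assumptions:

```ruby
# Minimal in-memory stand-in for Redis SET key value NX EX; only tracks expiry.
class FakeRedis
  def initialize(now: -> { Time.now })
    @store = {}
    @now = now
  end

  # Returns true only if the key was absent or expired, i.e. the lock was won.
  def set_nx_ex(key, value, ttl)
    expires_at = @store[key]
    return false if expires_at && expires_at > @now.call
    @store[key] = @now.call + ttl
    true
  end

  def del(key)
    @store.delete(key)
  end
end

# Lock-gated GC: only the client that wins the mutex sweeps the queue; every
# other client skips the sweep and proceeds as if the queue is at its limit.
# The ttl default is arbitrary here; the comment above suggests deriving it
# from the at option (e.g. at/500).
def gc_with_lock(redis, queue, ttl: 10)
  return :skipped unless redis.set_nx_ex("#{queue}:gc-lock", "1", ttl)
  begin
    :collected # placeholder for the actual garbage collection sweep
  ensure
    redis.del("#{queue}:gc-lock")
  end
end
```

Releasing the lock in an ensure block mirrors the failure mode from the PR description: if the sweep raises, the lock is still removed, and if the process dies before the ensure runs, the EX expiry reclaims the lock anyway.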

Thoughts?

@danielevans
Author

A distributed lock approach would probably have prevented this from ever causing problems, and I like it in general. However, knowing what's going on, I still prefer a sidecar process for our case.

It is more predictable, gives proactive GC, and makes the process easier to understand and monitor. It moves as much of the burden as possible out of the workers, allowing them to remain entirely dedicated to performing work.

The infrastructure for managing and monitoring processes already exists thanks to resque-scheduler, and the sidecar process is a much simpler approach.

And you are correct: we have already switched to a centralized process using a monkey-patched version, and we immediately saw a ~25% drop in GC time, an 80% drop in Redis CPU usage, and a 60% drop in our Resque worker CPU usage.

@malomalo
Owner

malomalo commented Apr 4, 2020

Cool, I'll give this a spin sometime this week and hopefully get it into master soon after

@parikshit223933

Hey, I have refactored the logic and tested it on a real production application. It seems to be working fine with no issues.
Please check this out: #20

Some of the problems mentioned here are handled in this logic.
