Adds batch processing support #590
Conversation
Hey @mhenrixon, thanks for this! I haven't looked at it yet, but there's a previous PR for batch support, #142, with a discussion there. It's a work in progress as well, but maybe that discussion is relevant for this, too.
This builds on some work I did for Sidekiq, actually. I had to put a database in front of Sidekiq to make it work the way I wanted and to ensure that no duplicates are processed while some work still needs to happen at the end.
Hey @rosa, so I had a look at the other PR! Admittedly, I am biased, but I feel that my suggestion is more in line with the code I have already seen in solid_queue. I don't care either way as long as the functionality ends up in a release in the not-so-distant future, though. Would @jpcamara be open to sharing notes and getting the feature across the finish line? I am more than happy to support you in finishing up the work in your branch.
@mhenrixon it would be good to change the API to support batch enqueue using a block as well, so that:
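For example, something along these lines (a hypothetical sketch: `SolidQueue::Batch.enqueue` with an `on_finish:` option is modeled on GoodJob's batch API, not necessarily what this PR or #142 actually expose):

```ruby
# Enqueue a batch with a block; every job enqueued inside the block
# becomes part of the batch (hypothetical API shape).
batch = SolidQueue::Batch.enqueue(on_finish: BatchFinishedJob) do
  10.times { |i| ProcessItemJob.perform_later(i) }
end
```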
Which would support a use case we have.

Job enqueue using a block is possible in #142.
I'll continue the work this weekend; things have been super busy at work developing a Hotwire Native app. Thanks for letting me know about this.
I have some free time this coming week to put together some notes and figure out next steps. I think neither of our PRs fully matches the Solid Queue model: it has core tables and then execution models, while both of our PRs kind of dump everything into one hot-path table. Maybe that's OK, I'm not sure yet. I want to re-evaluate my approach and line it up better with the overall architecture of SQ. I'd be happy to try to work together on this.
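For context, a simplified sketch of the core-table-plus-execution-models split the comment refers to (the table names follow Solid Queue's actual schema, but the code is illustrative only):

```ruby
# One wide core table holds the full job payload...
class Job < ActiveRecord::Base
  self.table_name = "solid_queue_jobs"
  has_one :ready_execution
  has_one :failed_execution
end

# ...and narrow per-state execution tables act as the hot-path work queues
# that workers poll, so polling never scans the full jobs table. A batch
# table could follow the same split instead of living in one hot table.
class ReadyExecution < ActiveRecord::Base
  self.table_name = "solid_queue_ready_executions"
  belongs_to :job
end
```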
Gonna dig into this PR and your approach as well; I've only reviewed it a bit so far.
Let me know if you want to team up on this. It is a much-needed feature.
@jpcamara @mhenrixon did you connect on this, or find a new direction? I'm keen to support in any way.
* Thanks to Mikael Henriksson for his work in rails#590. His work decentralizes management of batch status by moving it to the BatchUpdateJob, and tracks status using counts rather than querying specific job statuses after the fact. This is a much simpler approach to tracking the jobs, and allows us to avoid a constantly polling set of queries in the dispatcher. It also adds arbitrary metadata to allow tracking data from start to end of execution. This means enqueueing a BatchUpdateJob based on callbacks in two different kinds of Batchable, which are included when a job is updated and finished, or when a FailedExecution is created (since failed jobs never "finish").
* This batch feature already took some inspiration from the GoodJob batch implementation (https://github.com/bensheldon/good_job). Now we increase that by adopting some of the buffering and abstractions in a similar form to GoodJob. To discourage heavy reliance on the JobBatch model, it has been renamed to BatchRecord, and a separate Batch interface is how you interact with batches, with some delegation to the core model.
* A new Buffer class (also modeled after GoodJob) was added specifically for batches, primarily to support enqueue_after_transaction_commit. We now override the ActiveJob #enqueue method so we can keep track of which jobs are attempting to enqueue. When enqueue_after_transaction_commit is on, those jobs do not enqueue until all transactions commit. By tracking them at the high-level enqueue and keeping a buffer of jobs, we can ensure that the jobs get tracked even when their creation is deferred until the transaction is committed. A side benefit is that we get to enqueue all the jobs together, probably offering some performance advantage. This buffer also keeps track of child batches for the same reason.
* To support triggering a callback/BatchUpdateJob when a job finishes, the update to finished_at needed to become an update! call.
* As a simplification, on_failure is now only fired after all jobs finish, rather than the first time a job fails.
* The adapter logic also needed to be updated to support the buffer and enqueue_after_transaction_commit. If a job is coming from a batch enqueue, we ignore it here and allow the batching process to enqueue_all at the end of the enqueue block. If the job is originally from a batch but is retrying, we make sure the job counts in the batch stay updated. I don't love this addition, since it adds a lot of complication to the adapter code, all solely oriented around batches.
* Batches benefit from keeping jobs until the batch has finished. As such, we ignore the preserve-jobs setting; but if it is set to false, we enqueue a cleanup job once the batch has finished and clear out finished jobs.

Co-authored-by: Mikael Henriksson <[email protected]>
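A minimal sketch of the count-based tracking described in the first bullet, assuming a `pending_jobs` counter column on `BatchRecord` (the names and callback wiring here are illustrative; the real code in #142 may differ):

```ruby
module Batchable
  extend ActiveSupport::Concern

  included do
    # Fires when a job transitions to finished; creating a FailedExecution
    # would trigger a similar hook, since failed jobs never "finish".
    after_update :enqueue_batch_update,
                 if: -> { batch_id? && finished_at_previously_changed? }
  end

  private

  def enqueue_batch_update
    # Atomic SQL decrement, so concurrent workers can't lose updates.
    BatchRecord.update_counters(batch_id, pending_jobs: -1)
    BatchUpdateJob.perform_later(batch_id)
  end
end

class BatchUpdateJob < ActiveJob::Base
  def perform(batch_id)
    batch = BatchRecord.find(batch_id)
    # No polling in the dispatcher: the batch finishes itself
    # when the last job reports in.
    batch.finish! if batch.pending_jobs.zero?
  end
end
```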
I'm closing this now that my ideas have been incorporated into #142.
This builds on some work that I did for Sidekiq.
I had to put a database in front of Sidekiq to make it work as I wanted and to ensure that no duplicates are processed, while some tasks need to occur at the end.
There are several factors to consider, including how to track pending jobs. A counter might be better replaced by a simple query, depending on how fast the jobs are processed, since concurrent increments could cause side effects.
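To illustrate the trade-off (hypothetical names; this assumes a `pending_jobs` counter column on the batch and a `finished_at` timestamp on jobs):

```ruby
# Option 1: counter column. Cheap to check, but every finishing job must
# decrement it atomically in SQL, or concurrent updates will lose writes.
BatchRecord.update_counters(batch.id, pending_jobs: -1)
finished = batch.reload.pending_jobs.zero?

# Option 2: plain query. No shared counter to get out of sync, but it runs
# a COUNT on every check, which adds up when jobs finish quickly.
finished = batch.jobs.where(finished_at: nil).none?
```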
This is intended to initiate a discussion. There are other ways of handling this, but none that I like. This is the only approach that I can think of that doesn't have too many negatives.
The only negative is the database changes.