jpcamara
Contributor

@jpcamara jpcamara commented Feb 2, 2024

Support for batches in SolidQueue!

Batches are a powerful feature in Sidekiq Pro and GoodJob which help with job coordination. A "Batch" is a collection of jobs, and when those jobs meet certain completion criteria it can optionally trigger another job with the batch record as an argument.

The goal of this feature is to:

  1. Enhance job coordination
  2. Maintain the overall simplicity of SolidQueue. To quote @rosa in the batch support issue:

We have batch support in our list of possible features to add, but it's not in the immediate plans because it's a bit at odds with the simplicity we're aiming for... I'm not quite sure yet how it could look like, in a way that maintains the overall simplicity of the gem

This PR provides a functional batch implementation. The following scenarios will work:

```ruby
# Create a job to run as part of the batch
class SleepyJob < ApplicationJob
  queue_as :background

  def perform(seconds_to_sleep)
    Rails.logger.info "Feeling #{seconds_to_sleep} seconds sleepy..."
    sleep seconds_to_sleep
  end
end

# Create a batch completion job - the first argument is always the batch record itself
class BatchCompletionJob < ApplicationJob
  queue_as :background

  def perform(batch)
    Rails.logger.info "#{batch.jobs.size} jobs completed!"
  end
end

# Create the batch itself. There are three callback options: `on_finish`, `on_success` and `on_failure`
SolidQueue::Batch.enqueue(on_success: BatchCompletionJob) do
  5.times.map { |i| SleepyJob.perform_later(i) }
end
```

You should see the following in your logs:

```
[SolidQueue] Claimed 5 jobs
Performing SleepyJob (Job ID: 9a97e394-f1b4-47b5-8db3-6df86faf0913) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.563284000Z with arguments: 1
Feeling 1 seconds sleepy...
Performing SleepyJob (Job ID: 4ae76794-25e1-4378-bbc7-c505d7775395) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.567448000Z with arguments: 2
Feeling 2 seconds sleepy...
Performing SleepyJob (Job ID: fd41b240-08b6-4f7d-b4de-38ee2e6a58b4) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.557246000Z with arguments: 0
Feeling 0 seconds sleepy...
Performed SleepyJob (Job ID: fd41b240-08b6-4f7d-b4de-38ee2e6a58b4) from SolidQueue(background) in 0.1ms
Performing SleepyJob (Job ID: 40498973-9b5a-4f57-acce-03e2b8e2a97c) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.594398000Z with arguments: 4
Feeling 4 seconds sleepy...
Performing SleepyJob (Job ID: c2a123ea-61ba-4d33-99e7-9478a8ca4f5d) from SolidQueue(background) enqueued at 2024-02-02T02:02:05.589439000Z with arguments: 3
Feeling 3 seconds sleepy...
Performed SleepyJob (Job ID: 9a97e394-f1b4-47b5-8db3-6df86faf0913) from SolidQueue(background) in 1005.47ms
Performed SleepyJob (Job ID: 4ae76794-25e1-4378-bbc7-c505d7775395) from SolidQueue(background) in 2006.58ms
Performed SleepyJob (Job ID: c2a123ea-61ba-4d33-99e7-9478a8ca4f5d) from SolidQueue(background) in 3006.1ms
Performed SleepyJob (Job ID: 40498973-9b5a-4f57-acce-03e2b8e2a97c) from SolidQueue(background) in 4008.12ms
[SolidQueue] Claimed 1 jobs
Performing BatchCompletionJob (Job ID: 625ca70c-4e80-4789-988b-3950ed1b23f3) from SolidQueue(background) enqueued at 2024-02-02T02:02:10.183075000Z with arguments: #<GlobalID:0x000000010b5b0390 @uri=#<URI::GID gid://dummy/SolidQueue::JobBatch/13>>
5 jobs completed!
Performed BatchCompletionJob (Job ID: 625ca70c-4e80-4789-988b-3950ed1b23f3) from SolidQueue(background) in 7.97ms
```

Here is the full interface, demonstrating a few different options:

```ruby
# All three callback types
# on_finish: runs after all jobs have finished running, including retries. It runs regardless of failed jobs
# on_success: runs after all jobs have finished running, including retries. Only runs if all jobs succeed
# on_failure: runs after all jobs have finished running, including retries. Only runs if at least one of the jobs fails
SolidQueue::Batch.enqueue(
  # You can hand in a populated instance, to set options like wait times or a custom queue for the callback
  on_finish: BatchCompletionJob.new.set(wait_until: 1.hour.from_now, queue: :batch),
  # Otherwise just supply the job class
  on_success: BatchSuccessJob,
  on_failure: BatchFailureJob
) do
  5.times.map { |i| SleepyJob.perform_later(i) }
end

# Jobs that are part of a batch can enqueue more jobs into the batch
class EnqueueMoreJobsJob < ApplicationJob
  def perform
    batch.enqueue { YetAnotherJob.perform_later }
  end
end
```
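The comments above define which callbacks fire based on the final state of every job in the batch. The plain-Ruby model below sketches those rules only. It is not code from this PR: `callbacks_to_fire` and the `:succeeded`/`:failed`/`:pending` states are hypothetical names, and the order of the returned callbacks is arbitrary.

```ruby
# Hypothetical model of the callback rules above (not the PR's implementation).
# `outcomes` is the final state of every job in the batch, after retries.
def callbacks_to_fire(outcomes)
  # Nothing fires while any job is still pending or retrying.
  return [] unless outcomes.all? { |state| %i[succeeded failed].include?(state) }

  fired = [:on_finish] # on_finish runs regardless of failures
  fired << (outcomes.include?(:failed) ? :on_failure : :on_success)
  fired
end

p callbacks_to_fire(%i[succeeded succeeded]) # prints [:on_finish, :on_success]
p callbacks_to_fire(%i[succeeded failed])    # prints [:on_finish, :on_failure]
p callbacks_to_fire(%i[succeeded pending])   # prints []
p callbacks_to_fire([])                      # prints [:on_finish, :on_success]
```

With an empty list, `all?` is vacuously true, so this model treats an empty batch as an immediate success. Whether the PR's empty-batch handling fires the same callbacks is an assumption here.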

Some have mentioned the GoodJob example for complex batches and asked whether it could be implemented with the SolidQueue::Batch approach: https://github.com/bensheldon/good_job?tab=readme-ov-file#complex-batches. GoodJob offers mutable batches, and the SolidQueue::Batch implementation mostly does not. Here is how you would implement the same, more complex example:

```ruby
class BatchWorkJob < ApplicationJob
  def perform(step)
    puts "BatchWorkJob: #{step}"
    if step == 'e'
      batch.enqueue { BatchWorkJob.perform_later('f') }
      puts "BatchWorkJob: enqueue f"
    end
  end
end

class BatchJob < ApplicationJob
  def perform(batch)
    metadata = batch.metadata || {}
    if metadata["stage"].nil?
      puts "BatchJob: initial stage"
      SolidQueue::Batch.enqueue(metadata: { stage: 1 }, on_finish: BatchJob) do
        BatchWorkJob.perform_later('a')
        BatchWorkJob.perform_later('b')
        BatchWorkJob.perform_later('c')
      end
    elsif metadata["stage"] == 1
      puts "BatchJob: stage 1"
      SolidQueue::Batch.enqueue(metadata: { stage: 2 }, on_finish: BatchJob) do
        BatchWorkJob.perform_later('d')
        BatchWorkJob.perform_later('e')
      end
    elsif metadata["stage"] == 2
      puts "BatchJob: stage 2"
      # ...
    end
  end
end

SolidQueue::Batch.enqueue(on_finish: BatchJob)

# BatchJob: initial stage
# BatchWorkJob: c
# BatchWorkJob: a
# BatchWorkJob: b
# BatchJob: stage 1
# BatchWorkJob: d
# BatchWorkJob: e
# BatchWorkJob: enqueue f
# BatchWorkJob: f
# BatchJob: stage 2
```

Here are the things that are open questions and missing implementation details:

  • Naming: is JobBatch the right name? General feedback on naming throughout the feature is welcome
  • Is it simple enough?
  • Do the callbacks make sense? on_success, on_finish, on_failure
  • We cannot handle discards right now. SolidQueue handles discard_on by marking the job as finished. That means the batch cannot identify that the job actually failed.
  • Hand a generalized interface into the job instead of an actual batch record?
  • Keeping things efficient when you have tons of jobs in a batch
  • How would/could batches fit into Mission Control - Jobs? (Very basic batch support: jpcamara/mission_control-jobs#1)
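The discard limitation in the list above can be shown with a small model. It assumes, as that bullet describes, that a job discarded via discard_on is recorded the same way as a job that ran successfully. `DemoJob`, `recorded_state` and `batch_succeeded?` are hypothetical names, not part of Solid Queue.

```ruby
# Hypothetical model: a job is recorded only as :finished or :failed,
# and a job discarded via discard_on is recorded as :finished.
DemoJob = Struct.new(:name, :raised, :discarded)

def recorded_state(job)
  return :failed if job.raised && !job.discarded
  :finished # successful and discarded jobs both end up here
end

# A batch-level success check can only look at the recorded states.
def batch_succeeded?(jobs)
  jobs.map { |job| recorded_state(job) }.none?(:failed)
end

jobs = [DemoJob.new("charge", false, false), DemoJob.new("sync", true, true)]
puts batch_succeeded?(jobs) # prints true, even though "sync" raised and was discarded
```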

@mbajur

mbajur commented Feb 2, 2024

Could that also support adding jobs to already existing batch? Like SleepyJob enqueuing another job that would also be added to the batch.

@jpcamara
Contributor Author

jpcamara commented Feb 2, 2024

Could that also support adding jobs to already existing batch? Like SleepyJob enqueuing another job that would also be added to the batch.

@mbajur Definitely. As long as you're in a job that's part of the batch, adding another job to the batch would work fine. It'd be pretty simple to extend the existing code to handle that - something like this would solve your use-case I think?

```ruby
class SleepyJob < ApplicationJob
  queue_as :background

  def perform(seconds_to_sleep)
    Rails.logger.info "Feeling #{seconds_to_sleep} seconds sleepy..."
    sleep seconds_to_sleep

    batch.enqueue { AnotherJob.perform_later }
  end
end
```

I can update ActiveJob::JobBatchId to add an extra batch method to the job, which returns the current batch based on the job's batch_id, and add an instance-level SolidQueue::JobBatch#enqueue method, which lets you add more jobs to the batch.

This would only work safely inside of the job - if you were outside of the job, it's possible the batch would finish before the job gets created.
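A simple pending-job counter shows why this is only safe inside a job. The model below is a hypothetical in-memory sketch, not the PR's implementation (later commit notes describe tracking batch execution rows rather than a single counter). A running job that enqueues into its own batch raises the count before its own completion lowers it. An enqueue from outside a job has nothing holding the batch open.

```ruby
# Hypothetical in-memory model of a batch that finishes when its pending count hits zero.
class ModelBatch
  attr_reader :late_enqueues

  def initialize
    @pending = 0
    @finished = false
    @late_enqueues = 0
  end

  def finished?
    @finished
  end

  # Adding a job to an unfinished batch raises the pending count.
  def enqueue
    if @finished
      @late_enqueues += 1 # callbacks already fired; nothing will wait for this job
    else
      @pending += 1
    end
  end

  # A finishing job lowers the count; reaching zero finishes the batch (callbacks fire).
  def job_completed
    @pending -= 1
    @finished = true if @pending.zero?
  end
end

# Inside a batch job: the follow-up is enqueued before the running job completes.
inside = ModelBatch.new
inside.enqueue        # the original job
inside.enqueue        # enqueued from within the running job
inside.job_completed  # the original job finishes; the follow-up keeps the batch open
puts inside.finished? # prints false

# Outside a batch job: nothing holds the batch open while the caller enqueues.
outside = ModelBatch.new
outside.enqueue
outside.job_completed      # the batch finishes here
outside.enqueue            # arrives too late
puts outside.late_enqueues # prints 1
```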

@mbajur

mbajur commented Feb 2, 2024

Yes that would absolutely do the trick for me :) Thank you!

@jpcamara
Contributor Author

jpcamara commented Mar 30, 2024

Hi @rosa 👋🏼 Congrats on getting SolidQueue past incubation and under the Rails umbrella officially!

I'm sure you've got a lot on your plate! Are there any questions I can answer about this PR? I can take the interface/functionality further, but I wanted to discuss it a bit before doing that. If there's anything you'd like me to tighten up or try out before discussing it, I'm happy to do so.

It's also OK to put this on hold if you're not ready to discuss it further right now. Since it's been a couple of months, I figured I'd check in.

@rosa
Member

rosa commented Apr 12, 2024

Hey @jpcamara, so sorry for the delay and the silence here. I just haven't had the proper time to dedicate to this, and I think this requires and deserves more than a quick look. Thank you so much for putting this together!

My instinct with features like this is that they need someone to use them before they're ready. From just looking at the code, I'm not quite sure what kind of edge cases and race conditions could arise here. This is how most of Solid Queue has been built: we've used it in HEY before making it "official" for everyone, seeing how it behaves under certain loads and what kind of problems we encounter. We caught and improved quite a few things that way.

We don't have a use case for batches right now, so I'm afraid I won't be able to take this feature to this "production-test" point on my side. Do you see yourself using this in a production setting?

@jpcamara
Contributor Author

That makes sense! It was a bit of a chicken-and-egg problem for me: I wanted to have batches in SolidQueue before starting to transition some things over, because I have code using Sidekiq Pro batches. But I can start experimenting with it now and report back. I'll keep working on this PR in that case, too.

@dimroc

dimroc commented Aug 3, 2024

Thanks for all the hard work here. I'm a big fan of batch jobs, so I've been keeping my eye on this PR for a while. Sidekiq Pro and others support the notion of child batches. Like batch jobs, child batches let you work with a higher-level abstraction that conceptually simplifies your background work. While implementing that in this PR would be feature creep, I wanted to raise awareness of it in the hope that we design with its extensibility in mind.

Thanks again @jpcamara

@jpcamara
Contributor Author

jpcamara commented Aug 15, 2024

hey @dimroc! I couldn't agree more. I used child batches in Sidekiq recently in a project and it highlighted the need to add them to this PR; it's an important feature. I've been putting a lot of work into releasing a blog series on Ruby concurrency and it's been eating up my free coding-related time, but I'm prioritizing getting back to this soon. Thanks for the feedback!

@jpcamara jpcamara force-pushed the batch-poc branch 2 times, most recently from c58f11d to 5ba1c27 September 24, 2024 02:51
@jpcamara jpcamara changed the title from "Batch POC" to "Batch Support" Sep 25, 2024
@jpcamara
Contributor Author

jpcamara commented Sep 26, 2024

I've added child batches to this PR @dimroc

@jpcamara
Contributor Author

@mbajur, I've added support to this PR for enqueueing within a job, using the syntax I suggested: batch.enqueue { AnotherJob.perform_later }

@mariochavez

@jpcamara just an idea here. Could this functionality become its own gem? Like an add-on for solid_queue?
I'm currently using Gush, which is similar to what this functionality does.

@nickpoorman

nickpoorman commented Sep 27, 2024

One place I think this could be useful is rolling back changes. With the move to SQLite, jobs are written to a secondary database only after records are committed to the main database, so there is no atomic guarantee that both writes will commit, which leaves us needing manual rollback logic. This also reminds me that Sidekiq had some guarantee of writing batch jobs atomically, either in Pro or Enterprise.

The same thing can happen if you make a change in your database and call an external API. With SQLite, you no longer want to make those database changes and the [long-running] external API call inside a single transaction. Let's say you have to call Stripe twice and want to keep a record of which stage you're at. If something goes wrong with the second Stripe call (say the charge is declined), you need to roll back your database so it's not in an inconsistent state.

In this case you can queue the rollback as part of the batch.

If memory serves, I think the pattern for this sort of thing is called Sagas, or a DAG such as Elixir has with GenStage.
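Here is a rough illustration of the saga idea. Each step pairs an action with a compensating action. When a step fails, the compensations for the steps that already succeeded run in reverse order. This plain-Ruby sketch shows only that control flow: `Step`, `run_saga` and the step names are hypothetical. In a batch setting, the compensations might instead be enqueued from an on_failure callback.

```ruby
# Minimal saga sketch: each step has an action and a compensating action.
Step = Struct.new(:name, :action, :compensate)

# Runs steps in order; on failure, undoes completed steps in reverse and returns false.
def run_saga(steps, log)
  completed = []
  steps.each do |step|
    step.action.call
    completed << step
  rescue StandardError => e
    log << "failed: #{step.name} (#{e.message})"
    completed.reverse_each { |done| done.compensate.call }
    return false
  end
  true
end

log = []
steps = [
  Step.new("record_stage", -> { log << "stage saved" }, -> { log << "stage rolled back" }),
  Step.new("first_charge", -> { log << "charge 1 ok" }, -> { log << "charge 1 refunded" }),
  Step.new("second_charge", -> { raise "card declined" }, -> { log << "nothing to undo" })
]

ok = run_saga(steps, log)
puts ok # prints false
puts log
```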

I also found this pattern extremely useful in the past because you can parallelize work more effectively. Say I need to make 1000 remote API calls. Each call should really be in its own job so that it can be retried if it hits the API limits and I need to perform some final job once all 1000 API calls have been made and the batch is complete.
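This fan-out/fan-in shape can be sketched in plain Ruby, with a thread pool standing in for job workers. Everything in the sketch is hypothetical:

* `FakeRateLimit` and `call_api` simulate a rate-limited API.
* Retries happen in-process rather than through the job system.
* The final step runs exactly once, when the shared remaining count reaches zero.

```ruby
# Simulated fan-out/fan-in: 1000 "API calls" processed by a pool of workers,
# each retried on a (fake) rate-limit error, then one final step.
class FakeRateLimit < StandardError; end

# Stand-in for a remote call: every 7th id is rate-limited on its first attempt.
def call_api(id, attempt)
  raise FakeRateLimit if (id % 7).zero? && attempt == 1
  "result-#{id}"
end

def call_with_retries(id, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    call_api(id, attempts)
  rescue FakeRateLimit
    retry if attempts < max_attempts
    raise
  end
end

work = Queue.new
(1..1000).each { |id| work << id }
work.close # pop returns nil once the closed queue is drained

remaining = 1000
final_step_runs = 0
lock = Mutex.new

workers = Array.new(10) do
  Thread.new do
    while (id = work.pop)
      call_with_retries(id)
      lock.synchronize do
        remaining -= 1
        final_step_runs += 1 if remaining.zero? # the "batch complete" step
      end
    end
  end
end
workers.each(&:join)

puts final_step_runs # prints 1
```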

@jpcamara
Contributor Author

jpcamara commented Oct 1, 2024

@mariochavez it's true, it probably could be a gem! Sidekiq has batches as a pro feature, but there is also an open-source gem that mostly supports the same api https://github.com/breamware/sidekiq-batch.

There are some gotchas with approaching it that way (one, for instance, around jobs being automatically cleaned up and not being able to do anything but warn users against it). But my main motivation is that I personally want it as a first-class feature of SolidQueue. It's a first-class (albeit paid) feature of Sidekiq, and it's a first-class feature of GoodJob. Being built-in means it's more likely to get use/support and alleviates concerns it may be abandoned at some point. I also think it's a great core feature of a job library.

Gush is awesome! It definitely works similarly, though this being backed by a DB in SolidQueue means it has more ACID-type guarantees.

Something I would like to see is an even more sophisticated "workflow"-type layer that works with any ActiveJob backend, and that's something I've toyed around with over the past year. That kind of system, I think, goes a step beyond, is more complicated, and is better served as a separate gem. I think batch support being included in the job server is a good fit.

@dimroc

dimroc commented Oct 1, 2024

Thanks for all this work @jpcamara!

I'm curious what the expected behavior is with limit_concurrency. Does it limit concurrency until the batch is complete (i.e., only 3 batch jobs can run concurrently, even while waiting on their individual jobs), which would surpass even Sidekiq's behavior, or does limit_concurrency only work with individual jobs?

An argument for keeping this in the main gem is to ease integration testing with the other features to increase cohesion and stability. I can see it getting hairy when you stack a few different configurations on top of nested batches, and having to assert proper behavior.

https://github.com/rails/solid_queue?tab=readme-ov-file#concurrency-controls

@abrunner94

I would love to see batches in Solid Queue! My use case is primarily creating workflows.

@kaka-ruto

Just wanna thank @jpcamara for the work he's done, is doing here. Would love to use this 🙏

@jpcamara
Contributor Author

jpcamara commented Nov 2, 2024

Hey @kaka-ruto, I think I saw a message about you being willing to try this out? That would be great! Most of my free time is going into a RubyConf talk I have in a couple of weeks, but I'll shift back to this right after that and give you an update.

@jpcamara jpcamara force-pushed the batch-poc branch 2 times, most recently from 1ef7f38 to 8099030 November 22, 2024 22:52
* Making it explicit is the easiest option, and the one most in alignment with Solid Queue

* Fix errors around upserting across providers. SQLite and Postgres share identical syntax (at least for this use case), and MySQL works differently
* Reduce load from each callback, and make checks less susceptible to race conditions

* Make sure monitor jobs can run, even in the absence of an ApplicationJob

* Allow setting the queue on the maintenance jobs

* Bring back emptyjob for empty queues
* We still track it, but it was causing a lot of race conditions while trying to keep it exclusively in callbacks. Running it in a job or worker/dispatcher works easily, but adds more overhead to the code and processing

* Move to explicit timestamp fields instead of status fields, so it's easier to track the specifics of batch transitions

* Move batches lower in the schema, after the current models
* By querying the remaining batch executions, query times stay very fast.

* When we constantly update the counts on the single batch row, it becomes a hotspot. Fast-executing jobs quickly accumulate and slow down overall job processing (processing a few thousand jobs goes from ~10 seconds to ~40 seconds). This still adds a bit of overhead, but significantly less (~10 seconds to ~15 seconds)

* Handle batch completion in an after_commit to make sure the transaction is visible before checking executions. This may mean we need to introduce some monitoring for cases where an after_commit fails to fire due to network issues or a database issue
* Batch execution is managed through the BatchExecution model, which is destroyed (dependent: :destroy) when its job is destroyed

* Since batch completion is checked in an after_commit on: :destroy, it still gets checked even when the job is not preserved

* Because we rely on batch executions and counts, we don't need the jobs to stick around to properly run a batch
* Without always updating it on the fly, it's always the same as total_jobs, or is 0. So it's not really useful as a distinct column
* Remove multi-step job example since we don't handle hierarchy anymore for the time being
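The notes above replace a per-batch counter column with one execution row per unfinished job. Completion then becomes a question of whether any rows remain. The in-memory sketch below models only that idea. `ExecutionTable` is a hypothetical stand-in for a batch_executions table. In a database, each `remove` would be a delete of one row, and `remaining` would be a count (or existence check) scoped to the batch.

```ruby
require "set"

# Hypothetical stand-in for a batch_executions table: one [batch_id, job_id] row per unfinished job.
class ExecutionTable
  def initialize
    @rows = Set.new
  end

  def add(batch_id, job_id)
    @rows << [batch_id, job_id]
  end

  # Finishing a job removes only its own row, so workers never all update one shared batch row.
  def remove(batch_id, job_id)
    @rows.delete([batch_id, job_id])
  end

  # Completion check: how many executions are left for this batch?
  def remaining(batch_id)
    @rows.count { |(b, _job)| b == batch_id }
  end
end

table = ExecutionTable.new
3.times { |i| table.add("batch-1", "job-#{i}") }
table.add("batch-2", "job-x")

table.remove("batch-1", "job-0")
table.remove("batch-1", "job-1")
puts table.remaining("batch-1") # prints 1
table.remove("batch-1", "job-2")
puts table.remaining("batch-1") # prints 0, so batch-1 could run its callbacks
puts table.remaining("batch-2") # prints 1
```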
@jpcamara
Contributor Author

Hey @rosa! Spurred on by Jeremy Smith, in collaboration with some of @mhenrixon's work, and prompted by the many people asking when batches will be done or how it's going, I spent the last few weeks cleaning up and tightening up the batch PR.

I think at this point I've gotten it as close as I can to the spirit and philosophy of the Solid Queue gem (as I understand them).

I know you wanted to see this feature get some production use before considering it for Solid Queue, and I'm going to talk with some folks about the best way to achieve that, as well as try to get some production use of it myself.

But even before that - is it possible to get some review on this and get a sense if you agree with the approach? I wouldn't want to get a lot of people trying out this feature and deploying it, only to find out you fundamentally don't agree with the direction.

It may seem a bit daunting in size, but I think more than half of it is README and tests, and the core of the actual code is pretty understandable (but I'm biased; I've been staring at it for weeks).

I'm happy to get on a pairing call and answer questions, or just communicate async.

@ollym

ollym commented Sep 17, 2025

@jpcamara we'd be willing to run this in production if you need a test.

Thank you for your hard work on this.

@trevorturk

Congrats on getting this far, I've been following along and it's been fun to watch!

I had one question, which I think might be worth clarifying in the docs somewhere. I see in earlier comments that the batches are mutable, so I think that means we ought to be able to have "a multi-stage batch with both parallel and serial job steps" as GoodJob details here: https://github.com/bensheldon/good_job?tab=readme-ov-file#complex-batches?

@rosa
Member

rosa commented Sep 17, 2025

@jpcamara thank you so much! I'll try to look into this over the weekend/next week. I'm not 100% sure of whether I'll have time to review everything by the end of next week, but I'll try.

@jpcamara
Contributor Author

jpcamara commented Sep 17, 2025

hey @trevorturk! I can definitely update the docs to include a more complex example like GoodJob's.

I landed on a record that while executing is somewhat mutable (you can enqueue more jobs), but is immutable once the batch is finished. This follows Sidekiq's batch implementation. I preferred the approach of an immutable set of batch callbacks/metadata - I think it is easier to reason about. But within batch jobs you can add more jobs to the batch, since the overall batch is still running. Once the callbacks are fired, the batch is "finished" so it can't have more jobs added to it (I believe the GoodJob internals will effectively reboot the batch if you add more jobs at that point - which would have made the internals complicated to follow). The first time I saw that GoodJob example I had to read it a few times to totally follow it (though in fairness, I only had experience with Sidekiq Batches at that point so I didn't expect it).

In the meantime, here is the same example as GoodJob, but using the Batch style I implemented. I think it's pretty much just as readable, and should do exactly what you're looking for:

```ruby
class BatchWorkJob < ApplicationJob
  def perform(step)
    puts "BatchWorkJob: #{step}"
    if step == 'e'
      batch.enqueue { BatchWorkJob.perform_later('f') }
      puts "BatchWorkJob: enqueue f"
    end
  end
end

class BatchJob < ApplicationJob
  def perform(batch)
    metadata = batch.metadata || {}
    if metadata["stage"].nil?
      puts "BatchJob: initial stage"
      SolidQueue::Batch.enqueue(metadata: { stage: 1 }, on_finish: BatchJob) do
        BatchWorkJob.perform_later('a')
        BatchWorkJob.perform_later('b')
        BatchWorkJob.perform_later('c')
      end
    elsif metadata["stage"] == 1
      puts "BatchJob: stage 1"
      SolidQueue::Batch.enqueue(metadata: { stage: 2 }, on_finish: BatchJob) do
        BatchWorkJob.perform_later('d')
        BatchWorkJob.perform_later('e')
      end
    elsif metadata["stage"] == 2
      puts "BatchJob: stage 2"
      # ...
    end
  end
end

SolidQueue::Batch.enqueue(on_finish: BatchJob)

# BatchJob: initial stage
# BatchWorkJob: c
# BatchWorkJob: a
# BatchWorkJob: b
# BatchJob: stage 1
# BatchWorkJob: d
# BatchWorkJob: e
# BatchWorkJob: enqueue f
# BatchWorkJob: f
# BatchJob: stage 2
```

There's probably some improvement we could make so that accessing metadata always returns at least an empty hash, and maybe making it .with_indifferent_access.
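The example above reads metadata["stage"] with a string key, even though the batch was enqueued with metadata: { stage: 1 }. That follows from the `serialize :metadata, coder: JSON` line in the review below: a JSON round trip turns symbol keys into strings. Below is a sketch of the suggested friendlier accessor. It uses a hypothetical `read_metadata` helper in plain Ruby rather than ActiveSupport's with_indifferent_access.

```ruby
require "json"

# Symbol keys written at enqueue time come back as string keys after a JSON round trip.
loaded = JSON.parse(JSON.generate({ stage: 1 }))
puts loaded.key?("stage") # prints true
puts loaded.key?(:stage)  # prints false

# Hypothetical helper: treat nil metadata as {} and accept symbol or string keys.
def read_metadata(raw, key)
  (raw || {}).transform_keys(&:to_s)[key.to_s]
end

p read_metadata(loaded, :stage)  # prints 1
p read_metadata(loaded, "stage") # prints 1
p read_metadata(nil, :stage)     # prints nil
```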

@jpcamara
Contributor Author

Thanks @rosa ! I'll keep an eye out for it 👀

@trevorturk

@jpcamara this is looking really promising to me! I think I could use it for my batching needs as I was imagining with GoodJob. There's a little bit of a gotcha with the batch being closed when a theoretical initial job completes (in my theoretical use case) but I think it's easy enough to work around (add to the batch before letting the initial job complete) and I absolutely agree that keeping things easy to reason about (internally and externally) is of utmost importance.

I'm kicking off a new project where I'll be orchestrating LLM tools (like the claude-swarm, but with db records etc) and I wonder if you think it's a good time for someone like me to test things out IRL? (I could swap to GoodJob easily if things get too crazy, so it's nbd here.) I'd be testing out the Mission Control stuff as well, and happy to report back and/or help with implementation bits and pieces.

@jpcamara
Contributor Author

jpcamara commented Sep 18, 2025

I think this would be great! If you could give this a try, that'd be a wonderful test. I've got pretty good test coverage, and I've been stress testing the code as well to keep the batch overhead as low as possible, so I feel comfortable with you starting to try it out.

Rosa hasn't reviewed it yet, so it's possible (likely?) that some internals will change. But I think the actual interface should stay the same.

@trevorturk

Ok, will test if/when I get there (I've been creating a project plan with Claude today and it's... a long list of todos lol 🫣)

@rosa
Member

rosa commented Sep 22, 2025

Hey, just a quick update: I didn't manage to get to this over the weekend and this week I'm on-call, looking crazy busy, so most likely I won't get to do it either and will have to wait until next week 😭

```ruby
t.string "concurrency_key"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.string "batch_id"
```
Contributor Author

I should add a foreign key here. Originally I was only creating batches if jobs actually enqueued, but now I always create them, and run an empty batch if no jobs end up enqueued.

```ruby
serialize :metadata, coder: JSON

after_initialize :set_batch_id
after_commit :start_batch, on: :create, unless: -> { ActiveRecord.respond_to?(:after_all_transactions_commit) }
```
Contributor Author

@jpcamara jpcamara Sep 28, 2025

There are a couple of places that use after_commit callbacks (or do things after all transactions have committed, using ActiveRecord.after_all_transactions_commit), which means they are susceptible to intermittent errors that cause them to never fire. Ideally I would update the concurrency maintenance task to also check that batches actually initialize properly. But I didn't want to add anything like that until I get an overall OK on the PR's approach.
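One shape this safety net could take is a periodic sweep that finishes batches whose completion hook never ran. The same pattern could cover start_batch. This is a hypothetical plain-Ruby sketch, not part of the PR or of Solid Queue's maintenance tasks. `BatchRow` and `sweep_stalled` are made-up names. The grace period guards against finishing a batch that is still enqueueing its jobs. A real version would need an atomic update (for example UPDATE ... WHERE finished_at IS NULL) so it cannot double-fire callbacks alongside the normal after_commit path.

```ruby
# Hypothetical batch row: how many executions remain and when the batch finished.
BatchRow = Struct.new(:id, :remaining_executions, :created_at, :finished_at, keyword_init: true)

# Finishes batches whose completion check never ran; returns the ids it finished.
def sweep_stalled(batches, now:, grace: 300)
  batches.filter_map do |batch|
    next if batch.finished_at                    # the normal path already finished it
    next if now - batch.created_at < grace       # may still be enqueueing its jobs
    next unless batch.remaining_executions.zero? # jobs are still outstanding
    batch.finished_at = now                      # a real sweeper would also enqueue the callbacks
    batch.id
  end
end

now = Time.at(10_000)
rows = [
  BatchRow.new(id: 1, remaining_executions: 0, created_at: now - 600, finished_at: nil),      # stalled
  BatchRow.new(id: 2, remaining_executions: 0, created_at: now - 10, finished_at: nil),       # too recent
  BatchRow.new(id: 3, remaining_executions: 4, created_at: now - 600, finished_at: nil),      # still running
  BatchRow.new(id: 4, remaining_executions: 0, created_at: now - 600, finished_at: now - 500) # already done
]

p sweep_stalled(rows, now: now) # prints [1]
p sweep_stalled(rows, now: now) # prints [] (already finished)
```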
