Skip to content

[do not merge] Merge queue implementation #318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

Sakib25800
Copy link
Contributor

@Sakib25800 Sakib25800 commented Jun 19, 2025

This PR is not intended to be merged and will eventually be split into multiple (easier to review) PRs.

I plan to create the merge queue here in full so that I can make sure everything actually works together and can see it as a whole, as it's harder to do so in multiple separate PRs and should be much easier to review.

How it works:
The merge queue works similar to how Homu's queue works - it's not an actual queue but instead a stateless selection system.

Homu runs the queue via a call to g.queue_handler() and similarly, we call merge_queue_tx.send(()).

For Homu, every time the queue handler is called, it sorts the PRs stored in-memory (≈750), whilst bors will instead query the database for all the eligible PRs and sort them accordingly.

Bors will also utilise coalescing, as the queue invocation should be called whenever the state changes, and if there are multiple invocations sent in close succession, then the latest invocation should capture the changes that the older invocations were sent for.

To Do:

  • Update auto build status (similar to the try_complete_build) and send message on check suite completed webhook
  • Cancel active auto builds on push to PR webhook
  • Promote try builds to auto builds

@Sakib25800
Copy link
Contributor Author

The core merge operation looks to be working first try 😎

Sakib25800/rust#5

Copy link
Contributor

@Kobzol Kobzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for preparing this! Left a few comments, mostly just thinking out loud.

AND pr.mergeable_state = 'mergeable'
-- Tree closure check (if tree_priority is set)
AND ($2::int IS NULL OR pr.priority >= $2)
ORDER BY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a consistent queue order on the website and in the actual merge queue, and writing this logic in SQL is a bit complicated. I would thus implement the queue ordering in Rust, and in this query download all potentially eligible PRs (so those that are approved) without presorting them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extracted the sort logic to Rust and is also applied to the queue site.

@@ -342,6 +343,40 @@ impl GithubRepositoryClient {
Ok(prs)
}

/// Create a commit status for the given SHA.
pub async fn create_commit_status(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for? Is this the homu status that gets attached to PRs? I always wondered what that is, does homu do it explicitly? 😆 I'm not sure if it's super useful, but I guess that there is some value in showing in the status that an auto build is happening. We could actually do the same for try builds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to know I wasn't the only one who didn't know what this did in Homu's code.

It sets the yellow dot (aka StatusState::Pending) and the description next to it.

And yeah, it's missing for the try builds - I'll add it in the future.

Screenshot 2025-06-20 at 12 11 48

Ok(true)
}
MergeResult::Conflict => {
repo.client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to manually set the state of the PR in DB as "having conflicts" when this happens.


// 3. Record the build in the database
ctx.db
.attach_try_build(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably work with auto builds, not try builds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

I knew something was wrong but somehow skipped this.

Conflict,
}

pub async fn handle_merge_queue(ctx: Arc<BorsContext>) -> anyhow::Result<()> {
Copy link
Contributor

@Kobzol Kobzol Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I originally envisioned is that basically the merge queue is a background process that checks the status every 30 seconds to decide what to do. Here it looks more like you forcefully trigger the merge queue after some events (comments being sent?). I think that if it runs in background on a timer, it will be more robust. We can still ping it after important events that can trigger a new auto build to be started (so mostly approval), but that should be just an optimization, not needed for correctness. Also it wouldn't even be needed I think, the chance of actually getting to do a new merge is pretty low, in most case there will already be a build running, so the queue will just check what is happening right now and then immediately exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran it each time we parsed a command as that typically indicates some state has changed meaning the highest priority PR has changed.

But I guess just invoking the queue every 30s will do the job.

I've now changed it such that we do not invoke every time we parse a command and instead have added it as a refresh event. Though, I have kept the merge queue invocations after competing the auto build (complete_auto_build).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there are just a few things that should essentially trigger the queue:

  • PR being approved (can cause a new auto job if nothing is currently pending)
  • Auto job failed/succeeded (can again cause a new auto job to be created)

The merge queue could read events from the queue, and if no event happens e.g. in 30s or 1m, it should still check PRs anyway.

Btw, I think you mentioned somewhere that the actual merge to master will happen in the webhook. In theory this could work, because there shouldn't ever be parallel auto builds completing, but I would still rather do the merge by the merge queue itself, to ensure that it all happens sequentially. So the webhook should just generate the event for the merge queue that an auto build has finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These events now trigger the queue:

  • PR approval causes merge queue invocation
  • Auto job failed/succeeded causes merge queue invocation

I've moved the base branch "merge" logic - it's more a force push than it is a merge tbh - to the merge queue itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it should be a push, definitely not a force push 😆

@@ -998,32 +998,6 @@ pub(crate) async fn get_merge_queue_pull_requests(
AND pr.mergeable_state = 'mergeable'
-- Tree closure check (if tree_priority is set)
AND ($2::int IS NULL OR pr.priority >= $2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later we might also add a check here that tests if the auto build status isn't a failure. It's fine to check in Rust, but ideally we should pre-filter unbounded data in SQL. And I think that in some cases the number of PRs that are approved, but their CI, can slowly grow over time, when CI fails, the PR is forgotten, but it's still approved.

We should also clear old auto build results on PR push (and @bors retry, once we have it), otherwise an old CI auto build result could prevent a PR from being merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Updated the SQL query to exclude auto build failures.
  • Auto build results are now cleared on PR push

@Kobzol
Copy link
Contributor

Kobzol commented Jun 20, 2025

It looks like you got a lot of stuff working here, which is great! I would suggest splitting it into smaller PRs +/- like this:

  1. Add auto builds to PRs
  2. Clear attached auto build on push to PR (btw I just realized that when we clear auto build results, and once a new try build is started, we should probably also remove the corresponding rows in the Build table, so that they don't accumulate over time - not sure if we do that now? but that's a separate issue)
  3. Add the merge queue background process, which will only run every N seconds, figure out which PR is at the top of the queue and start an auto build. This will also include setting the PR labels
  4. Trigger the merge queue logic on approvals and auto build failures
  5. Trigger the merge queue logic on CI success and implement the actual merge to master

Does something like that make sense to you? Feel free to split it in a different way, how will it suit you. I'd just like to have a bunch of smaller PRs rather than reviewing all of this at once.

Btw, what does "Promote try builds to auto builds" mean?

@Sakib25800
Copy link
Contributor Author

Yeah, that looks about right.

I've now got a fairly complete understanding of how the merge queue works and will close this PR once all the merge queue features are completed in full - I might still add commits here maybe just to try things out.

  1. I'll delete the corresponding rows in the Build table. I actually hadn't added it to this PR because I thought maybe it could be useful build history, but I'm sure you could obtain the same information that BuildModel contains via the comments posted by bors.
  2. I'll maybe add to the docs, similar to what we have for the try builds, about how the merging works.

In Homu, if a try build has already succeeded and the PR gets approved, then it will skip directly to attempting to merge, so effectively promoting that try to an auto build. However, I've just tried to find an example of this feature in action and I can't, so I don't think this is worth implementing.

@Kobzol
Copy link
Contributor

Kobzol commented Jun 20, 2025

Oh, that's more of a horrible bug that sometimes plagues us with homu than a feature 😆

Try builds usually run like a single CI job (out of ~80 ones that run on auto builds), so it's in no way a thing that can be promoted to an auto build :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants