-
Notifications
You must be signed in to change notification settings - Fork 30
[do not merge] Merge queue implementation #318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The core merge operation looks to be working first try 😎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for preparing this! Left a few comments, mostly just thinking out loud.
src/database/operations.rs
Outdated
AND pr.mergeable_state = 'mergeable' | ||
-- Tree closure check (if tree_priority is set) | ||
AND ($2::int IS NULL OR pr.priority >= $2) | ||
ORDER BY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a consistent queue order on the website and in the actual merge queue, and writing this logic in SQL is a bit complicated. I would thus implement the queue ordering in Rust, and in this query download all potentially eligible PRs (so those that are approved) without presorting them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extracted the sort logic to Rust and is also applied to the queue site.
@@ -342,6 +343,40 @@ impl GithubRepositoryClient { | |||
Ok(prs) | |||
} | |||
|
|||
/// Create a commit status for the given SHA. | |||
pub async fn create_commit_status( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? Is this the homu status that gets attached to PRs? I always wondered what that is, does homu do it explicitly? 😆 I'm not sure if it's super useful, but I guess that there is some value in showing in the status that an auto build is happening. We could actually do the same for try builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok(true) | ||
} | ||
MergeResult::Conflict => { | ||
repo.client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to manually set the state of the PR in DB as "having conflicts" when this happens.
src/bors/merge_queue.rs
Outdated
|
||
// 3. Record the build in the database | ||
ctx.db | ||
.attach_try_build( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably work with auto builds, not try builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I knew something was wrong but somehow skipped this.
Conflict, | ||
} | ||
|
||
pub async fn handle_merge_queue(ctx: Arc<BorsContext>) -> anyhow::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I originally envisioned is that basically the merge queue is a background process that checks the status every 30 seconds to decide what to do. Here it looks more like you forcefully trigger the merge queue after some events (comments being sent?). I think that if it runs in background on a timer, it will be more robust. We can still ping it after important events that can trigger a new auto build to be started (so mostly approval), but that should be just an optimization, not needed for correctness. Also it wouldn't even be needed I think, the chance of actually getting to do a new merge is pretty low, in most case there will already be a build running, so the queue will just check what is happening right now and then immediately exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran it each time we parsed a command as that typically indicates some state has changed meaning the highest priority PR has changed.
But I guess just invoking the queue every 30s will do the job.
I've now changed it such that we do not invoke every time we parse a command and instead have added it as a refresh event. Though, I have kept the merge queue invocations after competing the auto build (complete_auto_build
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there are just a few things that should essentially trigger the queue:
- PR being approved (can cause a new auto job if nothing is currently pending)
- Auto job failed/succeeded (can again cause a new auto job to be created)
The merge queue could read events from the queue, and if no event happens e.g. in 30s or 1m, it should still check PRs anyway.
Btw, I think you mentioned somewhere that the actual merge to master will happen in the webhook. In theory this could work, because there shouldn't ever be parallel auto builds completing, but I would still rather do the merge by the merge queue itself, to ensure that it all happens sequentially. So the webhook should just generate the event for the merge queue that an auto build has finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These events now trigger the queue:
- PR approval causes merge queue invocation
- Auto job failed/succeeded causes merge queue invocation
I've moved the base branch "merge" logic - it's more a force push than it is a merge tbh - to the merge queue itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it should be a push, definitely not a force push 😆
@@ -998,32 +998,6 @@ pub(crate) async fn get_merge_queue_pull_requests( | |||
AND pr.mergeable_state = 'mergeable' | |||
-- Tree closure check (if tree_priority is set) | |||
AND ($2::int IS NULL OR pr.priority >= $2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later we might also add a check here that tests if the auto build status isn't a failure. It's fine to check in Rust, but ideally we should pre-filter unbounded data in SQL. And I think that in some cases the number of PRs that are approved, but their CI, can slowly grow over time, when CI fails, the PR is forgotten, but it's still approved.
We should also clear old auto build results on PR push (and @bors retry
, once we have it), otherwise an old CI auto build result could prevent a PR from being merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Updated the SQL query to exclude auto build failures.
- Auto build results are now cleared on PR push
We do not really need this anymore since we invoce on refresh
It does not make sense to invoke after unapproval
It looks like you got a lot of stuff working here, which is great! I would suggest splitting it into smaller PRs +/- like this:
Does something like that make sense to you? Feel free to split it in a different way, how will it suit you. I'd just like to have a bunch of smaller PRs rather than reviewing all of this at once. Btw, what does "Promote try builds to auto builds" mean? |
Yeah, that looks about right. I've now got a fairly complete understanding of how the merge queue works and will close this PR once all the merge queue features are completed in full - I might still add commits here maybe just to try things out.
In Homu, if a try build has already succeeded and the PR gets approved, then it will skip directly to attempting to merge, so effectively promoting that try to an auto build. However, I've just tried to find an example of this feature in action and I can't, so I don't think this is worth implementing. |
Oh, that's more of a horrible bug that sometimes plagues us with homu than a feature 😆 Try builds usually run like a single CI job (out of ~80 ones that run on auto builds), so it's in no way a thing that can be promoted to an auto build :D |
This PR is not intended to be merged and will eventually be split into multiple (easier to review) PRs.
I plan to create the merge queue here in full so that I can make sure everything actually works together and can see it as a whole, as it's harder to do so in multiple separate PRs and should be much easier to review.
How it works:
The merge queue works similar to how Homu's queue works - it's not an actual queue but instead a stateless selection system.
Homu runs the queue via a call to
g.queue_handler()
and similarly, we callmerge_queue_tx.send(())
.For Homu, every time the queue handler is called, it sorts the PRs stored in-memory (≈750), whilst bors will instead query the database for all the eligible PRs and sort them accordingly.
Bors will also utilise coalescing, as the queue invocation should be called whenever the state changes, and if there are multiple invocations sent in close succession, then the latest invocation should capture the changes that the older invocations were sent for.
To Do:
try_complete_build
) and send message on check suite completed webhookPromote try builds to auto builds