-
Notifications
You must be signed in to change notification settings - Fork 164
Make benchmark set and job operations a bit more robust #2340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
73ea94e to
ee12af2
Compare
What is it about the dequeing logic that you don't think is bulletproof? I don't see how |
|
I recently saw a talk about using Postgres as a job queue (don't have the recording on me right now, will have to rewatch it once it's available) that was mentioning some issues with |
Interesting blog post! We don't run into the same issues as them however. Running Query as it isQuery with MATERIALIZEDThe full query I was runningEXPLAIN ANALYSE WITH picked AS ( <-- Add `MATERIALIZED` here
SELECT
id
FROM
job_queue
WHERE
-- Take queued or in-progress jobs
(status = 'queued' OR status = 'in_progress')
AND target = 'x86_64-unknown-linux-gnu'
AND benchmark_set = 0
ORDER BY
-- Prefer in-progress jobs that have not been finished previously, so that
-- we can finish them.
CASE
WHEN status = 'in_progress' THEN 0
WHEN status = 'queued' THEN 1
ELSE 2
END,
request_tag,
created_at
LIMIT 1
FOR UPDATE SKIP LOCKED
), updated AS (
UPDATE
job_queue
SET
collector_name = 'x64_set_0',
started_at = NOW(),
status = 'in_progress',
retry = retry + 1
FROM
picked
WHERE
job_queue.id = picked.id
RETURNING job_queue.*
)
SELECT
updated.id,
updated.backend,
updated.profile,
updated.request_tag,
updated.created_at,
updated.started_at,
updated.retry,
br.commit_type,
br.commit_date
FROM updated
JOIN benchmark_request as br ON br.tag = updated.request_tag; |
|
Yeah, to be clear, I don't think that we suffer from this issue right now. And I know that it produces the same query plan. I still want to have |
I think we should remove it because it's more confusing than helpful. It looks like it's solving a problem, but it actually isn't; and the blog post it references addresses a completely different issue that we don't face. If we try to pre-empt every hypothetical future problem, we'll end up adding unnecessary complexity |
|
The problem that it is solving is that it makes it clear to the reader (e.g. me) that the subquery will only run once, I think that is actually useful. |
|
I've taken a took a closer look both at the blog post above and the Stack Overflow reference in the code comment to see how they relate to our situation. The blog post (https://www.shayon.dev/post/2025/119/a-postgresql-planner-gotcha-with-ctes-delete-and-limit/) is interesting, but its example focuses on nested queries with a The Stack Overflow link is also somewhat hard to apply to our scenario. In that example, the query plan shows
We also don't have the table schema from that example, so it's difficult to tell whether Given the differences, anyone trying to understand why I'm absolutely on board with strengthening the robustness of the code; I just want to make sure we're not adding complexity without a clear, demonstrated need. Once we clarify the reasoning around this, the rest of the changes look great. 👍 |
Avoid creating two separate functions that could diverge.
ee12af2 to
c62b408
Compare
|
Ok, fair enough, I trust your judgement on the matter :) I removed the commit. |
Jamesbarford
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Various cleanups that I did before splitting the current benchmark set into two.