Conversation

@Kobzol (Member) commented Dec 5, 2025

Various cleanups that I did before splitting the current benchmark set into two.

@Jamesbarford (Contributor)

> I'm not 100% sure that the dequeuing logic is bulletproof, but with AS MATERIALIZED I'm a bit more confident in it :D

What is it about the dequeuing logic that you don't think is bulletproof? And I don't see how AS MATERIALIZED would affect this logic.

@Kobzol (Member, Author) commented Dec 8, 2025

I recently saw a talk about using Postgres as a job queue (I don't have the recording on me right now, I'll have to rewatch it once it's available) that mentioned some issues with FOR UPDATE ... SKIP LOCKED not being applied as expected in subqueries. One specific thing that could happen (in theory) is that Postgres runs the subquery multiple times (https://www.shayon.dev/post/2025/119/a-postgresql-planner-gotcha-with-ctes-delete-and-limit/). I don't think that should happen in this case, but I also don't understand the Postgres execution model well enough to be 100% sure. With MATERIALIZED that should not happen, AFAIK.
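(For reference, MATERIALIZED is a per-CTE hint available since PostgreSQL 12; this is a minimal sketch of where it goes in a dequeue query of this shape, with the filters and SET list simplified relative to the real query, not the exact production statement:)

```sql
-- Sketch: MATERIALIZED (PostgreSQL 12+) acts as an optimization fence,
-- so the planner evaluates the CTE exactly once instead of possibly
-- inlining it into the outer query. Table/column names follow the
-- job_queue schema used in this PR, simplified.
WITH picked AS MATERIALIZED (
    SELECT id
    FROM job_queue
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE job_queue
SET status = 'in_progress', started_at = NOW()
FROM picked
WHERE job_queue.id = picked.id
RETURNING job_queue.id;
```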

@Jamesbarford (Contributor)

> I recently saw a talk about using Postgres as a job queue (I don't have the recording on me right now, I'll have to rewatch it once it's available) that mentioned some issues with FOR UPDATE ... SKIP LOCKED not being applied as expected in subqueries. One specific thing that could happen (in theory) is that Postgres runs the subquery multiple times (https://www.shayon.dev/post/2025/119/a-postgresql-planner-gotcha-with-ctes-delete-and-limit/). I don't think that should happen in this case, but I also don't understand the Postgres execution model well enough to be 100% sure. With MATERIALIZED that should not happen, AFAIK.

Interesting blog post! We don't run into the same issue as they did, however. Running EXPLAIN ANALYSE with and without MATERIALIZED shows exactly the same query plan (see results below). In the blog post their problem showed up as loops=5; for us it is loops=1. So we can safely remove it. As far as I can see there will be no issues with this query, either in terms of speed or contention.

Query as it is
Hash Join  (cost=28.54..36.49 rows=1 width=135) (actual time=0.307..0.408 rows=1 loops=1)
  Hash Cond: (br.tag = updated.request_tag)
  CTE picked
    ->  Limit  (cost=20.27..20.28 rows=1 width=63) (actual time=0.109..0.112 rows=1 loops=1)
          ->  LockRows  (cost=20.27..21.36 rows=87 width=63) (actual time=0.108..0.109 rows=1 loops=1)
                ->  Sort  (cost=20.27..20.49 rows=87 width=63) (actual time=0.082..0.083 rows=1 loops=1)
                      Sort Key: (CASE WHEN (job_queue.status = 'in_progress'::text) THEN 0 WHEN (job_queue.status = 'queued'::text) THEN 1 ELSE 2 END), job_queue.request_tag, job_queue.created_at
                      Sort Method: quicksort  Memory: 25kB
                      ->  Seq Scan on job_queue  (cost=0.00..19.83 rows=87 width=63) (actual time=0.022..0.030 rows=6 loops=1)
                            Filter: ((target = 'x86_64-unknown-linux-gnu'::text) AND (benchmark_set = 0) AND ((status = 'queued'::text) OR (status = 'in_progress'::text)))
  CTE updated
    ->  Update on job_queue job_queue_1  (cost=0.15..8.22 rows=1 width=110) (actual time=0.231..0.235 rows=1 loops=1)
          ->  Nested Loop  (cost=0.15..8.22 rows=1 width=110) (actual time=0.159..0.162 rows=1 loops=1)
                ->  CTE Scan on picked  (cost=0.00..0.02 rows=1 width=32) (actual time=0.119..0.120 rows=1 loops=1)
                ->  Index Scan using job_queue_pkey on job_queue job_queue_1  (cost=0.15..8.17 rows=1 width=14) (actual time=0.034..0.034 rows=1 loops=1)
                      Index Cond: (id = picked.id)
  ->  Seq Scan on benchmark_request br  (cost=0.00..7.14 rows=214 width=53) (actual time=0.016..0.057 rows=198 loops=1)
  ->  Hash  (cost=0.02..0.02 rows=1 width=120) (actual time=0.266..0.266 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  CTE Scan on updated  (cost=0.00..0.02 rows=1 width=120) (actual time=0.242..0.246 rows=1 loops=1)
Planning Time: 0.655 ms
Execution Time: 0.663 ms
Query with MATERIALIZED
Hash Join  (cost=28.54..36.49 rows=1 width=135) (actual time=0.300..0.404 rows=1 loops=1)
  Hash Cond: (br.tag = updated.request_tag)
  CTE picked
    ->  Limit  (cost=20.27..20.28 rows=1 width=63) (actual time=0.107..0.109 rows=1 loops=1)
          ->  LockRows  (cost=20.27..21.36 rows=87 width=63) (actual time=0.105..0.106 rows=1 loops=1)
                ->  Sort  (cost=20.27..20.49 rows=87 width=63) (actual time=0.081..0.083 rows=1 loops=1)
                      Sort Key: (CASE WHEN (job_queue.status = 'in_progress'::text) THEN 0 WHEN (job_queue.status = 'queued'::text) THEN 1 ELSE 2 END), job_queue.request_tag, job_queue.created_at
                      Sort Method: quicksort  Memory: 25kB
                      ->  Seq Scan on job_queue  (cost=0.00..19.83 rows=87 width=63) (actual time=0.022..0.030 rows=6 loops=1)
                            Filter: ((target = 'x86_64-unknown-linux-gnu'::text) AND (benchmark_set = 0) AND ((status = 'queued'::text) OR (status = 'in_progress'::text)))
  CTE updated
    ->  Update on job_queue job_queue_1  (cost=0.15..8.22 rows=1 width=110) (actual time=0.227..0.231 rows=1 loops=1)
          ->  Nested Loop  (cost=0.15..8.22 rows=1 width=110) (actual time=0.156..0.159 rows=1 loops=1)
                ->  CTE Scan on picked  (cost=0.00..0.02 rows=1 width=32) (actual time=0.115..0.117 rows=1 loops=1)
                ->  Index Scan using job_queue_pkey on job_queue job_queue_1  (cost=0.15..8.17 rows=1 width=14) (actual time=0.034..0.034 rows=1 loops=1)
                      Index Cond: (id = picked.id)
  ->  Seq Scan on benchmark_request br  (cost=0.00..7.14 rows=214 width=53) (actual time=0.016..0.059 rows=198 loops=1)
  ->  Hash  (cost=0.02..0.02 rows=1 width=120) (actual time=0.259..0.263 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  CTE Scan on updated  (cost=0.00..0.02 rows=1 width=120) (actual time=0.235..0.239 rows=1 loops=1)
Planning Time: 0.665 ms
Execution Time: 0.655 ms
The full query I was running
EXPLAIN ANALYSE WITH picked AS ( -- add MATERIALIZED after AS for the second variant
    SELECT
        id
    FROM
        job_queue
    WHERE
        -- Take queued or in-progress jobs
        (status = 'queued' OR status = 'in_progress')
        AND target = 'x86_64-unknown-linux-gnu'
        AND benchmark_set = 0
    ORDER BY
        -- Prefer in-progress jobs that have not been finished previously, so that
        -- we can finish them.
        CASE
            WHEN status = 'in_progress' THEN 0
            WHEN status = 'queued' THEN 1
            ELSE 2
        END,
        request_tag,
        created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
), updated AS (
    UPDATE
        job_queue
    SET
        collector_name = 'x64_set_0',
        started_at = NOW(),
        status = 'in_progress',
        retry = retry + 1
    FROM
        picked
    WHERE
        job_queue.id = picked.id
    RETURNING job_queue.*
)
SELECT
    updated.id,
    updated.backend,
    updated.profile,
    updated.request_tag,
    updated.created_at,
    updated.started_at,
    updated.retry,
    br.commit_type,
    br.commit_date
FROM updated
JOIN benchmark_request as br ON br.tag = updated.request_tag;

@Kobzol (Member, Author) commented Dec 8, 2025

Yeah, to be clear, I don't think that we suffer from this issue right now. And I know that it produces the same query plan. I still want to have MATERIALIZED there to ensure that this will also be the case in the future (e.g. with new Postgres versions) :)

@Jamesbarford (Contributor)

> Yeah, to be clear, I don't think that we suffer from this issue right now. And I know that it produces the same query plan. I still want to have MATERIALIZED there to ensure that this will also be the case in the future (e.g. with new Postgres versions) :)

I think we should remove it, because it's more confusing than helpful. It looks like it's solving a problem, but it actually isn't, and the blog post it references addresses a completely different issue that we don't face. If we try to pre-empt every hypothetical future problem, we'll end up adding unnecessary complexity.

@Kobzol (Member, Author) commented Dec 8, 2025

The problem it is solving is that it makes it clear to the reader (e.g. me) that the subquery will only run once; I think that is actually useful.

@Jamesbarford (Contributor)

I've taken a closer look at both the blog post above and the Stack Overflow reference in the code comment to see how they relate to our situation.

The blog post (https://www.shayon.dev/post/2025/119/a-postgresql-planner-gotcha-with-ctes-delete-and-limit/) is interesting, but its example focuses on nested queries with a DELETE ... LIMIT pattern, which is quite different from what our code is doing. Because of that, it's a bit hard to map those behaviours directly onto our use case.
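(For readers comparing the two shapes: the blog post's example is, roughly, a DELETE fed by a LIMITed CTE. This is a sketch from memory of the post, with illustrative table/column names, not our query:)

```sql
-- Roughly the pattern the blog post describes: when the planner inlines
-- the un-materialized CTE, the LIMITed subquery can be re-evaluated in
-- the outer plan (loops > 1), deleting more rows than intended.
WITH doomed AS (
    SELECT id
    FROM events
    WHERE processed
    LIMIT 1000
)
DELETE FROM events
WHERE id IN (SELECT id FROM doomed);
```

Our query, by contrast, updates a single row picked under FOR UPDATE SKIP LOCKED, and its plan shows loops=1 on the CTE scan.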

The Stack Overflow link is also somewhat hard to apply to our scenario. In that example, the query plan shows loops=10 under load, and adding MATERIALIZED appears to stabilise the behaviour. The accepted answer mentions that the underlying mechanism isn't fully verified and may even be version-specific:

"I have not fully verified the exact mechanism to this day. Also, this may have been fixed in recent versions. Didn't rerun tests, yet."

We also don't have the table schema from that example, so it's difficult to tell whether MATERIALIZED was truly required or just a workaround for that specific setup.

Given the differences, anyone trying to understand why MATERIALIZED appears in our CTE might reasonably conclude it's redundant or tied to edge cases we don't have. The examples don't really map to our query’s structure, so the intent behind using it here remains unclear.

I'm absolutely on board with strengthening the robustness of the code; I just want to make sure we're not adding complexity without a clear, demonstrated need. Once we clarify the reasoning around this, the rest of the changes look great. 👍

@Kobzol force-pushed the benchmark-set-robust branch from ee12af2 to c62b408 on December 8, 2025 09:43
@Kobzol (Member, Author) commented Dec 8, 2025

Ok, fair enough, I trust your judgement on the matter :) I removed the commit.

@Jamesbarford (Contributor) left a comment


LGTM 👍

@Kobzol enabled auto-merge December 8, 2025 09:51
@Kobzol added this pull request to the merge queue Dec 8, 2025
Merged via the queue into rust-lang:master with commit 0f81ee0 Dec 8, 2025
14 checks passed
@Kobzol deleted the benchmark-set-robust branch December 8, 2025 10:09