Commit deb4b61

Reducing LEFT JOIN to ANTI JOIN: A Planner Optimization for 'WHERE col IS NULL'
1 parent 9078b8e commit deb4b61

7 files changed

+269
-3
lines changed


src/SUMMARY.md

Lines changed: 2 additions & 2 deletions
@@ -27,9 +27,9 @@
2727
- [GOO:面向大规模连接问题的贪心连接顺序搜索算法](./cn/2026/05/goo-greedy-join-search.md)
2828
- [执行器批处理:面向批量的元组处理](./cn/2026/05/executor-batching.md)
2929
- [第 04 周](./cn/2026/04/README.md)
30-
- [PostgreSQL 查询规划器优化:自动 COUNT(*) 转换](./cn/2026/04/planner-count-optimization.md)
30+
- [PostgreSQL 查询优化器优化:自动 COUNT(*) 转换](./cn/2026/04/planner-count-optimization.md)
3131
- [第 03 周](./cn/2026/03/README.md)
3232
- [扩展统计信息导入/导出功能](./cn/2026/03/extended-statistics-import-functions.md)
3333
- [pg_plan_advice:查询计划控制](./cn/2026/03/pg-plan-advice.md)
3434
- [第 07 周](./cn/2026/07/README.md)
35-
- [将 LEFT JOIN 归约为 ANTI JOIN:针对 "WHERE col IS NULL" 的规划器优化](./cn/2026/07/anti-join-left-join-optimization.md)
35+
- [将 LEFT JOIN 归约为 ANTI JOIN:针对 "WHERE col IS NULL" 的优化器优化](./cn/2026/07/anti-join-left-join-optimization.md)

src/cn/2026/07/README.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
# 第 07 周(2026)
2+
3+
2026 年第 07 周的 PostgreSQL 邮件列表讨论。
4+
5+
🇬🇧 [English Version](../../../en/2026/07/README.md)
6+
7+
## 文章
8+
9+
- [将 LEFT JOIN 归约为 ANTI JOIN:针对 "WHERE col IS NULL" 的优化器优化](./anti-join-left-join-optimization.md)
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
1+
# 将 LEFT JOIN 归约为 ANTI JOIN:针对 "WHERE col IS NULL" 的优化器优化
2+
3+
## 引言
4+
5+
2025 年 12 月底,Nicolas Adenis-Lamarre 在 pgsql-hackers 邮件列表中提出了一项优化器优化:当查询使用 `LEFT JOIN ... WHERE right_table.column IS NULL`,且该列在语义上非空(例如 NOT NULL 或主键)时,自动识别为**反连接(anti-join)**。这类查询的语义是“左侧有、右侧无匹配的行”,正是反连接所表达的含义。识别出该模式后,优化器可以选择显式的反连接计划(如 Hash Anti Join),而不是普通的左连接加过滤,往往能获得更好的执行效率。
6+
7+
讨论吸引了 Tom Lane、David Rowley、Tender Wang、Richard Guo 等人参与。补丁历经多版迭代,被提交到 CommitFest,并在代码审查中暴露出与嵌套外连接和继承相关的正确性问题。本文概述该优化的思路、实现方式以及当前状态。
8+
9+
## 为什么重要
10+
11+
很多开发者会这样写“在 A 中找在 B 中没有匹配的行”:
12+
13+
```sql
14+
SELECT a.*
15+
FROM a
16+
LEFT JOIN b ON a.id = b.a_id
17+
WHERE b.some_not_null_col IS NULL;
18+
```
19+
20+
LEFT JOIN 会使来自 `a` 的无匹配行在 `b` 的所有列上为 NULL。当 `b.some_not_null_col` 在表 `b` 上为 NOT NULL 时,用 `WHERE b.some_not_null_col IS NULL` 过滤,留下的就是这些无匹配行。语义上这就是**反连接**:“在 A 中且不存在在 B 中匹配的行”。
21+
22+
若优化器不识别该模式,可能按普通左连接加过滤实现;若识别,则可以使用显式的 **Hash Anti Join** 等,执行更高效,也有利于选择更好的连接顺序。这类优化是“非强制”的:熟练用户可以把查询改写成 `NOT EXISTS` 或 `NOT IN`(并注意 NULL 语义),但自动识别能惠及所有用户,同时保留原有 SQL 的可读性。
23+
24+
## 技术背景
25+
26+
PostgreSQL 已有在部分场景下归约外连接的逻辑,例如:
27+
28+
- **提交 904f6a593 与 e2debb643** 引入了优化器可用于此类归约的基础设施。
29+
- 在 `reduce_outer_joins_pass2` 中,优化器已经会在“连接自身的条件对某些被更高层条件**强制为 NULL** 的变量是严格的”时,尝试将 `JOIN_LEFT` 归约为 `JOIN_ANTI`。
30+
31+
该处原有注释提到,还存在其他识别反连接的方式——例如检查来自右侧的变量是否因**表约束**(NOT NULL 等)而必然非空。Nicolas 的提议与 Tender 的补丁实现的正是这一点:利用 NOT NULL 等信息,在 `WHERE rhs_col IS NULL` 能推出“无匹配”时,将 LEFT JOIN 归约为 ANTI JOIN。
32+
33+
## 补丁演进
34+
35+
### Nicolas 的初版补丁
36+
37+
Nicolas 提交的草稿补丁实现了:
38+
39+
- 在“left join b where x is null”且 `x` 为来自右侧(RTE)的非空变量时,识别该模式。
40+
- 有意采用“快速实现”以验证可行性。
41+
42+
他还列举了其他想法(去掉冗余 DISTINCT/GROUP BY、合并双重 ORDER BY、对 NOT IN 做反连接、以及“查看改写后查询”的方式等),邮件中略有讨论,但非本文重点。
43+
44+
### Tom Lane 与 David Rowley
45+
46+
**Tom Lane** 指出:
47+
48+
- 该优化合理,且应使用新基础设施(904f6a593、e2debb643)。
49+
- 草稿不应让周边注释过时;保持注释准确是必须的。
50+
51+
**David Rowley** 建议:
52+
53+
- 使用 `find_relation_notnullatts()`,并与 `forced_null_vars` 比较,注意 `FirstLowInvalidHeapAttributeNumber`
54+
- 在邮件列表中检索 **UniqueKeys** 相关历史(用于冗余 DISTINCT 消除)。
55+
- 对“消除双重 ORDER”和“NOT IN 反连接”持谨慎态度,二者此前都有讨论且边界情况复杂。
56+
- “查看改写后查询”含义不清,很多优化无法再表达成单一 SQL。
57+
58+
### Tender Wang 的实现(v2–v4)
59+
60+
Tender Wang 提供的补丁:
61+
62+
- 基于 904f6a593 和 e2debb643 的基础设施实现。
63+
- 更新了 `reduce_outer_joins_pass2` 中的注释,说明通过右侧 NOT NULL 约束识别反连接的新情况。
64+
- 增加了回归测试。
65+
66+
随后 Nicolas:
67+
68+
- 确认 Tender 的补丁是正确的(经重新测试)。
69+
- 建议增加提前退出:仅当 `forced_null_vars != NIL` 时才执行新逻辑,避免在大多数没有“强制为 NULL”变量的左连接上调用 `find_nonnullable_vars` 和 `have_var_is_notnull`。
70+
- 贡献了额外回归测试,使用**新表**(带 NOT NULL 约束),而不是修改 tenk1 等现有测试表。
71+
72+
**Tom Lane** 明确:不应修改通用测试对象(如 `test_setup.sql` 中的表),否则可能改变规划行为并影响其他测试。新测试应使用新表或已有且属性合适的表。
73+
74+
Tender 将 Nicolas 的提前退出与回归测试合并为 **v4** 单补丁,并提交到 [CommitFest](https://commitfest.postgresql.org/patch/6375/)
75+
76+
## Richard Guo 的审查:正确性问题
77+
78+
Richard Guo 对 v4 的审查发现了两个正确性问题。
79+
80+
### 1. 嵌套外连接
81+
82+
当左连接的右侧本身又包含外连接时,即使某列在其基表上为 NOT NULL,在连接结果中仍可能为 NULL。此时将该外连接归约为反连接会出错。
83+
84+
例如(表 `t1`、`t2`、`t3`,列如 `(a NOT NULL, b, c)`):
85+
86+
```sql
87+
EXPLAIN (COSTS OFF)
88+
SELECT * FROM t1
89+
LEFT JOIN (t2 LEFT JOIN t3 ON t2.c = t3.c) ON t1.b = t2.b
90+
WHERE t3.a IS NULL;
91+
```
92+
93+
这里 `t3.a` 在 `t3` 上为 NOT NULL,但由于内层 `t2 LEFT JOIN t3`,`t1` 的某一行即使在 `t2` 中找到了匹配,只要该 `t2` 行在 `t3` 中无匹配,`t3.a` 仍为 NULL。因此上层连接必须保持为左连接;若错误地转为反连接,会错误地丢弃行。
94+
95+
补丁在判断“非空列”时没有考虑该变量是否会被下层外连接变为 NULL。Richard 指出当前 `forced_null_vars` 中并未记录 `varnullingrels`,一种简单修复是仅当右侧无外连接(`right_state->contains_outer` 为 false)时做此优化,但这会过于保守。
96+
97+
他提出的方向是:在 `reduce_outer_joins_pass1_state` 中记录每个子树下**可为空的基表 relid**;在检查 NOT NULL 约束时,跳过来自这些 rel 的变量。他附上了 **v5** 补丁以说明该思路。
98+
99+
### 2. 继承
100+
101+
对继承父表而言,某些子表可能在某列上有 NOT NULL,而其他子表没有。补丁未考虑这种情况;相比嵌套外连接,这一点相对容易修复。
102+
103+
## 其他讨论
104+
105+
- **Pavel Stehule** 提醒避免在列表中 top-posting;PostgreSQL 维基有[邮件列表风格](https://wiki.postgresql.org/wiki/Mailing_Lists)说明。
106+
- **子查询中的常量**:Nicolas 提到像 `SELECT * FROM a LEFT JOIN (SELECT 1 AS const1 FROM b) x WHERE x.const1 IS NULL` 这类情况未被处理,他认为不值得专门处理。
107+
108+
## 当前状态
109+
110+
- **v4** 补丁(含提前退出与回归测试)已提交至 CommitFest(patch 6375)。
111+
- **Richard Guo 的 v5** 通过记录可为空的基表 rel 并收紧 NOT NULL 的使用条件,针对嵌套外连接与继承问题做了修正。
112+
- 截至该讨论,工作仍在进行;最终是否合入以及以何种形式合入,以邮件列表与 CommitFest 为准。
113+
114+
## 小结
115+
116+
在能够证明右侧某列为非空的前提下,将 `LEFT JOIN ... WHERE rhs_not_null_col IS NULL` 自动归约为反连接,是一项有用的优化器优化,可在不要求用户改写 SQL 的情况下提升性能。补丁从草稿发展到基于现有基础设施的实现,并加入了回归测试与提前退出。审查反馈指出了重要的正确性约束:右侧可能存在嵌套外连接或继承,因此只有在变量不会被下层连接或继承变为 NULL 时,才能基于 NOT NULL 做归约。后续工作集中在 Richard 的方案(记录可为空的基表 rel、限制 NOT NULL 检查)以及对继承的安全处理上。
117+
118+
## 参考
119+
120+
- [讨论串:Planner : anti-join on left joins](https://www.postgresql.org/message-id/flat/CACPGbctKMDP50PpRH09in%2BoWbHtZdahWSroRstLPOoSDKwoFsw%40mail.gmail.com)
121+
- [CommitFest patch 6375](https://commitfest.postgresql.org/patch/6375/)
122+
- David Rowley 提到的:[UniqueKeys](https://www.postgresql.org/search/?m=1&q=UniqueKeys&l=1&d=-1&s=d)、[NOT IN / 反连接](https://www.postgresql.org/message-id/flat/3793.1565689764%40linux-edt6#bf4b983d5744bca153c288904c038020)

src/cn/2026/README.md

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,9 @@
1010
- [GOO:面向大规模连接问题的贪心连接顺序搜索算法](./05/goo-greedy-join-search.md)
1111
- [执行器批处理:面向批量的元组处理](./05/executor-batching.md)
1212
- [第 04 周](./04/index.html)
13-
- [PostgreSQL 查询规划器优化:自动 COUNT(*) 转换](./04/planner-count-optimization.md)
13+
- [PostgreSQL 查询优化器优化:自动 COUNT(*) 转换](./04/planner-count-optimization.md)
1414
- [第 03 周](./03/index.html)
1515
- [扩展统计信息导入/导出功能](./03/extended-statistics-import-functions.md)
1616
- [pg_plan_advice:查询计划控制](./03/pg-plan-advice.md)
17+
- [第 07 周](./07/README.md)
18+
- [将 LEFT JOIN 归约为 ANTI JOIN:针对 "WHERE col IS NULL" 的优化器优化](./07/anti-join-left-join-optimization.md)

src/en/2026/07/README.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
# Week 07 (2026)
2+
3+
PostgreSQL mailing list discussions for Week 07, 2026.
4+
5+
🇨🇳 [中文版本](../../../cn/2026/07/README.md)
6+
7+
## Articles
8+
9+
- [Reducing LEFT JOIN to ANTI JOIN: A Planner Optimization for "WHERE col IS NULL"](./anti-join-left-join-optimization.md)
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
1+
# Reducing LEFT JOIN to ANTI JOIN: A Planner Optimization for "WHERE col IS NULL"
2+
3+
## Introduction
4+
5+
In late December 2025, Nicolas Adenis-Lamarre proposed a planner optimization on the pgsql-hackers list: automatically detect **anti-join** patterns in queries of the form `LEFT JOIN ... WHERE right_table.column IS NULL` when that column is known to be non-nullable (e.g. declared NOT NULL or part of a primary key). Such queries mean "rows from the left side with no matching row on the right," which is exactly what an anti-join expresses. Recognizing the pattern lets the planner choose a dedicated anti-join plan instead of a generic left join plus filter, often with better performance.
6+
7+
The discussion drew in Tom Lane, David Rowley, Tender Wang, Richard Guo, and others. A patch evolved through several versions, was submitted to CommitFest, and received detailed review that uncovered correctness issues with nested outer joins and inheritance. This post summarizes the idea, the implementation approach, and the current status.
8+
9+
## Why This Matters
10+
11+
Many developers write "find rows in A with no match in B" as:
12+
13+
```sql
14+
SELECT a.*
15+
FROM a
16+
LEFT JOIN b ON a.id = b.a_id
17+
WHERE b.some_not_null_col IS NULL;
18+
```
19+
20+
Because the join is a LEFT JOIN, unmatched rows from `a` have NULL in all columns from `b`. Filtering on `b.some_not_null_col IS NULL` (when that column is NOT NULL in `b`) therefore keeps only those unmatched rows. Semantically this is an **anti-join**: "rows in A that do not have a matching row in B."
21+
22+
If the planner does not recognize this pattern, it may implement it as a normal left join plus a filter. If it *does* recognize it, it can use an explicit **Hash Anti Join** (or similar), which can be more efficient and can unlock better join order choices. The optimization is "non-mandatory" in the sense that a skilled user could rewrite the query as `NOT EXISTS` or `NOT IN` (with due care for NULLs), but automatic detection helps all users and keeps the original SQL readable.
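The equivalence described above can be checked on a toy dataset. The following is a minimal sketch using Python's built-in `sqlite3` module rather than PostgreSQL, so it illustrates only the query semantics, not the planner change. The sample rows and the extra `nullable_col` column are assumptions made for illustration.

```python
import sqlite3

# In-memory database; SQLite is used only to check query semantics,
# it says nothing about how PostgreSQL plans these queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (
        a_id INTEGER NOT NULL,
        some_not_null_col INTEGER NOT NULL,
        nullable_col INTEGER          -- hypothetical column that may hold NULL
    );
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 10, NULL), (2, 20, 5);
""")


def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# The pattern from the article: filter on a NOT NULL column of the right side.
left_join_form = ids("""
    SELECT a.id FROM a LEFT JOIN b ON a.id = b.a_id
    WHERE b.some_not_null_col IS NULL ORDER BY a.id
""")

# The explicit anti-join formulation.
not_exists_form = ids("""
    SELECT a.id FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id) ORDER BY a.id
""")

# Same pattern on a nullable column: matched rows can also pass the filter.
nullable_form = ids("""
    SELECT a.id FROM a LEFT JOIN b ON a.id = b.a_id
    WHERE b.nullable_col IS NULL ORDER BY a.id
""")

print(left_join_form, not_exists_form, nullable_form)  # [3] [3] [1, 3]
```

The first two queries agree because a NULL in `some_not_null_col` can only come from the null-extended row of an unmatched `a`. The third query shows why the NOT NULL guarantee is essential: with a nullable column, row 1 passes the filter even though it has a match in `b`.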
23+
24+
## Technical Background
25+
26+
PostgreSQL already has logic to reduce outer joins in certain cases. In particular:
27+
28+
- **Commits 904f6a593 and e2debb643** added infrastructure that the planner can use for this kind of reduction.
29+
- In `reduce_outer_joins_pass2`, the planner already tries to reduce `JOIN_LEFT` to `JOIN_ANTI` when the join's own quals are strict for some var that was **forced null** by higher qual levels (e.g. by an upper `WHERE`).
30+
31+
The existing comment in that area noted that there are other ways to detect an anti-join—for example, checking whether vars from the right-hand side are non-null because of **table constraints** (NOT NULL, etc.). That was left for later; Nicolas's proposal and Tender's patch implement exactly that: use NOT NULL (and related) information to detect when `WHERE rhs_col IS NULL` implies "no match," and thus when a LEFT JOIN can be reduced to an ANTI JOIN.
32+
33+
## Patch Evolution
34+
35+
### Nicolas's initial patch
36+
37+
Nicolas sent a draft patch that:
38+
39+
- Detected the pattern "left join b where x is null" when `x` is a non-null var from the right-hand side (RTE).
40+
- Was intentionally quick-and-dirty to see if the change was feasible.
41+
42+
He also listed other ideas (removing redundant DISTINCT/GROUP BY, folding double ORDER BY, anti-join on `NOT IN`, and a way to "view the rewritten query"). Those were discussed briefly but are not the focus of this post.
43+
44+
### Tom Lane and David Rowley
45+
46+
**Tom Lane** replied that:
47+
48+
- The optimization is reasonable and the new infrastructure (904f6a593, e2debb643) should be used.
49+
- The draft should not leave nearby comments outdated; keeping comments accurate is mandatory.
50+
51+
**David Rowley** suggested:
52+
53+
- Using `find_relation_notnullatts()` and comparing with `forced_null_vars`, with care for `FirstLowInvalidHeapAttributeNumber`.
54+
- Searching the archives for prior work on **UniqueKeys** (for redundant DISTINCT removal).
55+
- Being cautious about "remove double order" and "NOT IN" anti-join; both have been discussed before and have subtle edge cases.
56+
- Noting that "view the rewritten query" is ambiguous—many optimizations cannot be expressed back as a single SQL statement.
57+
58+
### Tender Wang's implementation (v2–v4)
59+
60+
Tender Wang provided a patch that:
61+
62+
- Used the infrastructure from 904f6a593 and e2debb643.
63+
- Updated the comments in `reduce_outer_joins_pass2` to describe the new case (detecting anti-join via NOT NULL constraints on the RHS).
64+
- Added regression tests.
65+
66+
Nicolas then:
67+
68+
- Confirmed that Tender's patch was correct (after re-testing).
69+
- Suggested an early exit: only run the new logic when `forced_null_vars != NIL`, to avoid calling `find_nonnullable_vars` and `have_var_is_notnull` on every left join when most have no forced-null vars.
70+
- Contributed extra regression tests using **new tables** (with NOT NULL constraints) instead of modifying existing test tables like `tenk1`.
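The early-exit idea can be illustrated with a toy model. This is a sketch in Python, not PostgreSQL code: `Var`, `can_reduce_to_anti`, and the `counters` dictionary are hypothetical names, and the set lookup merely stands in for the more expensive constraint checks mentioned above.

```python
from typing import NamedTuple


class Var(NamedTuple):
    rel: str  # relation the column belongs to
    col: str  # column name


def can_reduce_to_anti(forced_null_vars, rhs_rels, notnull_cols, counters):
    """Toy model: decide whether a left join may become an anti-join."""
    # Early exit, modelling the `forced_null_vars != NIL` test: most left
    # joins have no forced-null vars, so skip the constraint lookup entirely.
    if not forced_null_vars:
        return False
    counters["constraint_checks"] += 1  # stands in for the costly part
    # A forced-null var from the right-hand side that is declared NOT NULL
    # can only be NULL when the join found no match.
    return any(v.rel in rhs_rels and v in notnull_cols for v in forced_null_vars)


counters = {"constraint_checks": 0}
notnull_cols = {Var("b", "some_not_null_col")}

plain_left_join = can_reduce_to_anti([], {"b"}, notnull_cols, counters)
is_null_filter = can_reduce_to_anti(
    [Var("b", "some_not_null_col")], {"b"}, notnull_cols, counters
)

print(plain_left_join, is_null_filter, counters)
# False True {'constraint_checks': 1}
```

Only the second call reaches the constraint check, which is the saving the early exit aims for.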
71+
72+
**Tom Lane** clarified that modifying common test objects (e.g. in `test_setup.sql`) is a bad idea, as it can change planner behavior and break or alter other tests. New tests should use new tables or existing ones that already match the needed properties.
73+
74+
Tender incorporated Nicolas's early-exit and regression tests into a single **v4** patch and submitted it to [CommitFest](https://commitfest.postgresql.org/patch/6375/).
75+
76+
## Richard Guo's review: correctness issues
77+
78+
Richard Guo reviewed the v4 patch and found two correctness problems.
79+
80+
### 1. Nested outer joins
81+
82+
When the right-hand side of the left join itself contains an outer join, a column that is NOT NULL in its base table can still become NULL in the join result. Reducing the outer join to an anti-join in that case is wrong.
83+
84+
Example (tables `t1`, `t2`, `t3` with columns e.g. `(a NOT NULL, b, c)`):
85+
86+
```sql
87+
EXPLAIN (COSTS OFF)
88+
SELECT * FROM t1
89+
LEFT JOIN (t2 LEFT JOIN t3 ON t2.c = t3.c) ON t1.b = t2.b
90+
WHERE t3.a IS NULL;
91+
```
92+
93+
Here `t3.a` is NOT NULL in `t3`, but because of the inner `t2 LEFT JOIN t3`, a row from `t1` can match a `t2` row that has no matching row in `t3`, which leaves `t3.a` NULL even though the upper join found a match. So the upper join must remain a left join; converting it to an anti-join would drop rows incorrectly.
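The difference can be reproduced on sample data. The sketch below again uses Python's `sqlite3` purely to compare result sets. The rows are invented for illustration, and the nested join is written as an aliased subquery `s`, a restatement of the example query rather than its exact text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER NOT NULL, b INTEGER, c INTEGER);
    CREATE TABLE t2 (a INTEGER NOT NULL, b INTEGER, c INTEGER);
    CREATE TABLE t3 (a INTEGER NOT NULL, b INTEGER, c INTEGER);
    INSERT INTO t1 VALUES (1, 10, 0), (2, 20, 0), (3, 30, 0);
    INSERT INTO t2 VALUES (1, 10, 100), (2, 20, 200);
    INSERT INTO t3 VALUES (1, 0, 100);
""")

# The right-hand side of the upper join: t2 LEFT JOIN t3.
nested = """
    (SELECT t2.b AS t2_b, t3.a AS t3_a
     FROM t2 LEFT JOIN t3 ON t2.c = t3.c) AS s
"""

# The query as written: left join, then keep rows where t3.a is NULL.
left_join_rows = [r[0] for r in conn.execute(f"""
    SELECT t1.a FROM t1 LEFT JOIN {nested} ON t1.b = s.t2_b
    WHERE s.t3_a IS NULL ORDER BY t1.a
""")]

# What an (incorrect) anti-join reduction would compute instead:
# t1 rows with no match at all in the right-hand side.
anti_join_rows = [r[0] for r in conn.execute(f"""
    SELECT t1.a FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM {nested} WHERE s.t2_b = t1.b)
    ORDER BY t1.a
""")]

print(left_join_rows, anti_join_rows)  # [2, 3] [3]
```

The `t1` row with `a = 2` matches a `t2` row that has no partner in `t3`, so the original query returns it while the anti-join form loses it.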
94+
95+
The patch treated any var from a NOT NULL column as "safe" for the anti-join reduction without considering whether that var could be nulled by a lower-level outer join. Richard noted that the planner does not currently record `varnullingrels` in `forced_null_vars`. A simple fix would be to apply the optimization only when the RHS contains no outer joins (`right_state->contains_outer` is false), but that would be too restrictive.
96+
97+
His proposed direction: in `reduce_outer_joins_pass1_state`, record the relids of base rels that are nullable within each subtree. Then, when checking NOT NULL constraints, skip vars that come from those rels. He attached a **v5** patch illustrating this idea.
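A toy model of this bookkeeping might look as follows. It is a sketch only: the `Rel` and `Join` classes and the helper functions are hypothetical Python stand-ins, not the structures used in the v5 patch, and only inner and left joins are modelled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rel:
    name: str


@dataclass(frozen=True)
class Join:
    kind: str  # "inner" or "left" (other join types are not modelled)
    left: "Node"
    right: "Node"


Node = Union[Rel, Join]


def base_rels(node):
    """All base relations under a subtree."""
    if isinstance(node, Rel):
        return {node.name}
    return base_rels(node.left) | base_rels(node.right)


def nullable_rels(node):
    """Base rels that an outer join inside this subtree can null-extend."""
    if isinstance(node, Rel):
        return set()
    result = nullable_rels(node.left) | nullable_rels(node.right)
    if node.kind == "left":
        # Everything on the nullable side of a left join may come back NULL.
        result |= base_rels(node.right)
    return result


def safe_for_anti_join(rhs, forced_null_var, notnull_cols):
    """A NOT NULL column counts only if no lower outer join can null it."""
    rel, _col = forced_null_var
    return (
        rel in base_rels(rhs)
        and forced_null_var in notnull_cols
        and rel not in nullable_rels(rhs)
    )


# Right-hand side of the example: (t2 LEFT JOIN t3).
rhs = Join("left", Rel("t2"), Rel("t3"))
notnull_cols = {("t2", "a"), ("t3", "a")}

t3_ok = safe_for_anti_join(rhs, ("t3", "a"), notnull_cols)
t2_ok = safe_for_anti_join(rhs, ("t2", "a"), notnull_cols)
print(t3_ok, t2_ok)  # False True
```

In this model `WHERE t3.a IS NULL` does not justify the reduction because `t3` sits on the nullable side of the inner left join, while `WHERE t2.a IS NULL` still does. That is less restrictive than refusing the optimization whenever the RHS contains any outer join.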
98+
99+
### 2. Inheritance
100+
101+
For inheritance parent tables, some child tables might have a NOT NULL constraint on a column while others do not. The v4 patch did not account for that. This issue is more straightforward to fix than the nested-outer-join case.
102+
103+
## Other discussion points
104+
105+
- **Pavel Stehule** asked participants to avoid top-posting on the list; the PostgreSQL wiki has guidelines on [mailing list style](https://wiki.postgresql.org/wiki/Mailing_Lists).
106+
- **Constants from subqueries**: Nicolas noted that a case like `SELECT * FROM a LEFT JOIN (SELECT 1 AS const1 FROM b) x WHERE x.const1 IS NULL` is not handled; he considered it not worth handling.
107+
108+
## Current Status
109+
110+
- The **v4** patch (with early exit and regression tests) was submitted to CommitFest (patch 6375).
111+
- **Richard Guo's v5** patch addresses the nested-outer-join and inheritance issues by tracking nullable base rels and tightening when NOT NULL can be used for the reduction.
112+
- As of the thread, the discussion was ongoing; the final resolution (e.g. commit of a revised patch) would be tracked on the list and in CommitFest.
113+
114+
## Conclusion
115+
116+
Automatically reducing `LEFT JOIN ... WHERE rhs_not_null_col IS NULL` to an anti-join when the column is provably non-nullable is a useful planner optimization that can improve performance without requiring users to rewrite queries. The patch has evolved from a draft to an implementation using the existing planner infrastructure, with regression tests and an early-exit optimization. Reviewer feedback has identified important correctness constraints: the RHS may contain nested outer joins or inheritance, so NOT NULL must be applied only when the var cannot be nulled by lower joins or by inheritance. Follow-up work centers on Richard's approach (recording nullable base rels and restricting the NOT NULL check accordingly) and on handling inheritance safely.
117+
118+
## References
119+
120+
- [Discussion thread: Planner : anti-join on left joins](https://www.postgresql.org/message-id/flat/CACPGbctKMDP50PpRH09in%2BoWbHtZdahWSroRstLPOoSDKwoFsw%40mail.gmail.com)
121+
- [CommitFest patch 6375](https://commitfest.postgresql.org/patch/6375/)
122+
- David Rowley's references: [UniqueKeys](https://www.postgresql.org/search/?m=1&q=UniqueKeys&l=1&d=-1&s=d), [NOT IN / anti-join](https://www.postgresql.org/message-id/flat/3793.1565689764%40linux-edt6#bf4b983d5744bca153c288904c038020)
