fix(optimizer)!: simplify AND/OR with static BOOLEAN type #6152

geooo109 · 2025-10-20T18:55:37Z

We should avoid simplifying cases like SELECT 1 AND x or SELECT 0 or x.
So, this PR makes this check tighter by checking if the LHS or RHS is Boolean type.

georgesittas · 2025-10-21T13:18:41Z

Discussed offline– we should check whether all engines evaluate connectors into boolean values, instead of treating them as control flow. If SELECT <always true value> AND other is always boolean, we can't just simplify it to SELECT other, because the semantics change. Unfortunately, this is what we do today, which is incorrect.

georgesittas · 2025-10-23T09:50:33Z

We discovered that post-simplifying we lose some types. That is a bug and we decided to address it in this PR instead of postponing, because that will reduce the PR's complexity as a side-effect.

sqlglot/optimizer/simplify.py

georgesittas · 2025-10-23T12:23:25Z

@barakalon @tobymao curious what your thoughts are on this.

The problem: today we infer types & then run other rules that transform the AST, resulting in building new, untyped, expressions. This is a bug on its own, and it also affects type-sensitive rules. E.g., simplify now checks the boolean type to apply certain transformations but inbetween types are lost and so it cannot always work.

My worry is that fixing each call site introduces a new pattern where we have to be alert for rules after annotate_types and ensure new AST nodes are always typed. OTOH, not sure how else we can unblock these rules in simplify that run up to a fix point...

tests/fixtures/optimizer/simplify.sql

geooo109 · 2025-10-30T11:41:28Z

/benchmark

github-actions · 2025-10-30T11:53:29Z

Benchmark Results

Legend:

🟢🟢 = 2x+ faster
🟢 = 1.1x - 2x faster
⚪ = No significant change (< 1.1x)
🔴 = 1.1x - 2x slower
🔴🔴 = 2x+ slower

Parsing Benchmark

Benchmark	bench_parse_main	bench_parse_pr
parse_sqlglot_tpch	5.51 ms	5.61 ms: ⚪ 1.02x slower
parse_sqlglot_short	346 us	348 us: ⚪ 1.01x slower
parse_sqlglot_long	5.28 ms	5.31 ms: ⚪ 1.01x slower
parse_sqlglotrs_tpch	4.03 ms	4.08 ms: ⚪ 1.01x slower
parse_sqlglotrs_short	284 us	288 us: ⚪ 1.01x slower
parse_sqlglotrs_long	3.87 ms	3.99 ms: ⚪ 1.03x slower
parse_sqlglotrs_crazy	11.5 ms	11.9 ms: ⚪ 1.03x slower
Geometric mean	(ref)	⚪ 1.02x slower

Benchmark hidden because not significant (1): parse_sqlglot_crazy

Optimization Benchmark

Benchmark	bench_optimize_main	bench_optimize_pr
optimize_tpch	778 ms	788 ms: ⚪ 1.01x slower
optimize_condition_100	96.3 ms	97.0 ms: ⚪ 1.01x slower
Geometric mean	(ref)	⚪ 1.00x slower

Benchmark hidden because not significant (2): optimize_condition_10, optimize_condition_1000

geooo109 · 2025-10-30T12:25:18Z

/benchmark

github-actions · 2025-10-30T12:37:17Z

Benchmark Results

Legend:

🟢🟢 = 2x+ faster
🟢 = 1.1x - 2x faster
⚪ = No significant change (< 1.1x)
🔴 = 1.1x - 2x slower
🔴🔴 = 2x+ slower

Parsing Benchmark

Benchmark	bench_parse_main	bench_parse_pr
parse_sqlglot_long	5.32 ms	5.30 ms: ⚪ 1.00x faster
parse_sqlglotrs_long	3.88 ms	3.91 ms: ⚪ 1.01x slower
Geometric mean	(ref)	⚪ 1.00x slower

Benchmark hidden because not significant (6): parse_sqlglot_tpch, parse_sqlglot_short, parse_sqlglot_crazy, parse_sqlglotrs_tpch, parse_sqlglotrs_short, parse_sqlglotrs_crazy

Optimization Benchmark

Benchmark	bench_optimize_main	bench_optimize_pr
optimize_condition_100	96.7 ms	96.0 ms: ⚪ 1.01x faster
Geometric mean	(ref)	⚪ 1.00x faster

Benchmark hidden because not significant (3): optimize_tpch, optimize_condition_10, optimize_condition_1000

geooo109 · 2025-10-30T14:00:24Z

Current benchmarks don't show a significant performance slow down.

On the other hand, queries like the following (create a lot of AST transformation due to the NOT (x CONNECTOR y) => re-compute hash), create significant slow downs:

SELECT * FROM t WHERE
    NOT (
      NOT (
        NOT (
          NOT (
            NOT (
              (
                NOT ((a = 1 AND b = 2 AND TRUE) OR (c = 3 AND d = 4 OR FALSE)) AND
                NOT ((e = 5 OR f = 6 AND TRUE) AND (g = 7 OR h = 8 OR FALSE))
              ) OR (
                NOT (
                  NOT ((i = 9 AND j = 10) OR (k = 11 AND l = 12)) AND
                  NOT ((m = 13 OR n = 14) AND (o = 15 OR p = 16))
                )
              )
            ) AND (
              NOT (
                NOT (
                  NOT ((q = 17 AND r = 18) OR (s = 19 AND t = 20)) OR
                  NOT ((u = 21 OR v = 22) AND (w = 23 OR x = 24))
                ) AND (
                  (y = 25 OR NOT (z = 26 AND aa = 27))
                )
              )
            )
          ) OR (
            NOT (
              NOT (
                NOT (
                  (bb = 28 AND cc = 29) OR (dd = 30 AND ee = 31)
                ) AND (
                  (ff = 32 OR gg = 33) AND (hh = 34 OR ii = 35)
                )
              ) OR (
                NOT (
                  NOT ((jj = 36 AND kk = 37) OR (ll = 38 AND mm = 39))
                )
              )
            )
          )
        ) AND (
          NOT (
            NOT (
              NOT ((nn = 40 AND oo = 41) OR (pp = 42 AND qq = 43)) AND
              NOT ((rr = 44 OR ss = 45) AND (tt = 46 OR uu = 47))
            )
          ) OR (
            NOT (
              (vv = 48 AND NOT (ww = 49 OR xx = 50)) OR
              (NOT (yy = 51 AND zz = 52) AND aaa = 53)
            )
          )
        ) 
      )
    );

Old simplify (using the optimize function):
Total time for 50 runs: 8.7888s
Average time per run: 0.1758s

New simplify (using the optimize function):
Total time for 50 runs: 12.5861s
Average time per run: 0.2517s

georgesittas · 2025-10-30T14:05:57Z

I see, so stressing the "unhappy" code paths can result in a ~1.4x slowdown.. That's annoying, although somewhat expected. Let's pause for a bit and disuss to see how we can improve this- FYI @tobymao.

sqlglot/optimizer/annotate_types.py

georgesittas · 2025-11-03T11:30:31Z

sqlglot/optimizer/simplify.py

+            dialect = kwargs.get("dialect")
+            annotator = TypeAnnotator(
+                schema=ensure_schema(None, dialect=dialect), overwrite_types=False
+            )


This is not safe: having the right dialect is a must here, otherwise we may use incorrect inference rules.

Wondering whether we should just introduce a class that encapsulates the simplify rules so that we can share state like the type annotator and dialect among them, similar to what we do in the annotate_types module.

@georgesittas yep, I aggree, this isn't the final solution for this. We can do that, I think is compact.

'Aight, let's go with the class approach tomorrow then before benchmarking again.

georgesittas · 2025-11-03T11:35:07Z

sqlglot/optimizer/simplify.py

-                    )
+                    ),
+                    copy=False,


Seems like we're copying in main as well. Did you compare this branch to main w/ copy=False in both?

Yeap, I took this fix up into consideration for the benchmark, and this branch is 1.24x slower compared the fixed main.

@georgesittas I can push the change also in the main and run a new benchmark here.

Yes, let's make sure this fix is both in main and here (I'd double-check/prove that there are no mutation edge cases) and benchmark both branches w/out copying in this paren call.

sqlglot/optimizer/simplify.py

geooo109 changed the title ~~fix(optimizer)!: simplify AND with static BOOLEAN type~~ fix(optimizer)!: simplify AND/OR with static BOOLEAN type Oct 21, 2025

geooo109 force-pushed the geooo109/simplify_and branch 2 times, most recently from 306b884 to 66ffdd7 Compare October 22, 2025 12:55

georgesittas marked this pull request as draft October 23, 2025 09:49

georgesittas reviewed Oct 23, 2025

View reviewed changes

sqlglot/optimizer/simplify.py Outdated Show resolved Hide resolved

sqlglot/optimizer/simplify.py Outdated Show resolved Hide resolved

geooo109 force-pushed the geooo109/simplify_and branch from e5d9ed2 to 052bdcc Compare October 29, 2025 13:37

tobymao reviewed Oct 29, 2025

View reviewed changes

tests/fixtures/optimizer/simplify.sql Show resolved Hide resolved

geooo109 marked this pull request as ready for review October 29, 2025 15:06

geooo109 added 13 commits October 31, 2025 01:35

fix(optimizer)!: simplify AND with static BOOLEAN type

e7f432c

refactor impl

38614d0

fix tests

cc5db66

refactor impl

7b4a77a

refactor impl

77673f8

remove redundant annotations

d21b458

refactor impl 2

ff12913

refactor impl

027d1dc

remove comment from simplify

ddef182

revert line

04c8e7a

refactor impl

69b7353

add type checking for uknown

89c9a4b

remove extra annotation

5fb0d4b

geooo109 force-pushed the geooo109/simplify_and branch from bea403f to 5fb0d4b Compare October 31, 2025 11:45

avoid deepcopy for exp.paren

72d0d71

georgesittas reviewed Nov 3, 2025

View reviewed changes

PR feedback 1 (george partial)

c9d6ad9

fix(optimizer)!: simplify AND/OR with static BOOLEAN type #6152

Are you sure you want to change the base?

fix(optimizer)!: simplify AND/OR with static BOOLEAN type #6152

Conversation

geooo109 commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

georgesittas commented Oct 21, 2025

Uh oh!

georgesittas commented Oct 23, 2025

Uh oh!

Uh oh!

Uh oh!

georgesittas commented Oct 23, 2025

Uh oh!

Uh oh!

geooo109 commented Oct 30, 2025

Uh oh!

github-actions bot commented Oct 30, 2025

Benchmark Results

Parsing Benchmark

Optimization Benchmark

Uh oh!

geooo109 commented Oct 30, 2025

Uh oh!

github-actions bot commented Oct 30, 2025

Benchmark Results

Parsing Benchmark

Optimization Benchmark

Uh oh!

geooo109 commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

georgesittas commented Oct 30, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

geooo109 commented Oct 20, 2025 •

edited

Loading

geooo109 commented Oct 30, 2025 •

edited

Loading