Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The estimated rows of LEFT JOIN and INNER JOIN are not as expected #59669

Open
DBMSTesting opened this issue Feb 20, 2025 · 0 comments
Open

The estimated rows of LEFT JOIN and INNER JOIN are not as expected #59669

DBMSTesting opened this issue Feb 20, 2025 · 0 comments
Labels
type/bug The issue is confirmed as a bug.

Comments

@DBMSTesting
Copy link

Bug Report

The DBMS produces unexpectedly inefficient query plans, which may result in slow query execution. Similar to bug#59662, I believe that addressing this issue may result in significant performance gains if the estimated rows executed are abnormal even if the returned results are correct.

We found a problem when testing the join function of tidb. The only difference between the following two statements is that statement 1 uses LEFT JOIN, while statement 2 uses INNER JOIN. For the join function, the estimated rows of INNER JOIN should be less than that of LEFT JOIN, because according to the SQL standard, INNER JOIN will extract data columns with matching values ​​in the two data tables; LEFT JOIN/RIGHT JOIN will extract all data columns in the left/right data table, as well as matching data columns in the other data table; but when we use the explain function to view the estimated rows of the two statements, we find that the estimated rows of INNER JOIN are greater than the estimated rows of LEFT JOIN.

--EXPLAIN SELECT t42.c0, t0.c0 FROM t42 INNER JOIN t0 ON (CASE (((CASE -877155275 WHEN '0' THEN t0.c0 ELSE NULL END ))|(false)) WHEN (~ (t0.c0)) THEN false WHEN '2' THEN (('0')&(t42.c0)) ELSE FIELD(((t0.c0)<(((NULL)!=(0.5851810278984321)))), t0.c0) END ) WHERE 1413898330 GROUP BY t42.c0, t0.c0 HAVING ((t0.c0)LIKE(((t42.c0)NOT REGEXP(CAST(287635857 AS SIGNED))))); --10
--EXPLAIN SELECT t42.c0, t0.c0 FROM t42 LEFT JOIN t0 ON (CASE (((CASE -877155275 WHEN '0' THEN t0.c0 ELSE NULL END ))|(false)) WHEN (~ (t0.c0)) THEN false WHEN '2' THEN (('0')&(t42.c0)) ELSE FIELD(((t0.c0)<(((NULL)!=(0.5851810278984321)))), t0.c0) END ) WHERE 1413898330 GROUP BY t42.c0, t0.c0 HAVING ((t0.c0)LIKE(((t42.c0)NOT REGEXP(CAST(287635857 AS SIGNED))))); --8

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

USE test;
DROP DATABASE IF EXISTS database3;
CREATE DATABASE database3;
USE database3;
CREATE TABLE t42(c0 CHAR );
CREATE TABLE t0 LIKE t42;
UPDATE t0 SET c0='';
INSERT INTO t42(c0) VALUES (NULL), ('{');
REPLACE INTO t42 VALUES ('o');
UPDATE t0 SET c0='1' WHERE (('R')=(((t0.c0)>(t0.c0))));
REPLACE INTO t42 VALUES ('0'), ('f');
set @@tidb_enable_chunk_rpc=1;
REPLACE INTO t42 VALUES ('A');
INSERT IGNORE INTO t0 VALUES (NULL);
ALTER TABLE t0 DISABLE KEYS;
ANALYZE TABLE t0 WITH 690 BUCKETS;
ALTER TABLE t0 MODIFY c0 NUMERIC;
ALTER TABLE t0 ORDER BY c0 DESC;
set @@tidb_index_join_batch_size=3260;
REPLACE INTO t42(c0) VALUES ('N'), ('&'), ('<');
ALTER TABLE t42 ORDER BY c0 DESC;
INSERT IGNORE INTO t0 VALUES (1447599182), (-550806702) ON DUPLICATE KEY UPDATE c0=t0.c0;
REPLACE INTO t42(c0) VALUES ('*');
INSERT INTO t0 VALUES (1016270202) ON DUPLICATE KEY UPDATE c0=t0.c0;
set @@tidb_hashagg_partial_concurrency=25;
REPLACE INTO t42 VALUES ('Y');
SET @@sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
ANALYZE TABLE t0;
ANALYZE TABLE t42;

--EXPLAIN SELECT t42.c0, t0.c0 FROM t42 INNER JOIN t0 ON (CASE (((CASE -877155275 WHEN '0' THEN t0.c0 ELSE NULL END ))|(false)) WHEN (~ (t0.c0)) THEN false WHEN '2' THEN (('0')&(t42.c0)) ELSE FIELD(((t0.c0)<(((NULL)!=(0.5851810278984321)))), t0.c0) END ) WHERE 1413898330 GROUP BY t42.c0, t0.c0 HAVING ((t0.c0)LIKE(((t42.c0)NOT REGEXP(CAST(287635857 AS SIGNED))))); --10
--EXPLAIN SELECT t42.c0, t0.c0 FROM t42 LEFT JOIN t0 ON (CASE (((CASE -877155275 WHEN '0' THEN t0.c0 ELSE NULL END ))|(false)) WHEN (~ (t0.c0)) THEN false WHEN '2' THEN (('0')&(t42.c0)) ELSE FIELD(((t0.c0)<(((NULL)!=(0.5851810278984321)))), t0.c0) END ) WHERE 1413898330 GROUP BY t42.c0, t0.c0 HAVING ((t0.c0)LIKE(((t42.c0)NOT REGEXP(CAST(287635857 AS SIGNED))))); --8

2. What did you expect to see? (Required)

The estimated rows of statement 2 should be less than or equal to the estimated rows of statement 1

3. What did you see instead (Required)

The estimated rows of statement 2 greater than the estimated rows of statement 1

4. What is your TiDB version? (Required)

Release Version: v7.5.1

@DBMSTesting DBMSTesting added the type/bug The issue is confirmed as a bug. label Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

1 participant