[Fix](cast datetime)add date like type coercion in legacy planner & result wrong with convert datetime to date predicate #39446

Open
wants to merge 2 commits into
base: branch-2.0

mongo360
Contributor

@mongo360 mongo360 commented Aug 15, 2024

Proposed changes

Problem 1
In the legacy planner, a predicate such as create_day <= '2024-08-08 23:59:59' is rewritten as CAST(create_day AS DATETIMEV2) <= '2024-08-08 23:59:59' when the column create_day is of type DATE. Because the column is now wrapped in a cast, the scan key range built in the OLAP scan node is wrong.

Example

  1. Test Table
CREATE TABLE `ad_effects` (
  `pin_id` BIGINT NOT NULL COMMENT 'Advertiser pin_id',
  `day` date NOT NULL COMMENT 'Click date (day)',
  `date_time` datetime NOT NULL COMMENT 'Click time',
  `impressions` BIGINT SUM NULL DEFAULT "0" COMMENT 'Total user impressions',
  `clicks` BIGINT SUM NULL DEFAULT "0" COMMENT 'Total user clicks',
  `cost` BIGINT SUM NULL DEFAULT "0" COMMENT 'Ad spend'
) ENGINE=OLAP
AGGREGATE KEY(`pin_id`, `day`, `date_time`)
COMMENT 'OLAP'
PARTITION BY RANGE(`day`)
(PARTITION p20240809 VALUES [('2024-08-09'), ('2024-08-10')),
PARTITION p20240810 VALUES [('2024-08-10'), ('2024-08-11')),
PARTITION p20240811 VALUES [('2024-08-11'), ('2024-08-12')),
PARTITION p20240812 VALUES [('2024-08-12'), ('2024-08-13')),
PARTITION p20240813 VALUES [('2024-08-13'), ('2024-08-14')),
PARTITION p20240814 VALUES [('2024-08-14'), ('2024-08-15')))
DISTRIBUTED BY HASH(`pin_id`) BUCKETS 16
PROPERTIES (
"replication_allocation" = "tag.location.default: 2",
"is_being_synced" = "false",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-2147483648",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.replication_allocation" = "tag.location.default: 2",
"dynamic_partition.buckets" = "16",
"dynamic_partition.create_history_partition" = "false",
"dynamic_partition.history_partition_num" = "-1",
"dynamic_partition.hot_partition_num" = "0",
"dynamic_partition.reserved_history_periods" = "NULL",
"dynamic_partition.storage_policy" = "",
"storage_medium" = "hdd",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
  2. Test SQL
select pin_id,sum(cost) from ad_effects where pin_id = 200 and day >= '2024-08-11 00:00:00' and day < '2024-08-12 08:00:00' group by pin_id

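The profile shows that the scan key range spans every date (0000-01-01 to 9999-12-31) and that both predicates on day stay behind as cast expressions in RemainedPredicates: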

                        VNewOlapScanNode(ad_effects)  (id=0):(Active:  1.41ms,  %  non-child:  0.00%)
                              -  RuntimeFilters:  :  
                              -  PushDownPredicates:  []
                              -  KeyRanges:  ScanKeys:ScanKey=[200,0000-01-01  :  200,9999-12-31]
                              -  TabletIds:  [17701]
                              -  RemainedPredicates:  VectorizedFn[VectorizedFnCall[ge](arguments=(CAST  day(DateV2)  TO  DateTimeV2),  DateTimeV2,return=UInt8)]{
CastExpr(CAST  DateTimeV2  to  DateTimeV2){SlotRef(slot_id=2  type=DATEV2)},
VLiteral  (name  =  DateTimeV2,  type  =  DateTimeV2,  value  =  (2024-08-11  00:00:00))},  VectorizedFn[VectorizedFnCall[lt](arguments=(CAST  day(DateV2)  TO  DateTimeV2),  DateTimeV2,return=UInt8)]{
CastExpr(CAST  DateTimeV2  to  DateTimeV2){SlotRef(slot_id=2  type=DATEV2)},
VLiteral  (name  =  DateTimeV2,  type  =  DateTimeV2,  value  =  (2024-08-12  08:00:00))}
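
The effect seen in this profile can be illustrated with a deliberately simplified model. This is a sketch, not Doris code: the `Predicate` class and `day_key_range` function are hypothetical names, and the model assumes that only predicates of the form "bare column, operator, literal" can narrow the key range, while anything wrapped in a cast is left for row-level evaluation. It also ignores the difference between inclusive and exclusive bounds.

```python
from dataclasses import dataclass

# Hypothetical, simplified predicate model (not taken from Doris).
@dataclass(frozen=True)
class Predicate:
    column: str
    op: str                       # one of ">=", ">", "<=", "<"
    literal: str                  # zero-padded ISO string, so string order matches date order
    column_is_cast: bool = False  # True when the column is wrapped in CAST(...)

MIN_DAY, MAX_DAY = "0000-01-01", "9999-12-31"

def day_key_range(predicates):
    """Return (low, high, remained) for the `day` key column.

    Assumption: a predicate narrows the range only when it references
    the bare column; cast-wrapped predicates go to `remained`.
    """
    low, high, remained = MIN_DAY, MAX_DAY, []
    for p in predicates:
        if p.column != "day" or p.column_is_cast:
            remained.append(p)
        elif p.op in (">=", ">"):
            low = max(low, p.literal)
        elif p.op in ("<=", "<"):
            high = min(high, p.literal)
        else:
            remained.append(p)
    return low, high, remained

# Before the fix: both predicates carry a cast on the column.
before = [
    Predicate("day", ">=", "2024-08-11 00:00:00", column_is_cast=True),
    Predicate("day", "<", "2024-08-12 08:00:00", column_is_cast=True),
]
# After the fix: the literals are coerced to DATE and the column is bare.
after = [
    Predicate("day", ">=", "2024-08-11"),
    Predicate("day", "<", "2024-08-13"),
]
print(day_key_range(before)[:2])
print(day_key_range(after)[:2])
```

In this model the cast-wrapped predicates leave the range at its full extent, which is consistent with the KeyRanges and RemainedPredicates entries in the profile above.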

Solution
When a predicate compares a DATE column with a datetime literal, the legacy planner now converts the datetime literal to the column's type, the same way the Nereids planner does.

  1. explain
mysql> explain select pin_id,sum(cost) from ad_effects where pin_id = 200 and day >= '2024-08-11 00:00:00' and day < '2024-08-12 08:00:00' group by pin_id;
+------------------------------------------------------------------------------------+
| Explain String(Old Planner)                                                        |
+------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                    |
|   OUTPUT EXPRS:                                                                    |
|     <slot 3> `pin_id`                                                              |
|     <slot 4> sum(`cost`)                                                           |
|   PARTITION: RANDOM                                                                |
|                                                                                    |
|   HAS_COLO_PLAN_NODE: false                                                        |
|                                                                                    |
|   VRESULT SINK                                                                     |
|                                                                                    |
|   1:VAGGREGATE (update finalize)                                                   |
|   |  output: sum(`cost`)                                                           |
|   |  group by: `pin_id`                                                            |
|   |  cardinality=-1                                                                |
|   |                                                                                |
|   0:VOlapScanNode                                                                  |
|      TABLE: default_cluster:db.ad_effects(ad_effects), PREAGGREGATION: ON          |
|      PREDICATES: `pin_id` = 200 AND `day` >= '2024-08-11' AND `day` < '2024-08-13' |
|      partitions=1/18 (p20240811)                                                   |
|      tablets=1/16, tabletList=17701                                                |
|      cardinality=2, avgRowSize=2527.5, numNodes=2                                  |
|      pushAggOp=NONE                                                                |
+------------------------------------------------------------------------------------+
  2. profile
          VNewOlapScanNode(ad_effects)  (id=0):(Active:  717.471us,  %  non-child:  0.00%)
                -  RuntimeFilters:  :  
                -  PushDownPredicates:  []
                -  KeyRanges:  ScanKeys:ScanKey=[200,2024-08-11  :  200,2024-08-13)
                -  TabletIds:  [17701]
                -  UseSpecificThreadToken:  False
                -  AcquireRuntimeFilterTime:  648ns
                -  AllocateResourceTime:  194.829us
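
The explain output above turns `day` < '2024-08-12 08:00:00' into `day` < '2024-08-13'. A minimal sketch of one way such a literal conversion can preserve the predicate's meaning follows. The function name `coerce_to_date_predicate` is hypothetical and this is not the code from the PR; it only illustrates the arithmetic for range operators, assuming the column holds whole dates.

```python
from datetime import datetime, time, timedelta

RANGE_OPS = {"<", "<=", ">", ">="}

def coerce_to_date_predicate(op, literal):
    """Rewrite `date_col <op> <datetime literal>` as `date_col <op'> <date literal>`.

    Hypothetical helper for illustration only. A DATE value behaves like
    midnight of that day, so a literal with a non-zero time part falls
    strictly between two dates and the bound moves to the next day.
    """
    if op not in RANGE_OPS:
        raise ValueError(f"unsupported operator: {op}")
    dt = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S")
    day = dt.date()
    if dt.time() == time(0, 0, 0):
        # Midnight literal: the comparison is unchanged on the date value.
        return op, day.isoformat()
    next_day = (day + timedelta(days=1)).isoformat()
    if op in ("<", "<="):
        # d < 2024-08-12 08:00:00  is the same as  d < 2024-08-13
        return "<", next_day
    # d > 2024-08-12 08:00:00  is the same as  d >= 2024-08-13
    return ">=", next_day

print(coerce_to_date_predicate(">=", "2024-08-11 00:00:00"))
print(coerce_to_date_predicate("<", "2024-08-12 08:00:00"))
```

With both literals coerced this way, the two predicates from the test SQL match the PREDICATES line in the explain output and the key range in the profile.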

Problem 2
In the legacy planner with enable_date_conversion = false, a query returns a wrong result when its predicate converts a datetime column to date.

Example

  1. Test Table
CREATE TABLE `re_test_log_v2` (
  `pin_id` bigint NOT NULL,
  `create_time` datetime NOT NULL,
  `view` varchar(100) NOT NULL,
  `suc_num` int SUM NULL DEFAULT "0",
  `row_num` int SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(`pin_id`, `create_time`, `view`)
COMMENT 'OLAP'
PARTITION BY RANGE(`create_time`)
(PARTITION p202402 VALUES [('2024-02-01 00:00:00'), ('2024-03-01 00:00:00')))
DISTRIBUTED BY HASH(`pin_id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 2",
"is_being_synced" = "false",
"storage_medium" = "hdd",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false",
"enable_mow_light_delete" = "false"
);
insert into re_test_log_v2 values (100, '2024-02-01 00:00:00', 'first_view', 100, 50);
insert into re_test_log_v2 values (100, '2024-02-01 10:00:00', 'first_view', 100, 50);
admin set all frontends config ("enable_date_conversion" = "false");
set enable_nereids_planner = false;
mysql> select pin_id,sum(suc_num),sum(row_num) from re_test_log_v2 where pin_id = 100 and CONVERT(`create_time`, date) >= '2024-02-01' and CONVERT(`create_time`, date) <= '2024-02-01' group by pin_id;
Empty set (0.21 sec)

Solution
Add a cast to datetime, consistent with the behavior in version 1.2.
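
For reference, the result the query in the example should return can be worked out by hand from the two inserted rows. The sketch below does that in plain Python; it models only the intended semantics of the predicate (truncate `create_time` to a date, then compare) and says nothing about how Doris evaluates it. The function name `expected_result` is hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

# The two rows inserted in the example above.
ROWS = [
    (100, "2024-02-01 00:00:00", "first_view", 100, 50),
    (100, "2024-02-01 10:00:00", "first_view", 100, 50),
]

def expected_result(rows, pin, low, high):
    """Sum suc_num and row_num per pin_id for rows whose date part lies in [low, high]."""
    totals = defaultdict(lambda: [0, 0])
    for pin_id, create_time, _view, suc_num, row_num in rows:
        # CONVERT(create_time, date): keep only the date part.
        day = datetime.strptime(create_time, "%Y-%m-%d %H:%M:%S").date()
        if pin_id == pin and low <= day <= high:
            totals[pin_id][0] += suc_num
            totals[pin_id][1] += row_num
    return [(p, s, r) for p, (s, r) in sorted(totals.items())]

print(expected_result(ROWS, 100, date(2024, 2, 1), date(2024, 2, 1)))
```

Both rows fall on 2024-02-01, so the query should return one row with sums 200 and 100 rather than an empty set.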

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@mongo360 mongo360 force-pushed the branch-2.0_fix-dateliketype-coercion branch from 80c6c44 to e8f21d5 Compare September 3, 2024 10:35
@mongo360 mongo360 changed the title [Fix](cast datetime)add date like type coercion in legacy planner [Fix](cast datetime)add date like type coercion in legacy planner & result wrong with convert datetime to date predicate Sep 3, 2024
@mongo360
Contributor Author

mongo360 commented Sep 3, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 49392 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e8f21d557db90d8d71cfb47af527a5448ccfe925, data reload: false

------ Round 1 ----------------------------------
q1	18168	4395	4352	4352
q2	2071	153	146	146
q3	10464	1870	1949	1870
q4	10306	1235	1322	1235
q5	8508	3955	3949	3949
q6	233	123	124	123
q7	1999	1627	1614	1614
q8	9529	2711	2708	2708
q9	13773	10160	10014	10014
q10	8633	3491	3542	3491
q11	418	245	254	245
q12	465	302	296	296
q13	18383	3968	4029	3968
q14	352	329	332	329
q15	505	466	460	460
q16	547	471	458	458
q17	1144	982	969	969
q18	7235	6855	6884	6855
q19	1687	1594	1530	1530
q20	540	330	311	311
q21	4386	4144	4085	4085
q22	499	384	399	384
Total cold run time: 119845 ms
Total hot run time: 49392 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4330	4338	4328	4328
q2	323	224	222	222
q3	4168	4178	4164	4164
q4	2755	2732	2759	2732
q5	7192	7143	7167	7143
q6	234	121	126	121
q7	3247	2792	2829	2792
q8	4324	4480	4486	4480
q9	14188	14010	13861	13861
q10	4263	4254	4214	4214
q11	773	710	705	705
q12	1026	855	839	839
q13	7044	3793	3762	3762
q14	456	424	430	424
q15	499	467	455	455
q16	631	575	584	575
q17	3849	3834	3849	3834
q18	8804	8764	8719	8719
q19	1720	1673	1662	1662
q20	2426	2148	2108	2108
q21	8450	8516	8534	8516
q22	1035	951	937	937
Total cold run time: 81737 ms
Total hot run time: 76593 ms

@doris-robot

TPC-DS: Total hot run time: 212836 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e8f21d557db90d8d71cfb47af527a5448ccfe925, data reload: false

query1	926	399	408	399
query2	6552	2030	1995	1995
query3	6920	203	206	203
query4	23229	21355	21721	21355
query5	19732	6571	6471	6471
query6	287	230	233	230
query7	4332	304	315	304
query8	252	245	232	232
query9	3104	2678	2620	2620
query10	468	297	300	297
query11	15959	15400	14915	14915
query12	126	85	77	77
query13	1050	462	441	441
query14	17367	13325	13564	13325
query15	368	217	229	217
query16	6479	289	267	267
query17	1756	914	897	897
query18	900	326	309	309
query19	211	152	147	147
query20	82	78	78	78
query21	191	95	96	95
query22	5017	5007	4965	4965
query23	34326	33412	33323	33323
query24	7879	6345	6336	6336
query25	529	437	437	437
query26	1267	160	166	160
query27	2400	300	298	298
query28	6091	2240	2238	2238
query29	2903	2624	2621	2621
query30	244	165	174	165
query31	925	740	737	737
query32	69	60	61	60
query33	452	261	261	261
query34	849	480	474	474
query35	1163	940	959	940
query36	1290	1097	1089	1089
query37	174	59	61	59
query38	3083	2969	2914	2914
query39	1371	1327	1321	1321
query40	306	98	98	98
query41	39	38	38	38
query42	93	90	81	81
query43	688	620	599	599
query44	1137	721	717	717
query45	243	235	235	235
query46	1226	960	963	960
query47	1926	1840	1708	1708
query48	504	429	417	417
query49	648	370	372	370
query50	870	631	577	577
query51	4741	4655	4729	4655
query52	103	91	81	81
query53	237	183	189	183
query54	2671	2518	2450	2450
query55	87	83	86	83
query56	235	197	210	197
query57	1205	1214	1054	1054
query58	204	205	216	205
query59	3437	3187	3214	3187
query60	206	217	216	216
query61	99	94	96	94
query62	856	488	589	488
query63	198	179	172	172
query64	3493	1604	1520	1520
query65	3612	3577	3565	3565
query66	783	439	449	439
query67	15800	15899	16400	15899
query68	9136	636	642	636
query69	496	260	275	260
query70	1597	1413	1353	1353
query71	396	301	310	301
query72	6959	4939	4762	4762
query73	755	324	314	314
query74	6358	5882	5861	5861
query75	5101	3675	3732	3675
query76	4994	1124	1136	1124
query77	827	252	250	250
query78	12543	11899	11933	11899
query79	6242	644	639	639
query80	1589	391	377	377
query81	496	238	241	238
query82	1610	99	101	99
query83	176	135	135	135
query84	249	72	69	69
query85	1041	328	325	325
query86	337	298	284	284
query87	3261	3004	3061	3004
query88	4911	2315	2298	2298
query89	419	281	287	281
query90	1877	216	218	216
query91	157	126	126	126
query92	60	50	52	50
query93	5289	583	590	583
query94	753	215	212	212
query95	2128	2082	2014	2014
query96	644	332	325	325
query97	6579	6456	6488	6456
query98	234	207	213	207
query99	2903	939	903	903
Total cold run time: 316087 ms
Total hot run time: 212836 ms

@doris-robot

ClickBench: Total hot run time: 30.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e8f21d557db90d8d71cfb47af527a5448ccfe925, data reload: false

query1	0.03	0.03	0.02
query2	0.07	0.02	0.02
query3	0.24	0.04	0.05
query4	1.80	0.06	0.05
query5	0.54	0.53	0.53
query6	1.23	0.63	0.61
query7	0.01	0.01	0.01
query8	0.03	0.02	0.03
query9	0.52	0.48	0.49
query10	0.56	0.53	0.54
query11	0.14	0.09	0.09
query12	0.11	0.09	0.10
query13	0.62	0.62	0.61
query14	0.79	0.78	0.80
query15	0.79	0.75	0.77
query16	0.38	0.38	0.37
query17	1.02	1.02	1.05
query18	0.21	0.26	0.24
query19	1.93	1.85	1.86
query20	0.01	0.01	0.01
query21	15.45	0.60	0.57
query22	1.95	2.43	1.49
query23	16.26	0.96	0.98
query24	6.73	1.84	0.93
query25	0.33	0.11	0.05
query26	0.87	0.15	0.16
query27	0.04	0.03	0.04
query28	5.65	0.77	0.71
query29	12.77	2.32	2.07
query30	0.58	0.54	0.53
query31	2.79	0.39	0.37
query32	3.39	0.50	0.50
query33	3.04	3.06	3.08
query34	15.26	4.82	4.80
query35	4.84	4.86	4.87
query36	1.07	1.02	1.01
query37	0.06	0.05	0.05
query38	0.04	0.02	0.02
query39	0.02	0.01	0.01
query40	0.15	0.13	0.15
query41	0.07	0.01	0.02
query42	0.02	0.02	0.01
query43	0.03	0.02	0.02
Total cold run time: 102.44 s
Total hot run time: 30.46 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit e8f21d557db90d8d71cfb47af527a5448ccfe925 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       21.8 seconds inserted 10000000 Rows, about 458K ops/s
