[opt](metrics) Remove IntervalHistogramStat #47459

Open
wants to merge 2 commits into
base: master

Conversation

zhiqiang-hhhh
Contributor

@zhiqiang-hhhh zhiqiang-hhhh commented Jan 26, 2025

What problem does this PR solve?

It is better to let Prometheus calculate the average value than to maintain it inside the process.

Related PR: #43144

For example, we use task_execution_time_ns_avg_in_last_1000_times, which equals SUM(cost 0, ..., cost 999) / 1000, to represent the average execution time. This approach has two problems:

  1. Every update of its data source, _task_execution_time_ns_statistic, acquires a lock.
  2. The value of task_execution_time_ns_avg_in_last_1000_times does not drop to zero once a batch of tasks has finished and no more tasks are running. For example, the graph shows a continuous flat line long after all tasks have finished.
image
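
The second problem can be reproduced with a small sketch. This is not the Doris implementation: `LastNAverage` is a hypothetical Python stand-in for an average over the last N task costs, used only to show why the value stays flat once no new samples arrive.

```python
import threading
from collections import deque


class LastNAverage:
    """Hypothetical stand-in for an average over the last N task costs."""

    def __init__(self, window=1000):
        self._costs = deque(maxlen=window)  # oldest entries fall off automatically
        self._lock = threading.Lock()       # every update has to take this lock

    def add(self, cost_ns):
        with self._lock:
            self._costs.append(cost_ns)

    def average(self):
        with self._lock:
            if not self._costs:
                return 0.0
            return sum(self._costs) / len(self._costs)


stat = LastNAverage(window=1000)
for _ in range(1000):
    stat.add(2_000_000)  # a burst of tasks, 2 ms each

busy_value = stat.average()
# No tasks run from here on, so nothing is appended and nothing expires.
idle_value = stat.average()
```

Because entries only leave the window when newer ones push them out, `idle_value` equals `busy_value` no matter how long the pool stays idle.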

The problem can be fixed by:

  1. Using task_execution_time_ns_total, an atomic counter, to store the running total of the execution time of each iteration.
  2. With the help of the Prometheus irate function, we get an equivalent substitute such as irate(doris_be_task_execution_time_ns_total[$__rate_interval]) / doris_be_thread_pool_active_threads
image

After all tasks have finished, the curve drops to zero, which is more reasonable.
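
As a rough illustration of why the counter-based curve falls to zero, the sketch below mimics what irate does with the last two scraped samples of a monotonic counter. The `irate` helper, the 15-second scrape interval, and the sample values are assumptions made for this example; the division by doris_be_thread_pool_active_threads is left out.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp_s, value) samples."""
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    if t1 <= t0:
        return None
    # A drop in value is treated as a counter reset, so the increase is
    # just the latest value.
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / (t1 - t0)


# Hypothetical scrapes every 15 s: the counter grows while tasks run,
# then stays flat once they have finished.
scrapes = [
    (0, 0),
    (15, 3_000_000_000),
    (30, 6_000_000_000),
    (45, 6_000_000_000),
    (60, 6_000_000_000),
]

busy_rate = irate(scrapes[:3])  # still executing tasks
idle_rate = irate(scrapes)      # no new execution time recorded
```

While tasks run, `busy_rate` is the execution time accumulated per second; once the counter stops growing, `idle_rate` is zero, which matches the behavior described above.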

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32219 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 43748492a043457ab6f95b7e249300ac812f21cd, data reload: false

------ Round 1 ----------------------------------
q1	17586	5472	5363	5363
q2	2054	303	165	165
q3	10424	1253	741	741
q4	10206	987	532	532
q5	7539	2411	2161	2161
q6	194	175	130	130
q7	914	767	593	593
q8	9225	1366	1230	1230
q9	5309	4872	4893	4872
q10	6823	2322	1893	1893
q11	468	276	260	260
q12	349	365	218	218
q13	17756	3738	3103	3103
q14	246	233	207	207
q15	524	469	469	469
q16	647	604	577	577
q17	588	882	331	331
q18	7082	6586	6358	6358
q19	2054	963	533	533
q20	313	330	194	194
q21	2890	2240	1991	1991
q22	373	342	298	298
Total cold run time: 103564 ms
Total hot run time: 32219 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5617	5531	5528	5528
q2	249	334	241	241
q3	2281	2631	2373	2373
q4	1484	1848	1401	1401
q5	4347	4768	4664	4664
q6	168	160	130	130
q7	2070	1941	1868	1868
q8	2620	2811	2733	2733
q9	7403	7235	7351	7235
q10	3252	3335	2827	2827
q11	575	515	499	499
q12	636	751	599	599
q13	3631	3984	3314	3314
q14	289	287	287	287
q15	521	461	481	461
q16	656	695	647	647
q17	1232	1730	1262	1262
q18	7767	7445	7296	7296
q19	803	1142	1103	1103
q20	2014	2015	1910	1910
q21	6048	5225	5048	5048
q22	611	625	590	590
Total cold run time: 54274 ms
Total hot run time: 52016 ms

@doris-robot

TPC-DS: Total hot run time: 191969 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 43748492a043457ab6f95b7e249300ac812f21cd, data reload: false

query1	1300	957	919	919
query2	6191	2039	1970	1970
query3	11132	4707	4731	4707
query4	60724	28974	23075	23075
query5	5507	601	469	469
query6	441	189	189	189
query7	5547	513	306	306
query8	336	259	235	235
query9	8471	2750	2720	2720
query10	451	318	255	255
query11	17660	15309	15552	15309
query12	159	109	106	106
query13	1419	598	430	430
query14	11463	6512	6597	6512
query15	211	209	182	182
query16	7522	673	515	515
query17	1133	737	610	610
query18	1947	416	320	320
query19	218	193	166	166
query20	118	118	113	113
query21	210	125	112	112
query22	4502	4873	4585	4585
query23	34309	33579	33422	33422
query24	5598	2302	2309	2302
query25	469	475	402	402
query26	651	296	154	154
query27	1889	481	338	338
query28	3833	2528	2478	2478
query29	546	594	449	449
query30	214	195	152	152
query31	927	876	798	798
query32	69	60	60	60
query33	436	368	301	301
query34	744	897	526	526
query35	831	854	782	782
query36	1004	1028	945	945
query37	124	107	77	77
query38	4332	4446	4326	4326
query39	1490	1447	1437	1437
query40	210	117	104	104
query41	54	51	47	47
query42	121	107	103	103
query43	521	551	491	491
query44	1397	863	831	831
query45	184	174	167	167
query46	889	1060	673	673
query47	1903	1971	1894	1894
query48	391	424	340	340
query49	729	489	390	390
query50	665	673	405	405
query51	4346	4314	4286	4286
query52	107	104	94	94
query53	244	271	189	189
query54	487	509	419	419
query55	79	83	85	83
query56	251	292	247	247
query57	1204	1207	1163	1163
query58	257	243	259	243
query59	3062	3197	2981	2981
query60	275	289	267	267
query61	119	125	119	119
query62	771	730	680	680
query63	226	196	198	196
query64	1747	1040	660	660
query65	3271	3158	3227	3158
query66	711	402	296	296
query67	15932	15601	15351	15351
query68	4204	832	541	541
query69	479	333	260	260
query70	1147	1102	1132	1102
query71	409	300	253	253
query72	5970	3859	3860	3859
query73	748	779	361	361
query74	9993	9029	8781	8781
query75	3237	3143	2670	2670
query76	3492	1174	770	770
query77	468	373	290	290
query78	10183	10047	9306	9306
query79	3140	809	633	633
query80	762	529	439	439
query81	527	279	245	245
query82	356	155	122	122
query83	174	183	178	178
query84	290	100	77	77
query85	759	366	306	306
query86	419	314	301	301
query87	4443	4456	4408	4408
query88	4714	2181	2157	2157
query89	393	324	300	300
query90	1557	194	196	194
query91	141	138	107	107
query92	67	60	54	54
query93	2843	862	541	541
query94	777	414	303	303
query95	334	272	265	265
query96	497	624	286	286
query97	2794	2846	2771	2771
query98	243	194	195	194
query99	1277	1371	1264	1264
Total cold run time: 311118 ms
Total hot run time: 191969 ms

@doris-robot

ClickBench: Total hot run time: 30.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 43748492a043457ab6f95b7e249300ac812f21cd, data reload: false

query1	0.03	0.04	0.03
query2	0.09	0.03	0.03
query3	0.24	0.07	0.06
query4	1.63	0.11	0.10
query5	0.42	0.42	0.39
query6	1.16	0.66	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.04
query9	0.59	0.48	0.52
query10	0.57	0.56	0.56
query11	0.15	0.10	0.11
query12	0.13	0.11	0.10
query13	0.60	0.60	0.60
query14	2.74	2.84	2.73
query15	0.88	0.83	0.82
query16	0.38	0.38	0.37
query17	1.00	0.98	1.06
query18	0.23	0.21	0.21
query19	1.92	1.89	2.02
query20	0.02	0.01	0.02
query21	15.36	0.96	0.64
query22	0.76	0.81	0.70
query23	15.26	1.45	0.53
query24	3.28	1.69	0.56
query25	0.17	0.20	0.13
query26	0.33	0.14	0.14
query27	0.05	0.06	0.05
query28	14.04	0.99	0.43
query29	12.57	3.98	3.28
query30	0.26	0.09	0.06
query31	2.82	0.57	0.38
query32	3.23	0.56	0.46
query33	2.98	3.02	3.01
query34	16.79	5.23	4.56
query35	4.57	4.55	4.50
query36	0.64	0.49	0.51
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.14	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 106.45 s
Total hot run time: 30.37 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.07% (10972/26083)
Line Coverage: 32.33% (92719/286798)
Region Coverage: 31.49% (47555/151029)
Branch Coverage: 27.53% (24085/87492)
Coverage Report: http://coverage.selectdb-in.cc/coverage/43748492a043457ab6f95b7e249300ac812f21cd_43748492a043457ab6f95b7e249300ac812f21cd/report/index.html

@zhiqiang-hhhh changed the title from "[opt](metrics) Add reduce_size method to IntervalHistogramStat so that the curve could be more smooth" to "[opt](metrics) Remove IntervalHistogramStat" on Jan 26, 2025
@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32291 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c9110ed667a20359b47ce94ed3ac8e14c740f519, data reload: false

------ Round 1 ----------------------------------
q1	17588	5719	5389	5389
q2	2054	301	165	165
q3	10432	1236	744	744
q4	10213	1016	551	551
q5	7534	2417	2160	2160
q6	195	168	133	133
q7	926	798	610	610
q8	9250	1403	1230	1230
q9	5221	4960	4940	4940
q10	6848	2336	1858	1858
q11	463	285	256	256
q12	348	356	218	218
q13	17754	3727	3070	3070
q14	232	230	218	218
q15	520	468	455	455
q16	639	629	580	580
q17	588	868	318	318
q18	7045	6605	6370	6370
q19	1765	947	538	538
q20	323	344	185	185
q21	3040	2152	1992	1992
q22	373	340	311	311
Total cold run time: 103351 ms
Total hot run time: 32291 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5589	5511	5477	5477
q2	245	325	233	233
q3	2234	2675	2336	2336
q4	1485	1873	1417	1417
q5	4376	4731	4612	4612
q6	165	161	130	130
q7	2059	1974	1982	1974
q8	2633	2847	2706	2706
q9	7383	7222	7309	7222
q10	3074	3310	2834	2834
q11	569	508	484	484
q12	644	746	578	578
q13	3490	3984	3619	3619
q14	272	289	269	269
q15	513	491	474	474
q16	673	719	675	675
q17	1263	1790	1236	1236
q18	7830	7575	7504	7504
q19	827	1184	1047	1047
q20	2063	2060	1908	1908
q21	5765	5036	5063	5036
q22	638	632	584	584
Total cold run time: 53790 ms
Total hot run time: 52355 ms

@doris-robot

TPC-DS: Total hot run time: 192285 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c9110ed667a20359b47ce94ed3ac8e14c740f519, data reload: false

query1	1332	954	928	928
query2	6058	2010	1963	1963
query3	10937	4514	4461	4461
query4	61341	28897	23366	23366
query5	5482	587	485	485
query6	431	187	183	183
query7	5480	521	316	316
query8	333	253	227	227
query9	7977	2719	2707	2707
query10	449	297	266	266
query11	16974	15200	15601	15200
query12	163	116	106	106
query13	1360	539	412	412
query14	10348	7713	6972	6972
query15	205	204	192	192
query16	7220	599	471	471
query17	1106	731	611	611
query18	1860	398	314	314
query19	201	180	170	170
query20	115	119	111	111
query21	206	121	106	106
query22	4616	4744	4556	4556
query23	34076	33508	33161	33161
query24	5659	2296	2307	2296
query25	477	482	402	402
query26	643	285	158	158
query27	1624	498	348	348
query28	4158	2540	2492	2492
query29	565	573	425	425
query30	215	198	164	164
query31	924	888	824	824
query32	73	59	66	59
query33	445	361	306	306
query34	754	860	522	522
query35	816	846	743	743
query36	1056	1033	991	991
query37	121	100	79	79
query38	4403	4307	4298	4298
query39	1478	1439	1441	1439
query40	212	120	99	99
query41	50	49	51	49
query42	117	104	109	104
query43	524	526	494	494
query44	1467	840	846	840
query45	183	185	174	174
query46	887	1057	650	650
query47	1887	1954	1845	1845
query48	403	416	335	335
query49	708	504	400	400
query50	673	678	405	405
query51	4248	4305	4244	4244
query52	108	104	95	95
query53	240	275	194	194
query54	522	559	433	433
query55	86	83	90	83
query56	275	271	276	271
query57	1255	1191	1153	1153
query58	249	245	247	245
query59	3205	3176	3109	3109
query60	293	302	267	267
query61	140	133	134	133
query62	735	733	643	643
query63	276	185	182	182
query64	1241	1016	666	666
query65	3264	3138	3123	3123
query66	678	397	321	321
query67	16012	15716	15436	15436
query68	5041	843	545	545
query69	470	306	257	257
query70	1160	1157	1108	1108
query71	413	281	254	254
query72	6211	3869	3909	3869
query73	836	752	360	360
query74	10101	9101	8988	8988
query75	3223	3167	2662	2662
query76	3798	1165	765	765
query77	517	379	277	277
query78	10065	10025	9356	9356
query79	3173	810	598	598
query80	1401	527	435	435
query81	538	279	236	236
query82	609	159	112	112
query83	258	180	152	152
query84	291	94	74	74
query85	779	408	298	298
query86	438	320	294	294
query87	4385	4430	4475	4430
query88	4641	2177	2139	2139
query89	399	323	293	293
query90	1589	186	183	183
query91	136	139	110	110
query92	62	57	51	51
query93	2900	870	540	540
query94	896	418	358	358
query95	324	264	258	258
query96	505	606	286	286
query97	2829	2889	2729	2729
query98	223	194	189	189
query99	1312	1362	1256	1256
Total cold run time: 311142 ms
Total hot run time: 192285 ms

@doris-robot

ClickBench: Total hot run time: 31.1 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c9110ed667a20359b47ce94ed3ac8e14c740f519, data reload: false

query1	0.03	0.03	0.06
query2	0.08	0.03	0.03
query3	0.24	0.07	0.07
query4	1.62	0.10	0.10
query5	0.43	0.43	0.41
query6	1.14	0.65	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.58	0.51	0.52
query10	0.57	0.56	0.55
query11	0.15	0.10	0.10
query12	0.15	0.11	0.11
query13	0.61	0.60	0.60
query14	2.81	2.77	2.83
query15	0.91	0.84	0.84
query16	0.39	0.39	0.38
query17	1.09	1.01	0.99
query18	0.24	0.21	0.21
query19	1.89	1.81	2.03
query20	0.02	0.01	0.01
query21	15.36	0.92	0.59
query22	0.75	0.74	0.78
query23	15.25	1.47	0.66
query24	2.78	1.36	1.24
query25	0.16	0.26	0.08
query26	0.18	0.16	0.14
query27	0.06	0.06	0.05
query28	14.20	1.02	0.44
query29	12.59	3.99	3.26
query30	0.25	0.09	0.06
query31	2.81	0.61	0.37
query32	3.25	0.55	0.46
query33	3.01	3.01	3.02
query34	16.60	5.17	4.54
query35	4.53	4.47	4.46
query36	0.65	0.50	0.49
query37	0.09	0.07	0.06
query38	0.04	0.03	0.04
query39	0.04	0.02	0.02
query40	0.17	0.14	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.02	0.03
Total cold run time: 105.93 s
Total hot run time: 31.1 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.05% (10965/26076)
Line Coverage: 32.33% (92709/286746)
Region Coverage: 31.48% (47534/151006)
Branch Coverage: 27.53% (24079/87480)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c9110ed667a20359b47ce94ed3ac8e14c740f519_c9110ed667a20359b47ce94ed3ac8e14c740f519/report/index.html

@zhiqiang-hhhh zhiqiang-hhhh marked this pull request as ready for review January 26, 2025 13:30
@zhiqiang-hhhh
Contributor Author

run external

Contributor

@HappenLee HappenLee left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Jan 29, 2025
Contributor

PR approved by anyone and no changes requested.

Labels
approved Indicates a PR has been approved by one committer. reviewed

4 participants