[opt](log) Print last failure status for unhealthy replica #38153

Status: Open (5 commits to merge into base: master)

bobhan1 (Contributor) commented Jul 19, 2024

Proposed changes

This PR records the last failure status for each replica and prints it when a commit or publish fails, to make debugging easier.
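
The idea can be illustrated with a minimal sketch. This is not the code from this PR: the `Replica` class, its fields, and `describe_unhealthy` are hypothetical names chosen for illustration, and the real change lives in the Doris code base. The sketch only assumes that each replica keeps its most recent failure message and that the message is included in the error text when a commit or publish fails.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Replica:
    """Hypothetical stand-in for a tablet replica; not a Doris class."""
    replica_id: int
    backend_id: int
    last_failed_status: Optional[str] = None  # most recent failure message, if any

    def record_failure(self, status: str) -> None:
        # Only the latest failure is kept; earlier ones are overwritten.
        self.last_failed_status = status

    def clear_failure(self) -> None:
        # Called once the replica succeeds again (an assumption of this sketch).
        self.last_failed_status = None


def describe_unhealthy(replicas: Iterable[Replica]) -> str:
    """Build the debugging text to append to a commit/publish error."""
    parts = [
        f"replica {r.replica_id} on backend {r.backend_id}: {r.last_failed_status}"
        for r in replicas
        if r.last_failed_status is not None
    ]
    return "; ".join(parts) if parts else "no recorded failures"


# Usage: one of three replicas fails twice; only the last failure is reported.
replicas = [Replica(1, 10001), Replica(2, 10002), Replica(3, 10003)]
replicas[1].record_failure("timeout")
replicas[1].record_failure("disk full")
message = "publish failed, unhealthy replicas: " + describe_unhealthy(replicas)
```

Keeping only the latest status per replica keeps the memory cost constant per replica, at the price of losing the failure history.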

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

bobhan1 force-pushed the report-failed-status-publish branch 2 times, most recently from a85249c to 7a1e4cd, on July 19, 2024 10:06
Contributor

clang-tidy review says "All clean, LGTM! 👍"

bobhan1 force-pushed the report-failed-status-publish branch from 7a1e4cd to e105e4f on July 19, 2024 10:09
bobhan1 (Contributor, Author) commented Jul 19, 2024

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

bobhan1 force-pushed the report-failed-status-publish branch from e105e4f to d0bc114 on July 19, 2024 11:07
Contributor

clang-tidy review says "All clean, LGTM! 👍"

bobhan1 force-pushed the report-failed-status-publish branch from d0bc114 to beaefbd on July 19, 2024 11:14
bobhan1 (Contributor, Author) commented Jul 19, 2024

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40470 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit beaefbd205e2894e5e51f085cf7aa21a319f333e, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	19121	5007	4445	4445
q2	2629	194	190	190
q3	11752	1222	1063	1063
q4	10303	801	894	801
q5	7584	2777	2769	2769
q6	231	144	146	144
q7	982	622	614	614
q8	9380	2075	2114	2075
q9	8729	6614	6620	6614
q10	8718	3794	3772	3772
q11	451	239	244	239
q12	393	232	229	229
q13	17756	2959	3010	2959
q14	273	236	228	228
q15	529	484	485	484
q16	469	383	384	383
q17	988	700	717	700
q18	8000	7492	7479	7479
q19	6103	1527	1343	1343
q20	702	317	346	317
q21	4958	3339	3353	3339
q22	361	297	283	283
Total cold run time: 120412 ms
Total hot run time: 40470 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4390	4303	4274	4274
q2	377	288	255	255
q3	2967	2760	2778	2760
q4	1867	1615	1625	1615
q5	5326	5320	5310	5310
q6	232	140	138	138
q7	2120	1696	1677	1677
q8	3216	3359	3351	3351
q9	8525	8444	8447	8444
q10	3871	3671	3687	3671
q11	593	465	492	465
q12	790	639	656	639
q13	16467	2978	2977	2977
q14	303	281	270	270
q15	514	485	473	473
q16	460	418	429	418
q17	1772	1486	1478	1478
q18	7764	7475	7417	7417
q19	1650	1489	1565	1489
q20	1962	1790	1804	1790
q21	4863	4697	4738	4697
q22	562	509	505	505
Total cold run time: 70591 ms
Total hot run time: 54113 ms

@doris-robot

TPC-DS: Total hot run time: 172794 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit beaefbd205e2894e5e51f085cf7aa21a319f333e, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	908	370	362	362
query2	6484	1885	1819	1819
query3	6655	206	222	206
query4	23297	17555	17352	17352
query5	4220	489	503	489
query6	275	167	166	166
query7	4601	299	284	284
query8	244	196	198	196
query9	8606	2456	2442	2442
query10	458	309	283	283
query11	11270	10037	10058	10037
query12	130	90	85	85
query13	1659	373	376	373
query14	10556	8229	7026	7026
query15	250	163	172	163
query16	7774	487	467	467
query17	1604	566	562	562
query18	1850	288	298	288
query19	214	162	159	159
query20	95	90	85	85
query21	206	136	140	136
query22	4246	4140	3977	3977
query23	33801	33046	33283	33046
query24	11223	2890	2893	2890
query25	653	409	410	409
query26	1414	156	156	156
query27	2923	276	276	276
query28	7609	2006	1998	1998
query29	923	652	641	641
query30	282	155	151	151
query31	978	755	750	750
query32	97	56	57	56
query33	782	374	370	370
query34	901	493	510	493
query35	881	758	728	728
query36	1102	923	938	923
query37	156	85	80	80
query38	2866	2780	2756	2756
query39	859	809	814	809
query40	299	125	130	125
query41	52	48	51	48
query42	120	103	105	103
query43	495	469	456	456
query44	1245	736	738	736
query45	202	167	166	166
query46	1079	740	732	732
query47	1837	1755	1762	1755
query48	369	293	299	293
query49	1102	440	448	440
query50	787	412	401	401
query51	6985	6881	6824	6824
query52	111	91	99	91
query53	374	296	306	296
query54	902	448	448	448
query55	77	73	74	73
query56	314	266	264	264
query57	1163	1055	1065	1055
query58	261	246	250	246
query59	2697	2600	2656	2600
query60	319	281	280	280
query61	97	101	94	94
query62	859	649	629	629
query63	331	295	293	293
query64	10274	2229	1664	1664
query65	3186	3134	3107	3107
query66	1365	339	335	335
query67	15523	14811	14975	14811
query68	8592	555	559	555
query69	752	461	397	397
query70	1336	1116	1021	1021
query71	511	293	285	285
query72	9236	5566	5559	5559
query73	2158	333	322	322
query74	6067	5674	5718	5674
query75	4845	2724	2663	2663
query76	5041	998	967	967
query77	800	331	335	331
query78	9654	9176	8955	8955
query79	7807	527	532	527
query80	1904	495	483	483
query81	587	217	234	217
query82	303	141	137	137
query83	289	175	166	166
query84	270	89	85	85
query85	985	349	296	296
query86	345	306	316	306
query87	3278	3107	3086	3086
query88	5296	2420	2377	2377
query89	529	384	373	373
query90	2012	199	195	195
query91	130	105	99	99
query92	66	52	50	50
query93	5993	520	509	509
query94	1390	275	287	275
query95	412	324	316	316
query96	604	269	268	268
query97	3191	3039	3015	3015
query98	218	210	195	195
query99	1454	1236	1308	1236
Total cold run time: 301532 ms
Total hot run time: 172794 ms

@doris-robot

ClickBench: Total hot run time: 31.59 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit beaefbd205e2894e5e51f085cf7aa21a319f333e, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.04
query2	0.08	0.05	0.04
query3	0.23	0.06	0.05
query4	1.68	0.07	0.08
query5	0.48	0.48	0.50
query6	1.14	0.72	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.50	0.48
query10	0.54	0.54	0.55
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.60	0.59	0.59
query14	0.76	0.76	0.78
query15	0.84	0.82	0.81
query16	0.36	0.35	0.38
query17	0.95	0.94	0.96
query18	0.22	0.22	0.22
query19	1.82	1.72	1.68
query20	0.01	0.00	0.00
query21	15.42	0.77	0.66
query22	4.21	6.15	2.98
query23	18.34	1.37	1.25
query24	2.10	0.23	0.24
query25	0.16	0.09	0.08
query26	0.28	0.21	0.21
query27	0.46	0.22	0.22
query28	13.18	1.02	1.01
query29	12.67	3.34	3.34
query30	0.25	0.06	0.06
query31	2.85	0.39	0.41
query32	3.28	0.46	0.47
query33	2.90	2.88	2.90
query34	17.09	4.32	4.34
query35	4.38	4.40	4.43
query36	0.65	0.48	0.49
query37	0.18	0.16	0.15
query38	0.15	0.14	0.15
query39	0.04	0.04	0.03
query40	0.15	0.12	0.12
query41	0.10	0.05	0.04
query42	0.06	0.05	0.06
query43	0.05	0.04	0.04
Total cold run time: 109.62 s
Total hot run time: 31.59 s

Contributor

clang-tidy review says "All clean, LGTM! 👍"

bobhan1 (Contributor, Author) commented Jul 22, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 40122 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 22c3ee0a40bddca12756d398e2f46b9a9726f379, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	18256	4999	4278	4278
q2	2044	196	190	190
q3	10501	1177	1092	1092
q4	10196	782	941	782
q5	7539	2693	2669	2669
q6	223	139	136	136
q7	958	605	602	602
q8	9218	2066	2082	2066
q9	8790	6599	6553	6553
q10	8810	3831	3829	3829
q11	484	243	245	243
q12	396	223	222	222
q13	17778	2986	3001	2986
q14	280	227	240	227
q15	524	502	499	499
q16	501	381	383	381
q17	971	711	720	711
q18	8178	7579	7518	7518
q19	5356	1387	1406	1387
q20	664	315	324	315
q21	4919	3159	3205	3159
q22	348	284	277	277
Total cold run time: 116934 ms
Total hot run time: 40122 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4346	4258	4235	4235
q2	380	272	255	255
q3	3049	2909	2910	2909
q4	1969	1679	1765	1679
q5	5649	5527	5550	5527
q6	220	141	134	134
q7	2269	2003	1843	1843
q8	3300	3459	3444	3444
q9	8830	8798	8798	8798
q10	4052	3851	3679	3679
q11	600	504	522	504
q12	870	636	630	630
q13	16321	3172	3168	3168
q14	338	300	279	279
q15	528	500	495	495
q16	475	425	438	425
q17	1841	1499	1544	1499
q18	8220	7977	7794	7794
q19	1735	1573	1581	1573
q20	2136	1873	1860	1860
q21	7736	5069	4983	4983
q22	552	504	514	504
Total cold run time: 75416 ms
Total hot run time: 56217 ms

@doris-robot

TPC-DS: Total hot run time: 174593 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 22c3ee0a40bddca12756d398e2f46b9a9726f379, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	931	379	370	370
query2	6426	1926	1799	1799
query3	6642	213	217	213
query4	23241	17580	17485	17485
query5	3727	470	455	455
query6	275	188	184	184
query7	4584	297	284	284
query8	261	209	193	193
query9	8722	2458	2434	2434
query10	431	297	269	269
query11	12283	10202	10039	10039
query12	114	83	86	83
query13	1655	359	386	359
query14	10231	7164	8206	7164
query15	213	179	189	179
query16	7217	462	426	426
query17	1121	554	525	525
query18	1573	274	269	269
query19	197	153	145	145
query20	90	86	80	80
query21	204	130	130	130
query22	4134	3976	4098	3976
query23	34346	33792	33773	33773
query24	11435	2994	2963	2963
query25	594	394	399	394
query26	707	157	148	148
query27	2340	274	281	274
query28	5859	2110	2064	2064
query29	888	637	613	613
query30	252	158	156	156
query31	965	749	788	749
query32	105	53	57	53
query33	740	342	332	332
query34	913	510	504	504
query35	908	772	761	761
query36	1169	991	1010	991
query37	139	88	98	88
query38	2954	2862	2839	2839
query39	893	813	817	813
query40	202	124	124	124
query41	47	47	45	45
query42	116	102	99	99
query43	534	468	480	468
query44	1231	739	738	738
query45	192	163	165	163
query46	1096	751	721	721
query47	1885	1778	1764	1764
query48	379	300	302	300
query49	850	422	419	419
query50	793	384	381	381
query51	6728	6681	6669	6669
query52	107	92	90	90
query53	363	293	282	282
query54	895	458	457	457
query55	76	76	80	76
query56	313	277	288	277
query57	1164	1044	1042	1042
query58	259	276	287	276
query59	2906	2577	2723	2577
query60	318	298	300	298
query61	117	114	118	114
query62	775	652	638	638
query63	326	285	292	285
query64	9200	2310	1755	1755
query65	3171	3105	3129	3105
query66	763	329	337	329
query67	15410	15079	14848	14848
query68	6154	548	549	548
query69	714	446	377	377
query70	1174	1175	1163	1163
query71	468	288	282	282
query72	8513	6071	5807	5807
query73	782	332	327	327
query74	6076	5703	5658	5658
query75	4526	2707	2670	2670
query76	3715	939	933	933
query77	762	309	308	308
query78	9777	9852	9293	9293
query79	11125	554	536	536
query80	1580	492	487	487
query81	585	221	221	221
query82	753	134	136	134
query83	306	167	169	167
query84	274	88	85	85
query85	1386	335	309	309
query86	385	304	301	301
query87	3325	3078	3101	3078
query88	5635	2402	2410	2402
query89	561	384	392	384
query90	1754	196	194	194
query91	130	99	105	99
query92	62	49	50	49
query93	7319	529	525	525
query94	793	288	293	288
query95	402	318	333	318
query96	642	274	275	274
query97	3195	3008	3011	3008
query98	257	209	194	194
query99	1530	1258	1263	1258
Total cold run time: 294355 ms
Total hot run time: 174593 ms

@doris-robot

ClickBench: Total hot run time: 30.95 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 22c3ee0a40bddca12756d398e2f46b9a9726f379, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.04
query2	0.07	0.04	0.04
query3	0.22	0.04	0.05
query4	1.68	0.07	0.07
query5	0.50	0.48	0.49
query6	1.14	0.72	0.73
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.56	0.48	0.49
query10	0.54	0.54	0.56
query11	0.16	0.12	0.12
query12	0.14	0.12	0.13
query13	0.59	0.60	0.59
query14	0.76	0.78	0.79
query15	0.84	0.82	0.82
query16	0.36	0.36	0.37
query17	0.97	0.97	1.01
query18	0.22	0.22	0.22
query19	1.75	1.74	1.75
query20	0.01	0.02	0.01
query21	15.39	0.75	0.64
query22	4.53	6.45	2.27
query23	18.26	1.32	1.21
query24	2.12	0.23	0.23
query25	0.15	0.08	0.08
query26	0.29	0.21	0.22
query27	0.45	0.23	0.24
query28	13.27	1.01	1.01
query29	12.62	3.29	3.33
query30	0.25	0.06	0.06
query31	2.91	0.40	0.39
query32	3.23	0.47	0.46
query33	2.86	2.94	2.89
query34	17.06	4.38	4.35
query35	4.46	4.46	4.40
query36	0.66	0.47	0.47
query37	0.19	0.15	0.15
query38	0.15	0.15	0.14
query39	0.04	0.03	0.04
query40	0.14	0.11	0.12
query41	0.10	0.04	0.05
query42	0.05	0.06	0.06
query43	0.05	0.04	0.04
Total cold run time: 109.85 s
Total hot run time: 30.95 s
