
Conversation

LuciferYang
Contributor

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@LuciferYang
Contributor Author

I ran a sampling analysis over the execution times of all Python tests and found that the following 16 test cases each take longer than 60 seconds (treating 60 seconds as a provisional threshold for now; should we consider a larger one?). Should we temporarily disable them and re-enable them once their execution times are optimized? What are your opinions on this? @zhengruifeng @dongjoon-hyun @HyukjinKwon (A sketch of this kind of filtering appears after the table.)

| Class | Test case | Time (s) |
|-------|-----------|----------|
| pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests | test_listener_events_spark_command | 97.006 |
| pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state.TransformWithStateInPandasParityTests | test_transform_with_state_with_timers_single_partition | 87.612 |
| pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state.TransformWithStateInPySparkParityTests | test_transform_with_state_with_timers_single_partition | 89.920 |
| pyspark.sql.tests.pandas.test_pandas_transform_with_state.TransformWithStateInPandasWithCheckpointV2Tests | test_transform_with_state_with_timers_single_partition | 82.795 |
| pyspark.sql.tests.pandas.test_pandas_transform_with_state.TransformWithStateInPySparkTests | test_transform_with_state_with_timers_single_partition | 80.131 |
| pyspark.sql.tests.pandas.test_pandas_transform_with_state.TransformWithStateInPySparkWithCheckpointV2Tests | test_transform_with_state_with_timers_single_partition | 87.991 |
| pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests | test_training_and_prediction | 75.927 |
| pyspark.pandas.tests.connect.indexes.test_parity_datetime_property.DatetimeIndexParityTests | test_properties | 71.911 |
| pyspark.sql.tests.pandas.test_pandas_transform_with_state.TransformWithStateInPandasTests | test_transform_with_state_with_timers_single_partition | 77.818 |
| pyspark.pandas.tests.connect.groupby.test_parity_split_apply.GroupbyParitySplitApplyTests | test_split_apply_combine_on_series | 66.697 |
| pyspark.sql.tests.pandas.test_pandas_transform_with_state.TransformWithStateInPandasTests | test_schema_evolution_scenarios | 60.880 |
| pyspark.sql.tests.pandas.test_pandas_transform_with_state.TransformWithStateInPandasWithCheckpointV2Tests | test_schema_evolution_scenarios | 60.634 |
| pyspark.sql.tests.pandas.test_pandas_transform_with_state.TransformWithStateInPySparkTests | test_schema_evolution_scenarios | 61.579 |
| pyspark.sql.tests.pandas.test_pandas_transform_with_state.TransformWithStateInPySparkWithCheckpointV2Tests | test_schema_evolution_scenarios | 67.869 |
| pyspark.sql.tests.pandas.test_pandas_udf_scalar.ScalarPandasUDFTests | test_mixed_udf | 66.439 |
| pyspark.pandas.tests.connect.groupby.test_parity_split_apply_min_max.GroupbySplitApplyMMParityTests | test_split_apply_combine_on_series | 61.645 |
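A minimal sketch of the kind of filtering behind this table, assuming per-case durations come from JUnit-style XML reports. The report directory layout is an assumption for illustration, not the actual pipeline used for this analysis:

```python
# Hypothetical sketch: scan JUnit-style XML reports and list individual
# test cases whose duration exceeds a threshold. The "target/test-reports"
# path is an assumption, not the real location used for this analysis.
import xml.etree.ElementTree as ET
from pathlib import Path

THRESHOLD_SECONDS = 60.0  # provisional threshold discussed above

def slow_cases(report_dir: str, threshold: float = THRESHOLD_SECONDS):
    for report in Path(report_dir).glob("**/*.xml"):
        for case in ET.parse(report).getroot().iter("testcase"):
            duration = float(case.get("time", 0.0))
            if duration > threshold:
                yield case.get("classname"), case.get("name"), duration

# Print the slow cases, longest first.
for classname, name, duration in sorted(
    slow_cases("target/test-reports"), key=lambda row: -row[2]
):
    print(f"{classname} {name} {duration:.3f}")
```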

@dongjoon-hyun
Member

Thank you for collecting the results and shedding light on this for us, @LuciferYang. BTW, is 97s the maximum test duration observed so far?

@LuciferYang
Contributor Author

> Thank you for collecting the results and shedding light on this for us, @LuciferYang. BTW, is 97s the maximum test duration observed so far?

Yes, the unit measured here is an individual test case, not a test file or a test class.

@dongjoon-hyun
Member

Thank you so much for spending your time on this. I really appreciate your passion, @LuciferYang .

I rechecked the usage today. It seems we used 20 full-time runners over the last week, which is less than I expected.

The root cause seems to be that the Apache Spark repository has had fewer commits recently; in August, we had a lot of commits.

[Screenshot of runner-usage dashboard, 2025-09-24]

For now, let's keep the as-is status, which means increasing the timeout limit without skipping (until a serious action is needed). We can always skip the tests easily later if needed, but recovering lost test coverage rarely happens. WDYT?
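For context, skipping a slow case later would indeed be a one-line change per test; a minimal sketch using the standard unittest decorator (the class and test names are taken from the table above, and the JIRA reference is a placeholder, not a real ticket):

```python
# Hypothetical sketch of what "skipping later" would look like, using the
# standard unittest decorator. SPARK-XXXXX is a placeholder ticket ID.
import unittest

class TransformWithStateInPandasTests(unittest.TestCase):
    @unittest.skip("Runs > 60s; re-enable after optimizing (see SPARK-XXXXX)")
    def test_transform_with_state_with_timers_single_partition(self):
        ...
```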

@LuciferYang
Contributor Author

@dongjoon-hyun OK ~ let me close this PR for now. If the need arises later, the statistics from this PR can serve as a reference.

@zhengruifeng
Contributor

Thanks @LuciferYang and @dongjoon-hyun for the investigation; the data is very useful.
I think we can leave the pandas API tests (pyspark.sql.tests.pandas.*) alone, because it is highly likely we won't add new tests there frequently, and they don't cause problems (unless there is a serious performance regression in Spark Connect or something).
