@Yicong-Huang Yicong-Huang commented Nov 13, 2025

What changes were proposed in this pull request?

This PR consolidates GroupPandasIterUDFSerializer with GroupPandasUDFSerializer to eliminate code duplication and improve maintainability.

Modified GroupPandasUDFSerializer (python/pyspark/sql/pandas/serializers.py):

  • Added use_iterator parameter to support both regular and iterator modes
  • Extracted common batch-to-pandas conversion logic into _convert_batches_to_pandas helper method
  • Unified load_stream() and dump_stream() methods to handle both modes with minimal branching
  • Both modes now use a single code path with conditional batch grouping
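
The consolidation described in the list above can be sketched roughly as follows. This is a simplified, hypothetical model and not the code in python/pyspark/sql/pandas/serializers.py: the class name GroupSerializerSketch, the helper _convert_batches, and the use of plain lists of dicts in place of Arrow batches and pandas DataFrames are all assumptions made so the example runs with the standard library alone.

```python
from typing import Dict, Iterable, Iterator, List, Union

# Stand-ins (assumptions): a "batch" models one Arrow record batch and a
# "frame" models one pandas.DataFrame. Both are just lists of row dicts here.
Batch = List[Dict[str, int]]
Frame = List[Dict[str, int]]


class GroupSerializerSketch:
    """Hypothetical, simplified model of one serializer serving both modes."""

    def __init__(self, use_iterator: bool = False) -> None:
        self.use_iterator = use_iterator

    def _convert_batches(self, batches: Iterable[Batch]) -> Iterator[Frame]:
        # Shared conversion step: lazily turn each batch into a frame.
        for batch in batches:
            yield [dict(row) for row in batch]

    def load_stream(
        self, groups: Iterable[Iterable[Batch]]
    ) -> Iterator[Union[Frame, Iterator[Frame]]]:
        for batches in groups:
            frames = self._convert_batches(batches)
            if self.use_iterator:
                # Iterator mode: hand the lazy iterator to the consumer.
                yield frames
            else:
                # Regular mode: materialize the whole group as one frame.
                yield [row for frame in frames for row in frame]
```

With two groups, regular mode yields one combined frame per group, while iterator mode yields one iterator of per-batch frames per group. The only branch is the final grouping step, which is the point of the shared helper.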

Why are the changes needed?

When the Iterator[pandas.DataFrame] API was added to groupBy().applyInPandas() in SPARK-53614 (#52716), a new GroupPandasIterUDFSerializer class was created. However, this class is nearly identical to GroupPandasUDFSerializer; the two differ only in whether a group's batches are processed lazily (iterator mode) or all at once (regular mode).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

All existing tests pass without modification:

  • Iterator mode tests (11 tests): test_apply_in_pandas_iterator_*
  • Regular mode tests (39 tests): all other ApplyInPandasTests

Was this patch authored or co-authored using generative AI tooling?

Co-Generated-by: Cursor with Claude 4.5 Sonnet

safecheck,
assign_cols_by_name,
int_to_decimal_coercion_enabled,
use_iterator=False,

@zhengruifeng zhengruifeng Nov 14, 2025

Please keep this in line with GroupArrowUDFSerializer. We should not add a use_iterator argument; this serializer should always return an iterator.
Then adjust the function wrappers accordingly. You can refer to wrap_grouped_map_arrow_udf and wrap_grouped_map_arrow_iter_udf.
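
One way to read this suggestion is sketched below: the serializer side always yields an iterator of frames per group, and the difference between regular and iterator mode moves into the function wrappers. This is an illustration under stated assumptions, not PySpark code. The names load_groups, wrap_regular and wrap_iter are hypothetical, and lists of dicts stand in for pandas DataFrames.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Stand-in (assumption): a "frame" models one pandas.DataFrame.
Frame = List[Dict[str, int]]


def load_groups(groups: Iterable[Iterable[Frame]]) -> Iterator[Iterator[Frame]]:
    # Serializer side: no mode flag, always one iterator of frames per group.
    for frames in groups:
        yield iter(frames)


def wrap_regular(
    func: Callable[[Frame], Frame]
) -> Callable[[Iterator[Frame]], Iterator[Frame]]:
    # Wrapper for a frame -> frame function: concatenate the group first.
    def wrapped(frames: Iterator[Frame]) -> Iterator[Frame]:
        whole = [row for frame in frames for row in frame]
        yield func(whole)

    return wrapped


def wrap_iter(
    func: Callable[[Iterator[Frame]], Iterator[Frame]]
) -> Callable[[Iterator[Frame]], Iterator[Frame]]:
    # Wrapper for an iterator -> iterator function: pass frames through lazily.
    def wrapped(frames: Iterator[Frame]) -> Iterator[Frame]:
        yield from func(frames)

    return wrapped
```

Both wrappers expose the same iterator-in, iterator-out shape, so the serializer needs no knowledge of which kind of user function it is feeding.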

Contributor Author

Rewrote to align with GroupArrowUDFSerializer.
