[SPARK-52171] [SS] StateDataSource join implementation for state v3 #51004

Status: Open (2 commits, base branch: master)

liviazhu-db
Contributor

What changes were proposed in this pull request?

Add a StateDataSource implementation for join state format v3, which uses virtual column families for the four join stores. This entails a few changes:

  • Schema inference for joins needs to take in oldSchemaFilePaths for state format v3.
  • sourceOptions needs to be modified when a join store name is specified for state format v3, since that name now refers to a column family (colFamily) rather than a state store. Subsequent metadata checks must also account for this.
  • A new joinColFamilyOpt needs to be passed through to StateReaderInfo, StatePartitionReader, etc., so that the correct column family is read.
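
The option rewrite described in the second and third bullets can be sketched roughly as follows. This is a simplified model in Python, not the code in this PR: the function `resolve_join_store`, the four store names, and the `"default"` store name are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Assumed names for the four join stores and for the single physical store
# that holds them as virtual column families under state format v3.
JOIN_STORE_NAMES = (
    "left-keyToNumValues",
    "left-keyWithIndexToValue",
    "right-keyToNumValues",
    "right-keyWithIndexToValue",
)
DEFAULT_STORE_NAME = "default"


def resolve_join_store(
    options: Dict[str, str], join_state_format_version: int
) -> Tuple[Dict[str, str], Optional[str]]:
    """Return (possibly rewritten options, join column family or None).

    Hypothetical helper: models how a requested join store name could be
    turned into a column family name for state format v3.
    """
    requested = options.get("storeName")
    if join_state_format_version != 3 or requested not in JOIN_STORE_NAMES:
        # Older formats keep each join store as its own state store,
        # so the options are passed through unchanged.
        return dict(options), None
    # v3: the requested name identifies a column family, so point the
    # reader at the single store and carry the family name separately.
    rewritten = dict(options)
    rewritten["storeName"] = DEFAULT_STORE_NAME
    return rewritten, requested
```

In this model the second return value plays the role of joinColFamilyOpt: it is empty for older formats and carries the column family name for v3, so downstream readers can select the right column family.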

Why are the changes needed?

To enable StateDataSource for join state format v3.

Does this PR introduce any user-facing change?

Yes. Previously, StateDataSource could not be used on checkpoints that use join state version 3; now it can.
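
As an illustration of the user-facing behavior, a read of one join side might look like the sketch below. It assumes the `statestore` format name and the `path` and `joinSide` options from the Spark state data source guide; the wrapper `read_join_state` and the checkpoint path in the usage note are hypothetical.

```python
def read_join_state(spark, checkpoint_path: str, join_side: str = "left"):
    """Load one side of a stream-stream join's state as a DataFrame.

    `spark` is expected to be an active SparkSession; this helper is a
    hypothetical convenience wrapper, not part of Spark.
    """
    if join_side not in ("left", "right"):
        raise ValueError("join_side must be 'left' or 'right'")
    return (
        spark.read.format("statestore")
        .option("path", checkpoint_path)   # checkpoint location of the query
        .option("joinSide", join_side)     # which side of the join to read
        .load()
    )
```

A call such as `read_join_state(spark, "/tmp/checkpoint", "right")` would then be expected to work on a checkpoint written with join state version 3, which is the case this PR enables.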

How was this patch tested?

Added new unit tests and re-enabled previously disabled unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@liviazhu-db liviazhu-db marked this pull request as ready for review May 24, 2025 00:12
@liviazhu-db liviazhu-db changed the title [SPARK-52171] StateDataSource join implementation for state v3 [SPARK-52171] [SS] StateDataSource join implementation for state v3 May 24, 2025