
[SPARK-52978][SQL] Make FileFormatWriter customizable via SQL configuration #51690


Open: wants to merge 4 commits into master

@zhztheplayer (Member) commented Jul 28, 2025

What changes were proposed in this pull request?

This PR makes the V1 file write API, FileFormatWriter.write, customizable via a SQL configuration option.

Change summary:

  1. Add trait FileFormatWriter;
  2. Rename the current FileFormatWriter object to DefaultFileFormatWriter, then make it inherit FileFormatWriter;
  3. Provide an optional SQL option, spark.sql.execution.fileFormatWriterClass=..., that lets users set a custom implementation class of FileFormatWriter.
    If spark.sql.execution.fileFormatWriterClass is not present in the SQL config, DefaultFileFormatWriter is used.
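
To illustrate the three steps above, here is a minimal, Spark-free sketch of the lookup-with-fallback pattern. The `write` signature is deliberately simplified (the real one takes many more parameters), and `FileFormatWriterResolver`, the `Map`-based config, and the reflective no-argument construction are assumptions made for illustration, not necessarily how this PR instantiates the configured class.

```scala
// Simplified stand-in for the proposed trait. Spark's real write(...) takes
// many more parameters; this signature is an assumption for illustration.
trait FileFormatWriter {
  def write(rows: Seq[String], outputPath: String): Set[String]
}

// Stand-in for the renamed default implementation.
object DefaultFileFormatWriter extends FileFormatWriter {
  // Pretends to write the rows and returns the set of updated paths.
  override def write(rows: Seq[String], outputPath: String): Set[String] =
    if (rows.isEmpty) Set.empty else Set(outputPath)
}

// Hypothetical resolver: picks the writer named in the config, if any.
object FileFormatWriterResolver {
  val ConfKey = "spark.sql.execution.fileFormatWriterClass"

  def resolve(conf: Map[String, String]): FileFormatWriter =
    conf.get(ConfKey) match {
      case Some(className) =>
        // Assumes the custom class has a public no-argument constructor.
        Class.forName(className)
          .getDeclaredConstructor()
          .newInstance()
          .asInstanceOf[FileFormatWriter]
      case None =>
        // Option absent: fall back to the default writer.
        DefaultFileFormatWriter
    }
}
```

With this shape, call sites would go through something like `FileFormatWriterResolver.resolve(conf).write(...)`, so an unset option keeps today's behavior.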

Why are the changes needed?

Doing this will:

  1. Allow third-party columnar plugins (which can set this option by default) to specify a columnar V1 writer during plugin initialization;
  2. Allow users to specify a customized row-based writer for purposes such as performance or specific handling of partitioning.
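
As a rough idea of the second use case, a custom row-based writer could wrap the default one and adjust partition handling before delegating. The sketch below is self-contained and uses a simplified `write` signature; `LowerCasePartitionWriter` and the `(partition, value)` row shape are hypothetical names and types, not part of this PR.

```scala
// Simplified stand-in for the proposed trait; the signature is an assumption.
trait FileFormatWriter {
  // Each row is a (partitionValue, payload) pair; returns updated partition paths.
  def write(rows: Seq[(String, String)], outputPath: String): Set[String]
}

// Stand-in default: one output directory per distinct partition value.
object DefaultFileFormatWriter extends FileFormatWriter {
  override def write(rows: Seq[(String, String)], outputPath: String): Set[String] =
    rows.map { case (partition, _) => s"$outputPath/part=$partition" }.toSet
}

// Hypothetical custom writer: normalizes partition values to lower case,
// then delegates the actual write to another FileFormatWriter.
class LowerCasePartitionWriter(delegate: FileFormatWriter = DefaultFileFormatWriter)
    extends FileFormatWriter {
  override def write(rows: Seq[(String, String)], outputPath: String): Set[String] = {
    val normalized = rows.map { case (partition, payload) => (partition.toLowerCase, payload) }
    delegate.write(normalized, outputPath)
  }
}
```

Delegating to the default keeps the custom class small: it changes only the behavior it cares about and inherits everything else.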

Does this PR introduce any user-facing change?

A developer-oriented change:

Calls to FileFormatWriter need to be changed to DefaultFileFormatWriter, as the former becomes a trait.

For example, before:

FileFormatWriter.write(...)

After:

DefaultFileFormatWriter.write(...)

How was this patch tested?

Existing tests, plus a new test case in FileFormatWriterSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@zhztheplayer zhztheplayer changed the title Wip 52978 [SPARK-52978] Make FileFormatWriter customizable via SQL configuration Jul 28, 2025
@zhztheplayer zhztheplayer changed the title [SPARK-52978] Make FileFormatWriter customizable via SQL configuration [SPARK-52978][SQL] Make FileFormatWriter customizable via SQL configuration Jul 28, 2025
@zhztheplayer zhztheplayer marked this pull request as ready for review July 28, 2025 22:13
@zhztheplayer zhztheplayer marked this pull request as draft July 29, 2025 06:28

@zhztheplayer zhztheplayer marked this pull request as ready for review July 29, 2025 09:29
@zhztheplayer
Member Author

cc @cloud-fan @yaooqinn @dongjoon-hyun Appreciate your thoughts on this.

We have had some long-standing issues in Gluten that could be resolved by opening up this API. It would also help with lake support in Gluten.



/** A helper object for writing FileFormat data out to a location. */
object DefaultFileFormatWriter extends FileFormatWriter with Logging {
Member Author

@zhztheplayer zhztheplayer Jul 30, 2025

To ease reviewing:

Git treats this file as newly created. However, it is mostly a trivial file move, i.e., mv FileFormatWriter.scala DefaultFileFormatWriter.scala. The content is basically identical.
