[FEA] Generate a detailed report for the write ops #1536

Open
amahussein opened this issue Feb 7, 2025 · 0 comments
Assignees
Labels
core_tools Scope the core module (scala) feature request New feature or request

Is your feature request related to a problem? Please describe.

QualTool collects writeFormats and then filters them down to the formats that are not supported.
The result is reported in the Unsupported Write Data Format column of rapids_4_spark_qualification_output.csv.

As a result, the report only covers write formats that are not supported.

Describe the solution you'd like

We would like a detailed report of the write ops, similar to data_source_information.csv.

The report should include:

  • the arguments of the write operation
  • the format and schema
  • the target of the write op (file, table, etc.)
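
As a rough illustration of what one row of such a report could hold, here is a minimal sketch in Python. The record type, the column names, and the `write_report` helper are all hypothetical and only mirror the three items listed above; they are not the actual layout of data_source_information.csv or of any existing tool output.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class WriteOpRecord:
    """One row of a hypothetical write-ops report (column names are illustrative)."""
    sql_id: int          # SQL the write op belongs to
    exec_name: str       # name of the write exec node
    data_format: str     # output format, e.g. Parquet
    target: str          # file path or table the op writes to
    output_columns: str  # schema as it appears in the plan (may be truncated)
    write_mode: str      # write argument such as Append or Overwrite


def write_report(records, out):
    """Write the records as CSV with a header row derived from the dataclass."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(WriteOpRecord)])
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))


# Usage: render a single made-up record into an in-memory buffer.
buffer = io.StringIO()
write_report(
    [WriteOpRecord(1, "InsertIntoHadoopFsRelationCommand", "Parquet",
                   "file:/tmp/out", "[id, name]", "Append")],
    buffer,
)
```

Deriving the header from the dataclass keeps the column list and the row contents in one place, so adding a field later does not require touching the writer.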

Changes required to support this feature

  • Pull all the information related to writeExecs from the SqlPlanInfo.
  • Parse the plan and identify the target, source, schema, format, etc.
  • Store the writeFormats in a table so they can be reported at the end.
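
The parsing step could look roughly like the sketch below. It is written in Python for brevity rather than in the Scala of the core module, and it assumes a node description of the form `Execute <ExecName> <arg>, <arg>, ...`. The sample string, the `parse_write_node` helper, and the list of known formats are made up for illustration; real descriptions differ between Spark versions and write drivers, so this is not a definitive parser.

```python
import re
from typing import Dict, List, Optional

# Assumed lookup tables; the formats are illustrative, the modes are Spark SaveMode names.
KNOWN_FORMATS = ("parquet", "orc", "csv", "json", "text", "avro", "delta")
SAVE_MODES = ("Append", "Overwrite", "ErrorIfExists", "Ignore")


def split_top_level(args: str) -> List[str]:
    """Split on commas that are not nested inside [...] or (...)."""
    parts, current, depth = [], [], 0
    for ch in args:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_write_node(desc: str) -> Optional[Dict[str, str]]:
    """Extract exec name, target, format, mode and columns; None if not a write node."""
    match = re.match(r"Execute\s+(\w+)\s+(.*)", desc.strip(), flags=re.S)
    if match is None:
        return None
    exec_name, args = match.group(1), match.group(2)
    parts = split_top_level(args)
    columns = next((p for p in reversed(parts)
                    if p.startswith("[") and p.endswith("]")), "")
    return {
        "exec_name": exec_name,
        "target": parts[0] if parts else "",
        "data_format": next((p for p in parts if p.lower() in KNOWN_FORMATS), "unknown"),
        "write_mode": next((p for p in parts if p in SAVE_MODES), "unknown"),
        "output_columns": columns,
        # Assumption: a truncated column list contains a "more fields" marker.
        "schema_truncated": str("more fields" in columns),
    }


# Illustrative input only; not copied from a real eventlog.
sample = ("Execute InsertIntoHadoopFsRelationCommand file:/tmp/out, false, "
          "Parquet, [path=/tmp/out], Append, [id, name]")
parsed = parse_write_node(sample)
```

Falling back to "unknown" instead of failing keeps unrecognized write drivers in the report, which makes it easier to spot the plans that still need a dedicated parser.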

Challenges

We may face some challenges:

  • Dealing with different write ops such as Hive, Hadoop, Delta, etc.
  • Different write drivers customized by each customer. It may be hard to know how to parse those plans unless we have a sample eventlog.
  • Some information might be missing from the eventlogs: truncated schema, AQE-truncated final plan, etc.
  • We need a large set of eventlogs for testing.
@amahussein amahussein added ? - Needs Triage core_tools Scope the core module (scala) feature request New feature or request and removed ? - Needs Triage labels Feb 7, 2025
@amahussein amahussein self-assigned this Feb 7, 2025
@amahussein amahussein changed the title [FEA] Generate a details report for the write ops [FEA] Generate a detailed report for the write ops Feb 7, 2025