[FEA] AutoTuner/Bootstrapper: Support user-provided spark properties #1526

Open
parthosa opened this issue Feb 4, 2025 · 3 comments
Labels: autotuner, feature request

Comments

@parthosa
Collaborator

parthosa commented Feb 4, 2025

Users should have a way to specify additional Spark properties alongside those recommended by the AutoTuner/Bootstrapper.

The goal is to make the final tuning file ready for direct use by consolidating properties from the AutoTuner/Bootstrapper with the user-provided spark properties.

For example, the AutoTuner/Bootstrapper deliberately does not recommend certain Spark properties because of potential risks (data integrity, etc.). However, users should have the option to enable them if needed:

spark.rapids.sql.castStringToTimestamp.enabled=true
spark.rapids.sql.hasExtendedYearValues=false
spark.rapids.sql.incompatibleDateFormats.enabled=true

Possible Ideas:

  • These properties can be part of --tools_config_file
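A minimal sketch of the consolidation step, assuming both the recommended and the user-provided properties are available as plain key/value mappings. The function names `merge_spark_properties` and `to_conf_args` are hypothetical, the recommended values are made up for illustration, and nothing here reflects the actual `--tools_config_file` schema or the AutoTuner implementation:

```python
from typing import Dict, List


def merge_spark_properties(
    recommended: Dict[str, str], user_provided: Dict[str, str]
) -> Dict[str, str]:
    """Combine both sets; a user-provided value wins on a key conflict (assumed policy)."""
    merged = dict(recommended)    # copy so the tuner output is not mutated
    merged.update(user_provided)  # user entries are added or override
    return merged


def to_conf_args(properties: Dict[str, str]) -> List[str]:
    """Render the merged properties as `--conf key=value` lines, sorted by key."""
    return [f"--conf {key}={value}" for key, value in sorted(properties.items())]


# Illustrative tuner output (values are placeholders, not real recommendations).
recommended = {
    "spark.rapids.sql.concurrentGpuTasks": "2",
    "spark.sql.shuffle.partitions": "200",
}

# The user-provided properties listed in this issue.
user_provided = {
    "spark.rapids.sql.castStringToTimestamp.enabled": "true",
    "spark.rapids.sql.hasExtendedYearValues": "false",
    "spark.rapids.sql.incompatibleDateFormats.enabled": "true",
}

merged = merge_spark_properties(recommended, user_provided)
for line in to_conf_args(merged):
    print(line)
```

Letting the user value win on conflict is one possible policy; the tool could instead keep its own recommendation and emit a comment about the conflict.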

cc: @viadea @kuhushukla

@parthosa added the "? - Needs Triage" and "bug" labels on Feb 4, 2025
@kuhushukla
Collaborator

The autotuner should not recommend these values. There should be a way for the user to say it is OK to run in an "incompat, I don't care, rampant" mode, supplied as a config file beyond what the autotuner recommends. If we give out these configs by default, we break data integrity in some cases, quite silently. The user should opt in to this.
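The opt-in described above could be sketched as a guard applied to the user's config before it is merged into the tuner output. Everything here is an assumption for illustration: the `allow_incompatible` flag, the `filter_user_properties` name, and the contents of the risky set, which simply reuses the three properties from the issue description:

```python
from typing import Dict, List, Tuple

# Properties treated as risky in this sketch (taken from the issue's example list).
RISKY_PROPERTIES = frozenset({
    "spark.rapids.sql.castStringToTimestamp.enabled",
    "spark.rapids.sql.hasExtendedYearValues",
    "spark.rapids.sql.incompatibleDateFormats.enabled",
})


def filter_user_properties(
    user_provided: Dict[str, str], allow_incompatible: bool = False
) -> Tuple[Dict[str, str], List[str]]:
    """Return (accepted properties, sorted keys rejected for lack of opt-in)."""
    accepted: Dict[str, str] = {}
    rejected: List[str] = []
    for key, value in user_provided.items():
        if key in RISKY_PROPERTIES and not allow_incompatible:
            rejected.append(key)  # dropped unless the user explicitly opts in
        else:
            accepted[key] = value
    return accepted, sorted(rejected)


user_config = {
    "spark.rapids.sql.incompatibleDateFormats.enabled": "true",
    "spark.executor.cores": "16",
}

# Default: the risky property is held back and reported.
accepted, rejected = filter_user_properties(user_config)

# Explicit opt-in: everything the user supplied is accepted.
accepted_all, rejected_none = filter_user_properties(user_config, allow_incompatible=True)
```

Reporting the rejected keys, rather than dropping them silently, keeps the user aware of what was left out and why.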

@parthosa
Collaborator Author

parthosa commented Feb 5, 2025

Based on offline discussions, we should allow users to specify custom spark_properties in addition to those recommended by the AutoTuner.

The goal is to make the final tuning file ready for direct use, containing a combined set of properties from the AutoTuner/Bootstrapper along with user-defined configurations.

Updated the description to reflect this.

@parthosa changed the title from "[BUG] AutoTuner/Bootstrapper: Enable configurations for Columnar2Row" to "[BUG] AutoTuner/Bootstrapper: Support user-provided spark properties" on Feb 5, 2025
@parthosa added the "feature request" label and removed the "bug" label on Feb 5, 2025
@parthosa changed the title from "[BUG] AutoTuner/Bootstrapper: Support user-provided spark properties" to "[FEA] AutoTuner/Bootstrapper: Support user-provided spark properties" on Feb 5, 2025
@amahussein
Collaborator

amahussein commented Feb 5, 2025

Isn't this the typical usage of the tuner output? Users can append whatever they find necessary to it, including non-rapids properties.
Users still have to wrap their own configs anyway to suit their CSP environment, submission scripts, etc.

  • Profiler's context: it makes sense that the autotuner should process all rapids configs and add comments or make recommendations.
  • Qual's context: the Spark properties are pulled from eventlog/clusterInfo, so I am not sure where the rapids* properties would come from, or how they are supposed to be passed to the spark_rapids Python CLI.

3 participants