[SPARK-54971] Add WITH SCHEMA EVOLUTION syntax for SQL INSERT #53732
Conversation
JIRA issue: SPARK-54971 (Improvement). (Comment automatically generated by GitHub Actions.)
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

I was thinking it could be interesting to have Spark optionally call `alterTable` when the V2 data source has `TableCapability.AUTOMATIC_SCHEMA_EVOLUTION` (which we introduced for the MERGE INTO schema-evolution implementation in DSv2). That would ease the burden on the data sources, but it can be a future enhancement.
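The capability check suggested here could look roughly like the following sketch. The types and names (`V2Table`, `AutomaticSchemaEvolution`, `sparkShouldAlterTable`) are simplified stand-ins for the DSv2 interfaces, not Spark's actual API:

```scala
// Simplified stand-ins for DSv2 concepts; not Spark's real interfaces.
sealed trait TableCapability
case object AutomaticSchemaEvolution extends TableCapability

final case class V2Table(name: String, capabilities: Set[TableCapability])

// Hypothetical planner-side check: Spark would only call alterTable on
// behalf of the data source when the user asked for schema evolution AND
// the source advertises automatic schema-evolution support.
def sparkShouldAlterTable(table: V2Table, withSchemaEvolution: Boolean): Boolean =
  withSchemaEvolution && table.capabilities.contains(AutomaticSchemaEvolution)
```

Sources without the capability would keep the current behavior and remain responsible for reacting to the `mergeSchema` option themselves.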
Resolved review threads:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala
On the new test:

```scala
test("SPARK-54971: INSERT WITH SCHEMA EVOLUTION is currently unsupported") {
```

To cover the first case:

```diff
 case InsertIntoStatement(l @ LogicalRelationWithTable(_: InsertableRelation, _),
-    parts, _, query, overwrite, false, _) if parts.isEmpty =>
+    parts, _, query, overwrite, false, _, withSchemaEvolution)
+    if parts.isEmpty && !withSchemaEvolution =>
```
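The updated guard can be illustrated with a reduced stand-in for `InsertIntoStatement` that keeps only the fields relevant to this case (a sketch, not the real plan node):

```scala
// Reduced stand-in for the analyzer's first case: only the fields used in
// the guard (partition spec and the new withSchemaEvolution flag).
final case class InsertStmt(partitionSpec: Map[String, Option[String]],
                            withSchemaEvolution: Boolean)

// Mirrors the updated guard: the InsertableRelation fallback only fires
// when there are no partition specs and no WITH SCHEMA EVOLUTION clause.
def firstCaseApplies(i: InsertStmt): Boolean = i match {
  case InsertStmt(parts, withSchemaEvolution)
      if parts.isEmpty && !withSchemaEvolution => true
  case _ => false
}
```

With `withSchemaEvolution = true` the case no longer matches, so the statement falls through to the path that raises the "currently unsupported" error the test asserts.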
On the new test:

```scala
testPartitionedTable("SPARK-54971: INSERT WITH SCHEMA EVOLUTION is currently unsupported") {
```

To cover the 2nd case:

```scala
case i @ InsertIntoStatement(l @ LogicalRelationWithTable(t: HadoopFsRelation, table),
    parts, _, query, overwrite, _, _, withSchemaEvolution)
    if query.resolved && !withSchemaEvolution =>
```
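Likewise, the `HadoopFsRelation` case's guard can be sketched with a minimal stand-in (illustrative only; the real node carries the full relation and query):

```scala
// Reduced stand-in for the second case: only the guard-relevant fields
// (whether the query is resolved, and the withSchemaEvolution flag).
final case class FsInsertStmt(queryResolved: Boolean, withSchemaEvolution: Boolean)

// Mirrors the updated guard: the HadoopFsRelation rewrite only fires for a
// resolved query without a WITH SCHEMA EVOLUTION clause.
def secondCaseApplies(i: FsInsertStmt): Boolean = i match {
  case FsInsertStmt(resolved, withSchemaEvolution)
      if resolved && !withSchemaEvolution => true
  case _ => false
}
```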
What changes were proposed in this pull request?
Similar to the MERGE WITH SCHEMA EVOLUTION PR, this PR introduces the syntax `WITH SCHEMA EVOLUTION` for the SQL `INSERT` command. Since the syntax is not yet fully implemented for any table format, users receive an exception if they try to use it.

When `WITH SCHEMA EVOLUTION` is specified, schema-evolution-related features must be turned on for that single statement, and only for that statement.

In this PR, Spark is only responsible for recognizing the presence or absence of the `WITH SCHEMA EVOLUTION` syntax; that information is passed down from the Analyzer. When `WITH SCHEMA EVOLUTION` is detected, Spark sets the `mergeSchema` write option to `true` on the respective V2 insert command nodes. Data sources must respect the syntax and react appropriately: turn on the features categorized as "schema evolution" when the `WITH SCHEMA EVOLUTION` syntax is present.

Why are the changes needed?
This intuitive SQL syntax lets users request automatic schema evolution for a specific `INSERT` operation. Some users would like schema evolution for DML commands such as `MERGE` and `INSERT`, where the schemas of the table and the query relation can mismatch.

Does this PR introduce any user-facing change?

Yes, it introduces the `WITH SCHEMA EVOLUTION` syntax for SQL `INSERT`.

How was this patch tested?
Added UTs.
Was this patch authored or co-authored using generative AI tooling?
No.
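As a rough illustration of the `mergeSchema` plumbing described in the PR description above: a minimal sketch of deriving write options from the parsed flag (the helper name is hypothetical, not Spark's API):

```scala
// Hypothetical helper: compute the write options for a V2 insert command
// from the parsed withSchemaEvolution flag. Per the PR description, Spark
// sets mergeSchema=true when WITH SCHEMA EVOLUTION is detected, and the
// data source is expected to react to that option.
def insertWriteOptions(base: Map[String, String],
                       withSchemaEvolution: Boolean): Map[String, String] =
  if (withSchemaEvolution) base + ("mergeSchema" -> "true") else base
```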