[SPARK-53573][SQL] Allow coalescing string literals everywhere #52638
base: master
Thanks for working on this, it will make the SQL syntax better!
Thanks for working on this and making our SQL language better!
// Check if any of the tokens are R-strings.
val hasRString = tokens.exists { token =>
  val text = token.getText
This looks duplicated with L187-190 and L207-210 below; let's dedup into a helper? Bonus points for adding a bunch of unit tests specifically for the helper with many different input strings (AI can maybe help generate a bunch of test cases).
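A minimal sketch of what such a helper could look like, assuming an R-string is a literal whose text starts with r or R immediately followed by a single or double quote. The names StringLiteralHelpers, isRawStringLiteral and hasRawString are hypothetical, and the helper takes the token text rather than an ANTLR token so that it stays self-contained; the actual check in the PR may differ.

```scala
object StringLiteralHelpers {
  /**
   * Hypothetical helper: returns true when the token text looks like a raw
   * string literal, i.e. an `r` or `R` prefix directly followed by a quote.
   */
  def isRawStringLiteral(text: String): Boolean =
    text.length >= 2 &&
      (text.charAt(0) == 'r' || text.charAt(0) == 'R') &&
      (text.charAt(1) == '\'' || text.charAt(1) == '"')

  /** True if any of the given token texts is a raw string literal. */
  def hasRawString(tokenTexts: Seq[String]): Boolean =
    tokenTexts.exists(isRawStringLiteral)
}
```

With something like this, each call site could pass tokens.map(_.getText) instead of repeating the prefix check, and the helper could be unit-tested on plain strings without building a parser.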
case p: ExpressionPropertyWithKeyNoEqualsContext =>
  val k = visitPropertyKeyOrStringLitNoCoalesce(p.key)
  val v = Option(p.value).map(expression).getOrElse {
    operationNotAllowed(s"A value must be specified for the key: $k.", ctx)
this looks duplicated with the above, please dedup?
// ========================================================================

test("string coalescing in LIKE pattern") {
  checkAnswer(
There are many tests of this format:
test("category") {
  checkAnswer(
    sql("some query"),
    Row(true)
  )
}
It seems like we could simplify this a bunch by putting the queries in one big Seq of strings, doing a .foreach on it, and calling checkAnswer(sql(query), Row(true)) on each. We can still keep the existing test case names by porting them as comments in the Seq.
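A rough sketch of that table-driven shape. The object name CoalescingQueryCases, the runAll method and its check parameter are hypothetical; check stands in for q => checkAnswer(sql(q), Row(true)) so the snippet compiles without a Spark session. The queries themselves are illustrative, not copied from the suite.

```scala
object CoalescingQueryCases {
  // Each entry is a query expected to return a single `true` row.
  // Former test names survive as comments next to their query.
  val queries: Seq[String] = Seq(
    // string coalescing in LIKE pattern
    "SELECT 'test' LIKE 'te' 'st'",
    // string coalescing in an equality comparison (illustrative name)
    "SELECT 'a' 'b' = 'ab'"
  )

  // `check` is a stand-in for `q => checkAnswer(sql(q), Row(true))`.
  def runAll(check: String => Unit): Unit = queries.foreach(check)
}
```

One trade-off of this layout is that a failure no longer names a test case, so it may be worth including the query text in the assertion message.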
withTable("test_table_123", "test_table_456", "other_table") {
  sql("CREATE TABLE test_table_123 (id INT)")
  sql("CREATE TABLE test_table_456 (id INT)")
  sql("CREATE TABLE other_table (id INT)")
do a Seq of the table names and .foreach on it?
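For illustration, the suggestion could look roughly like this. TableSetup, createTableDdl and createAll are hypothetical names, and run stands in for the suite's sql method so the snippet is self-contained.

```scala
object TableSetup {
  val tableNames: Seq[String] =
    Seq("test_table_123", "test_table_456", "other_table")

  // Builds the DDL for one table; in the suite this string would go to sql(...).
  def createTableDdl(name: String): String = s"CREATE TABLE $name (id INT)"

  // `run` is a stand-in for the test suite's `sql` method.
  def createAll(run: String => Unit): Unit =
    tableNames.foreach(name => run(createTableDdl(name)))
}
```

Inside the suite, the same Seq could presumably also feed the cleanup wrapper, e.g. withTable(tableNames: _*) { ... }, so the names are listed only once.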
// The pattern is coalesced into 'test_table_*' (regex pattern where * matches any chars)
val result = sql("SHOW TABLES LIKE 'test' '_table_' '*'").collect()
assert(result.length == 2,
Should we checkAnswer on it?
expressionProperty
    : key=propertyKey (EQ? value=expression)?
    : key=propertyKeyOrStringLit EQ value=expression #expressionPropertyWithKeyAndEquals
    | key=propertyKeyOrStringLitNoCoalesce value=expression #expressionPropertyWithKeyNoEquals
Suggested change:
- | key=propertyKeyOrStringLitNoCoalesce value=expression #expressionPropertyWithKeyNoEquals
+ | key=propertyKeyOrStringLitNoCoalesce value=expression? #expressionPropertyWithKeyNoEquals
  tokens.head
} else {
  // Multiple tokens: create coalesced token
  createCoalescedStringToken(tokens)
Hmm, shall we put everything in AstBuilder? Looks weird to combine tokens here.
What changes were proposed in this pull request?
As part of generalizing the usage of parameter markers, we want to also generalize the ability to chain string literals in more places than only expressions.
This allows for functionality such as: spark.sql("CREATE TABLE ... LOCATION :root '/subdir'", Map("root" -> "/data"))
to turn into:
CREATE TABLE ... LOCATION '/data/subdir'
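The intended behavior can be modeled in a few lines. This is only a sketch of the observable semantics described above, not how the parser implements it; CoalesceSketch, Part, Lit, Param and coalesce are hypothetical names introduced for illustration.

```scala
object CoalesceSketch {
  // A piece of a chained string: either a literal or a named parameter marker.
  sealed trait Part
  final case class Lit(value: String) extends Part
  final case class Param(name: String) extends Part

  /** Resolves each part to a string and concatenates the results in order. */
  def coalesce(parts: Seq[Part], args: Map[String, String]): String =
    parts.map {
      case Lit(v) => v
      case Param(n) =>
        args.getOrElse(n, throw new NoSuchElementException(s"unbound parameter: $n"))
    }.mkString
}
```

Under this model, the chain :root '/subdir' with root bound to /data resolves to /data/subdir, matching the LOCATION example.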
String coalescing now works nearly everywhere, with two caveats:
TIMESTAMP :param collides with the JSON path syntax jsonstr:path.
I think that's acceptable, as most typed literals take strings and allow :: or CAST casting. And of course one can always pass a typed parameter marker.
Why are the changes needed?
Greatly increases the power of parameter markers without having to build heavy expressions or refactor code.
Does this PR introduce any user-facing change?
Yes, it is an additive feature.
How was this patch tested?
Added a new test suite, StringLiteralCoalescingSuite.
Was this patch authored or co-authored using generative AI tooling?
Claude Sonnet 4.5