Skip to content

Conversation

srielau
Copy link
Contributor

@srielau srielau commented Oct 16, 2025

What changes were proposed in this pull request?

As part of generalizing the usage of parameter markers, we want to also generalize the ability to chain string literals in more places than only expressions.
This allows for fuctionality such as: spark.sql("CREATE TABLE ... LOCATION :root '/subdir'", Map("root" -> "/data")
to turn into:
CREATE TABLE ... LOCATION '/data/subdir'

String coalescing works now nearly everywhere with two cave-ats:

  • typed-literals
    TIMESTAMP :param collides with json path jsonstr:path.
    I think tat's acceptable as most typed literals take strings and allow :: or CAST casting. And of course one can always pass a typed parameter marker
  • property lists allow equal signs to be optional. That means we don't know where the split key and value. OPTIONS however already allowed constants which included coalescing. So precedence clarifies that, in teh absesne of an equal sign only the first string literal is the key. All subsequent literals compose the value.

Why are the changes needed?

Greatly increases the power of parameter markers without having to build heavy expressions and refactoring code

Does this PR introduce any user-facing change?

Yes, it is an additive feature

How was this patch tested?

Written a new set of tests StringLiteralCoalescingSuite

Was this patch authored or co-authored using generative AI tooling?

Claude Sonnet 4.5

@github-actions github-actions bot added the SQL label Oct 16, 2025
Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, it will make the SQL syntax better!

@github-actions github-actions bot removed the DOCS label Oct 18, 2025
Copy link
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this and making our SQL language better!


// Check if any of the tokens are R-strings.
val hasRString = tokens.exists { token =>
val text = token.getText
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks duplicated with L187-190 and L207-210 below, let's dedup into a helper? Bonus point for adding a bunch of unit tests specifically for the helper with many different input strings (AI can maybe help generate a bunch of test cases).

case p: ExpressionPropertyWithKeyNoEqualsContext =>
val k = visitPropertyKeyOrStringLitNoCoalesce(p.key)
val v = Option(p.value).map(expression).getOrElse {
operationNotAllowed(s"A value must be specified for the key: $k.", ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks duplicated with the above, please dedup?

// ========================================================================

test("string coalescing in LIKE pattern") {
checkAnswer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many tests of this format, with

test("category") {
  checkAnswer(
    sql("some query"),
    Row(true)
  )
}

It seems like we could simplify this a bunch by just putting a big Seq of strings and then doing a .foreach on it and calling checkAnswer(sql(query), Row(true)) on it. We can still port the existing test case names as comments and put them in comments in the Seq instead.

withTable("test_table_123", "test_table_456", "other_table") {
sql("CREATE TABLE test_table_123 (id INT)")
sql("CREATE TABLE test_table_456 (id INT)")
sql("CREATE TABLE other_table (id INT)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do a Seq of the table names and .foreach on it?


// The pattern is coalesced into 'test_table_*' (regex pattern where * matches any chars)
val result = sql("SHOW TABLES LIKE 'test' '_table_' '*'").collect()
assert(result.length == 2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we checkAnswer on it?

expressionProperty
: key=propertyKey (EQ? value=expression)?
: key=propertyKeyOrStringLit EQ value=expression #expressionPropertyWithKeyAndEquals
| key=propertyKeyOrStringLitNoCoalesce value=expression #expressionPropertyWithKeyNoEquals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| key=propertyKeyOrStringLitNoCoalesce value=expression #expressionPropertyWithKeyNoEquals
| key=propertyKeyOrStringLitNoCoalesce value=expression? #expressionPropertyWithKeyNoEquals

tokens.head
} else {
// Multiple tokens: create coalesced token
createCoalescedStringToken(tokens)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm shall we put everything in AstBuilder? Looks weird to combine tokens here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants