-
Notifications
You must be signed in to change notification settings - Fork 1k
Feat(duckdb)!: support transpilation of SHA256 from bigquery to duckdb #6421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
sqlglot/dialects/dialect.py
Outdated
| def build_sha2_digest_sql(self: Generator, expression: exp.SHA2Digest) -> str: | ||
| return self.func(f"SHA{expression.text('length') or '256'}", expression.this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this to sha2_digest_sql, otherwise naming clashes with AST helper builders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
| exp.SHA2Digest: lambda self, e: self.func( | ||
| "SHA2", e.this, e.args.get("length") or exp.Literal.number(256) | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a bug in how exp.SHA2 is generated in redshift? Why did we need this entry and not reuse the new sha2_digest_sql helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in redshift it should be something like SHA2('abc', 256), while sha2_digest_sql produces SHA256('abc)
| exp.SHA2Digest: lambda self, e: self.func( | ||
| "SHA2", e.this, e.args.get("length") or exp.Literal.number(256) | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, similar to Redshift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like redshift, spark2 also only has SHA2 function
When parsing SHA256 from BigQuery, we want to represent it as SHA2Digest with length 256 since the output is binary, just like we do with SHA1.
We also want to wrap SHA256 with UNHEX in the transpiled DuckDB query so that the output is binary to stay consistent with BigQuery.
For other dialects, we just want to make sure SHA2Digest with length 256 generates the appropriate queries. For example, SHA256(x) in bigquery should be tranpiled into SHA2_BINARY(x, 256) in Snowflake