-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DC-760: dataset builder query generation #1516
Conversation
… places; move validation to construtors;
} | ||
|
||
@Override | ||
public String renderSQL(SqlPlatform platform) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is platform not used here? Should that get passed through somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be needed in the future, but wasn't needed yet. Everything this calls is private so if we were to use the sql platform in the future it wouldn't affect this class's contract.
I'm not totally wild about the way I added support for azure vs bq here since it involved threading the sql platform through all the code but it seemed like the best way at the time, and does work.
|
||
@Override | ||
public String renderSQL(SqlPlatform platform) { | ||
// TODO: use named parameters for literals to protect against SQL injection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like something we should make sure gets done before this code ends up in production and is callable by any non-admin. Is there a ticket for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but I can create it. I'm not an expert in the way the TDR currently generates SQL code but I don't think named templates are used in all cases.
} | ||
|
||
// render the primary TableVariable | ||
String sql = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have a todo to use PreparedStatements or something else safer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how that would work, but I agree, I'd much rather use the prepared statement API if possible. I think you need a SQL connection to create one, so it would be hard to do in the current flow; that would require passing in some state in addition to platform
.
src/test/java/bio/terra/tanagra/query/ColumnHeaderSchemaTest.java
Outdated
Show resolved
Hide resolved
.build(), | ||
tableVariable, | ||
"alias"); | ||
assertThat(fieldVariableSqlFunctionWrapper.renderSQL(null), is("custom(t.field)")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would be worth breaking this test up and creating individual tests for each assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was planning to revisit this once I had more tests written; the way the code sets up FieldPointer
and TablePointer
is pretty verbose right now. In the original code these objects are only constructed using serialization so no much effort was given to manual construction.
/** Enum for the types of external data pointers supported by Tanagra. */ | ||
public enum Type { | ||
BQ_DATASET, | ||
AZURE_DATASET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to specify this as a Synapse query (i.e. SYNAPSE_DATASET), as the counterpart to BQ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the tanagra code there's a concept of "query executor" which handles issuing the queries to the database, which is what this defines. I added a concept of "sql type" to handle differences between bigquery and T-SQL. That was OK for the proof-of-concept, for the new code we will use the TDR dataset ID to figure out how to run the query, so this concept can go away.
And with your suggestion to change the way table names are handled, it may be possible to remove the sql type concept too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking more at the code, I would like to suggest a light integration with ANTLR. The idea would be to (1) Build a general SQL query using the code in this PR, but ignoring differences between BQ & TSQL and ignoring small details like aliases, and then (2) Parse the generated query using ether the BQVisitor or SynapseVisitor. I think this has a few advantages:
- Simplify the code already in this PR - We have to somehow generate a base query, but let's make it as straightforward as possible.
- We can handle all language specific changes in one spot (SynapseVisitor or BQVisitor)
- ANTLR will handle the complexities of queries from parquet files out of the box -- We already need to support updating a query to select from a set of parquet files in the SynapseVisitor.
- Inherently by using the parser, the generated SQL will be checked to ensure it is valid.
I'm still of two minds about this.. If the output of this code is passed to the parser, then the data flow looks like:
It seems wasteful to generate SQL, then parse it, only to generate SQL again. To avoid this, the logic in |
* Add test for more complicated sql * Move test to OnDemand suite * Add back to query test and split up assert * Update src/test/java/bio/terra/service/snapshotbuilder/query/QueryTest.java Co-authored-by: Phil Shapiro <[email protected]> * Move Boolean to boolean * Fix syntax --------- Co-authored-by: Phil Shapiro <[email protected]>
@snf2ye Can you take another look at this? I've simplified things even further, although I haven't addressed all the comments above. The remaining issues I can see are
I'll make two tickets for this work. When it's done, it should work the same way that |
Kudos, SonarCloud Quality Gate passed! |
Port over some code from tanagra that handles generation SQL queries from java objects. This will be used in a future PR that handles APIs requests from the dataset builder to generate SQL based on a set of criteria in the cohort builder UI.
The code supports T-sql and big query SQL, and provides a
Query
API that supports:This doesn't include the code to execute these queries in TDR, although this work was already done as part of a proof-of-concept, see #1433
Since the code isn't used yet, I'm aiming for 80%+ code coverage in tests.