Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Calcite] Showcase bug related to JOIN expression #478

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

findinpath
Copy link
Contributor

@findinpath findinpath commented Nov 15, 2023

Consider the following scenario to translate via Coral:

CREATE TABLE test.table_join_view_scenario_simple_table (key string);
CREATE TABLE test.table_join_view_scenario_table_nested_columns (
      studentid struct<idtype:string,id:string>, 
      details struct<lastupdated:timestamp,name:string, misc:struct<enrollmentdate:date,major:string>>);

CREATE VIEW test.view_table_join_view_scenario_table_nested_columns AS 
SELECT studentid.id student_id, details.name as student_name, details.misc.major major, details.lastUpdated details_last_updated 
FROM test.table_join_view_scenario_table_nested_columns

CREATE VIEW test.view_table_join_with_view_table_with_nested_columns AS 
SELECT v.student_id, v.student_name, v.details_last_updated
FROM test.view_table_join_view_scenario_table_nested_columns v 
INNER JOIN test.table_join_view_scenario_simple_table t 
     ON concat('U', v.student_id) = t.key;

When we have an function call in the JOIN expression like in the example provided:

concat('U', v.student_id) = t.key

we are then having a new LogicalProject (different from the leaf LogicalProject corresponding to the view view_table_join_view_scenario_table_nested_columns see org.apache.calcite.sql2rel.SqlToRelConverter#leaves) which includes the expression CAST(concat('U', $0.id)):VARCHAR(2147483647) RexCall .
This project is not considered a leaf.

In the flattening algorithm from org.apache.calcite.sql2rel.SqlToRelConverter.Blackboard#flatten therefore the LogicalProject with the expressions

0 = {RexFieldAccess@8734} "$0.id"
1 = {RexFieldAccess@8735} "$1.name"
2 = {RexFieldAccess@8736} "$1.misc.major"
3 = {RexFieldAccess@8737} "$1.lastupdated"
4 = {RexCall@8738} "CAST(concat('U', $0.id)):VARCHAR(2147483647)"

will be flattened therefore again (because it is not a leaf) and we obtain therefore the LogicalTableScan corresponding to the table table_join_view_scenario_table_nested_columns which contains only two fields:

"studentid" -> {RelRecordType@8755} "RecordType:peek_no_expand(VARCHAR(2147483647) idtype, VARCHAR(2147483647) id)"
"details" -> {RelRecordType@8757} "RecordType:peek_no_expand(TIMESTAMP lastupdated, VARCHAR(2147483647) name, RecordType:peek_no_expand(DATE enrollmentdate, VARCHAR(2147483647) major) misc)"

Therefore the type corresponding to the LogicalTableScan having only two fields is being returned for usage in SqlToRelConverter.convertIdentifier.

However, details_last_updated has the index 3 in the LogicalProject

0 = {RexFieldAccess@8734} "$0.id"
1 = {RexFieldAccess@8735} "$1.name"
2 = {RexFieldAccess@8736} "$1.misc.major"
3 = {RexFieldAccess@8737} "$1.lastupdated"

This is why we stumble in an AssertionError in RexBuilder.makeFieldAccess :

java.lang.AssertionError: Field ordinal 3 is invalid for  type 

org.apache.calcite.sql2rel.SqlToRelConverter.Blackboard#flatten source code:

    public void flatten(
        List<RelNode> rels,
        int systemFieldCount,
        int[] start,
        List<Pair<RelNode, Integer>> relOffsetList) {
      for (RelNode rel : rels) {
        if (leaves.contains(rel) || rel instanceof LogicalMatch) {
          relOffsetList.add(
              Pair.of(rel, start[0]));
          start[0] += rel.getRowType().getFieldCount();
        } else {
          if (rel instanceof LogicalJoin
              || rel instanceof LogicalAggregate) {
            start[0] += systemFieldCount;
          }
          flatten(
              rel.getInputs(),
              systemFieldCount,
              start,
              relOffsetList);
        }
      }
    }

This PR serves only for showcasing a potential unidentified bug in calcite-core

Stacktrace:

java.lang.AssertionError: Field ordinal 3 is invalid for  type 'RecordType:peek_no_expand(RecordType:peek_no_expand(VARCHAR(2147483647) idtype, VARCHAR(2147483647) id) studentid, RecordType:peek_no_expand(TIMESTAMP lastupdated, VARCHAR(2147483647) name, RecordType:peek_no_expand(DATE enrollmentdate, VARCHAR(2147483647) major) misc) details)'
	at org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3725)
	at org.apache.calcite.sql2rel.SqlToRelConverter.access$2200(SqlToRelConverter.java:217)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4796)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4092)
	at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4656)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3939)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:670)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181)
	at com.linkedin.coral.hive.hive2rel.HiveSqlToRelConverter.convertQuery(HiveSqlToRelConverter.java:63)
	at com.linkedin.coral.common.ToRelConverter.toRel(ToRelConverter.java:157)
	at com.linkedin.coral.common.ToRelConverter.convertView(ToRelConverter.java:124)
	at com.linkedin.coral.trino.rel2trino.HiveToTrinoConverterTest.testViews(HiveToTrinoConverterTest.java:44)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:701)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:893)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1218)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
	at org.testng.TestRunner.privateRun(TestRunner.java:758)
	at org.testng.TestRunner.run(TestRunner.java:613)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
	at org.testng.SuiteRunner.run(SuiteRunner.java:240)

What changes are proposed in this pull request, and why are they necessary?

How was this patch tested?

@@ -49,7 +49,7 @@ public void testViews(String database, String view, String expectedSql) {

@DataProvider(name = "viewTestCases")
public Object[][] viewTestCasesProvider() {
return new Object[][] {
return new Object[][] { { "test", "view_table_join_with_view_table_with_nested_columns", "..." },
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AssertionError happens in the translation of the view currently without reaching the point where the view translation gets compared with ....

Stacktrace:

java.lang.AssertionError: Field ordinal 3 is invalid for  type 'RecordType:peek_no_expand(RecordType:peek_no_expand(VARCHAR(2147483647) idtype, VARCHAR(2147483647) id) studentid, RecordType:peek_no_expand(TIMESTAMP lastupdated, VARCHAR(2147483647) name, RecordType:peek_no_expand(DATE enrollmentdate, VARCHAR(2147483647) major) misc) details)'
	at org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3725)
	at org.apache.calcite.sql2rel.SqlToRelConverter.access$2200(SqlToRelConverter.java:217)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4796)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4092)
	at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4656)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3939)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:670)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627)
	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3181)
	at com.linkedin.coral.hive.hive2rel.HiveSqlToRelConverter.convertQuery(HiveSqlToRelConverter.java:63)
	at com.linkedin.coral.common.ToRelConverter.toRel(ToRelConverter.java:157)
	at com.linkedin.coral.common.ToRelConverter.convertView(ToRelConverter.java:124)
	at com.linkedin.coral.trino.rel2trino.HiveToTrinoConverterTest.testViews(HiveToTrinoConverterTest.java:44)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:701)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:893)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1218)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
	at org.testng.TestRunner.privateRun(TestRunner.java:758)
	at org.testng.TestRunner.run(TestRunner.java:613)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
	at org.testng.SuiteRunner.run(SuiteRunner.java:240)

@wmoustafa
Copy link
Contributor

Thanks for the PR. Could you simplify the table/view names in the test case? Also update the description to highlight the key reason why the test case fails (you could use the simplified names from the test). Right now, it is a bit hard to reverse engineer the problem from the test code alone.

@findinpath
Copy link
Contributor Author

@wmoustafa AC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants