HIVE-28938: Error in LATERAL VIEW with non native tables due to presence of incorrect virtual columns in RowResolver #5798


Merged: 6 commits merged into apache:master on May 12, 2025

Conversation

@soumyakanti3578 (Contributor) commented Apr 30, 2025


What changes were proposed in this pull request?

  1. Refactored the virtual-column handling logic in CalcitePlanner::genTableLogicalPlan.
  2. Removed obtainTableType(Table tabMetaData), as its logic was insufficient for classifying tables.
  3. Removed the enum TableType, which contained the types NATIVE, DRUID, and JDBC.
  4. Added methods isDruidTable and isJdbcTable in Table as an alternative to the enum types; isNonNative is already present in Table.
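The shape of the refactor in (2)-(4) can be sketched roughly as follows. This is a hypothetical, simplified sketch and not the actual Hive code: the field, constructor, and the string checks inside the predicates are assumptions for illustration; only the method names isNonNative, isDruidTable, and isJdbcTable come from the PR description.

```java
// Hypothetical, simplified sketch -- not the actual Hive Table class.
// The storageHandlerClass field and the contains() checks are assumptions.
public class Table {
    private final String storageHandlerClass; // null for native (managed) tables

    public Table(String storageHandlerClass) {
        this.storageHandlerClass = storageHandlerClass;
    }

    // Hive's Table already exposes isNonNative(); shown here for completeness.
    public boolean isNonNative() {
        return storageHandlerClass != null;
    }

    // The PR adds predicates like these instead of a NATIVE/DRUID/JDBC enum,
    // so tables that are neither Druid nor JDBC (e.g. Iceberg) no longer fall
    // through to a NATIVE default.
    public boolean isDruidTable() {
        return storageHandlerClass != null && storageHandlerClass.contains("Druid");
    }

    public boolean isJdbcTable() {
        return storageHandlerClass != null && storageHandlerClass.contains("Jdbc");
    }

    public static void main(String[] args) {
        Table iceberg = new Table("org.apache.iceberg.mr.hive.HiveIcebergStorageHandler");
        System.out.println(iceberg.isNonNative());  // true
        System.out.println(iceberg.isDruidTable()); // false
        System.out.println(iceberg.isJdbcTable());  // false
    }
}
```

With boolean predicates there is no exhaustive three-way classification to get wrong: an Iceberg table is simply non-native, rather than defaulting to NATIVE as it did under the removed enum.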

Why are the changes needed?

Explained in detail in https://issues.apache.org/jira/browse/HIVE-28938; in short, Iceberg tables were being misclassified as NATIVE tables, leading to the addition of incorrect virtual columns in RowResolver.

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -pl itests/qtest-iceberg -Piceberg -Pitests -Dtest=TestIcebergCliDriver -Dtest.output.overwrite=true -Dqfile="iceberg_multiple_lateral_views.q"

@amansinha100

> Explained in detail in https://issues.apache.org/jira/browse/HIVE-28938, but in short, iceberg tables were getting misclassified as NATIVE tables leading to addition of wrong virtual columns in RowResolver.

The nature of the code change in the PR is not limited to lateral views, so I am curious: is this a wider issue for Iceberg tables? Can you test with nested subqueries in the FROM clause, nested regular views, and nested materialized views?

@soumyakanti3578 (PR author)

This is indeed a wider issue. All Iceberg tables were classified as NATIVE tables in CalcitePlanner::genTableLogicalPlan; however, the side effect was only visible because the lateral view references all virtual columns too, as shown in the Jira. It is also visible in the output of explain formatted select * from <table>, for example by running:

create external table test (a int, b int, c int, d struct<e:string, f:int>) stored by iceberg;
explain formatted select * from test;

We see both NATIVE and NON-NATIVE virtual columns in the output:

{"CBOPlan":"{
  \"rels\": [
    {
      \"id\": \"0\",
      \"relOp\": \"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan\",
      \"table\": [
        \"default\",
        \"test\"
      ],
      \"table:alias\": \"test\",
      \"inputs\": [],
      \"rowCount\": 1.0,
      \"avgRowSize\": 248.0,
      \"rowType\": {
        \"fields\": [
          {
            \"type\": \"INTEGER\",
            \"nullable\": true,
            \"name\": \"a\"
          },
          {
            \"type\": \"INTEGER\",
            \"nullable\": true,
            \"name\": \"b\"
          },
          {
            \"type\": \"INTEGER\",
            \"nullable\": true,
            \"name\": \"c\"
          },
          {
            \"fields\": {
              \"fields\": [
                {
                  \"type\": \"VARCHAR\",
                  \"nullable\": true,
                  \"precision\": 2147483647,
                  \"name\": \"e\"
                },
                {
                  \"type\": \"INTEGER\",
                  \"nullable\": true,
                  \"name\": \"f\"
                }
              ],
              \"nullable\": true
            },
            \"nullable\": true,
            \"name\": \"d\"
          },
          {
            \"type\": \"BIGINT\",
            \"nullable\": true,
            \"name\": \"BLOCK__OFFSET__INSIDE__FILE\"
          },
          {
            \"type\": \"VARCHAR\",
            \"nullable\": true,
            \"precision\": 2147483647,
            \"name\": \"INPUT__FILE__NAME\"
          },
          {
            \"fields\": {
              \"fields\": [
                {
                  \"type\": \"BIGINT\",
                  \"nullable\": true,
                  \"name\": \"writeid\"
                },
                {
                  \"type\": \"INTEGER\",
                  \"nullable\": true,
                  \"name\": \"bucketid\"
                },
                {
                  \"type\": \"BIGINT\",
                  \"nullable\": true,
                  \"name\": \"rowid\"
                }
              ],
              \"nullable\": true
            },
            \"nullable\": true,
            \"name\": \"ROW__ID\"
          },
          {
            \"type\": \"BOOLEAN\",
            \"nullable\": true,
            \"name\": \"ROW__IS__DELETED\"
          },
          {
            \"type\": \"INTEGER\",
            \"nullable\": true,
            \"name\": \"PARTITION__SPEC__ID\"
          },
          {
            \"type\": \"BIGINT\",
            \"nullable\": true,
            \"name\": \"PARTITION__HASH\"
          },
          {
            \"type\": \"VARCHAR\",
            \"nullable\": true,
            \"precision\": 2147483647,
            \"name\": \"FILE__PATH\"
          },
          {
            \"type\": \"BIGINT\",
            \"nullable\": true,
            \"name\": \"ROW__POSITION\"
          },
          {
            \"type\": \"VARCHAR\",
            \"nullable\": true,
            \"precision\": 2147483647,
            \"name\": \"PARTITION__PROJECTION\"
          },
          {
            \"type\": \"BIGINT\",
            \"nullable\": true,
            \"name\": \"SNAPSHOT__ID\"
          }
        ],
        \"nullable\": false
      },
      \"colStats\": [
        {
          \"name\": \"a\",
          \"ndv\": 1,
          \"minValue\": -9223372036854775808,
          \"maxValue\": 9223372036854775807
        },
        {
          \"name\": \"b\",
          \"ndv\": 1,
          \"minValue\": -9223372036854775808,
          \"maxValue\": 9223372036854775807
        },
        {
          \"name\": \"c\",
          \"ndv\": 1,
          \"minValue\": -9223372036854775808,
          \"maxValue\": 9223372036854775807
        },
        {
          \"name\": \"d\",
          \"ndv\": 1
        }
      ]
    },
    {
      \"id\": \"1\",
      \"relOp\": \"org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject\",
      \"fields\": [
        \"a\",
        \"b\",
        \"c\",
        \"d\"
      ],
      \"exprs\": [
        {
          \"input\": 0,
          \"name\": \"$0\"
        },
        {
          \"input\": 1,
          \"name\": \"$1\"
        },
        {
          \"input\": 2,
          \"name\": \"$2\"
        },
        {
          \"input\": 3,
          \"name\": \"$3\"
        }
      ],
      \"rowCount\": 1.0
    }
  ]
  }","optimizedSQL":"SELECT `a`, `b`, `c`, `d`
FROM `default`.`test`","cboInfo":"Plan optimized by CBO.","STAGE DEPENDENCIES":{"Stage-0":{"ROOT STAGE":"TRUE"}},"STAGE PLANS":{"Stage-0":{"Fetch Operator":{"limit:":"-1","Processor Tree:":{"TableScan":{"alias:":"test","columns:":["a","b","c","d"],"database:":"default","table:":"test","isTempTable:":"false","OperatorId:":"TS_0","children":{"Select Operator":{"expressions:":"a (type: int), b (type: int), c (type: int), d (type: struct<e:string,f:int>)","columnExprMap:":{"_col0":"a","_col1":"b","_col2":"c","_col3":"d"},"outputColumnNames:":["_col0","_col1","_col2","_col3"],"OperatorId:":"SEL_1","children":{"ListSink":{"OperatorId:":"LIST_SINK_3"}}}}}}}}}}

@zabetak (Member) left a comment:

LGTM, pending tests! Left some nit comments but can be merged even without addressing those.

@@ -151,14 +151,14 @@ STAGE PLANS:
  predicate: UDFToDouble(key) is not null (type: boolean)
  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
  Select Operator
- expressions: key (type: int)
+ expressions: UDFToDouble(key) (type: double)


I am curious why the HBase (and Kudu, in another file) column type changed from int to double. The stats above show 4 bytes, which indicates an integer.

@soumyakanti3578 (PR author):

With this PR, we don't see any virtual columns for the HBase table, whereas earlier, virtual columns for native tables were being wrongly added to HBase tables. This affects the plan in the CBO, especially in RelFieldTrimmer.

With the PR, the plan after RelFieldTrimmer is:

2025-05-09T12:06:15,305 DEBUG [0de9da36-a1b4-4747-8aa3-b84858aed485 main] rules.RelFieldTrimmer: Plan after trimming unused fields
HiveSortLimit(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[20])
  HiveProject(key=[$1], value=[$2])
    HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available])
      HiveProject(EXPR$0=[CAST($0):DOUBLE])
        HiveFilter(condition=[IS NOT NULL(CAST($0):DOUBLE)])
          HiveProject(key=[$0])
            HiveTableScan(table=[[default, hbase_table_1]], table:alias=[hbase_table_1])
      HiveProject(key=[$0], value=[$1], EXPR$0=[CAST($0):DOUBLE])
        HiveFilter(condition=[IS NOT NULL(CAST($0):DOUBLE)])
          HiveProject(key=[$0], value=[$1])
            HiveTableScan(table=[[default, src]], table:alias=[src])

whereas earlier the plan was:

2025-05-09T12:11:50,457 DEBUG [bd84dbdb-d563-43d7-a012-3132221196b4 main] rules.RelFieldTrimmer: Plan after trimming unused fields
HiveSortLimit(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[20])
  HiveProject(key=[$1], value=[$2])
    HiveJoin(condition=[=(CAST($0):DOUBLE, CAST($1):DOUBLE)], joinType=[inner], algorithm=[none], cost=[not available])
      HiveFilter(condition=[IS NOT NULL(CAST($0):DOUBLE)])
        HiveProject(key=[$0])
          HiveTableScan(table=[[default, hbase_table_1]], table:alias=[hbase_table_1])
      HiveFilter(condition=[IS NOT NULL(CAST($0):DOUBLE)])
        HiveProject(key=[$0], value=[$1])
          HiveTableScan(table=[[default, src]], table:alias=[src])

We can see that the CASTs are now pushed down below the Join.
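To illustrate why pushing the cast below the join can matter, here is a toy sketch in plain Java (not Calcite code; the class and method names are invented for illustration). In the old plan shape the join condition re-evaluates CAST($0):DOUBLE for every row pair; in the new plan shape each input projects the cast once per row, and the join compares the precomputed columns. Both shapes produce the same join result.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the two plan shapes; not Calcite code.
public class CastPushdown {
    // Old shape: cast sits inside the join condition, evaluated per comparison.
    static List<double[]> joinWithCastInCondition(List<Integer> left, List<Integer> right) {
        List<double[]> out = new ArrayList<>();
        for (int l : left)
            for (int r : right)
                if ((double) l == (double) r)        // cast evaluated for every pair
                    out.add(new double[]{l, r});
        return out;
    }

    // New shape: each side projects the cast once, the join compares columns.
    static List<double[]> joinWithCastPushedDown(List<Integer> left, List<Integer> right) {
        List<double[]> lp = new ArrayList<>();
        for (int k : left) lp.add(new double[]{k, (double) k});   // project EXPR$0 once
        List<double[]> rp = new ArrayList<>();
        for (int k : right) rp.add(new double[]{k, (double) k});
        List<double[]> out = new ArrayList<>();
        for (double[] l : lp)
            for (double[] r : rp)
                if (l[1] == r[1])                    // compare precomputed columns
                    out.add(new double[]{l[0], r[0]});
        return out;
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 3);
        List<Integer> right = List.of(2, 3, 4);
        System.out.println(joinWithCastInCondition(left, right).size());  // 2
        System.out.println(joinWithCastPushedDown(left, right).size());   // 2
    }
}
```

With N left rows and M right rows, the old shape evaluates the cast O(N*M) times while the new shape evaluates it O(N+M) times; the precomputed column can also feed hash-join key construction directly.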


The new plan, with the cast of 'key' to double pushed down on both sides of the join, looks good to me, although I don't know why the extraneous virtual columns would have prevented the pushdown in the past, since the cast is on a user column. But that's a separate topic.

@soumyakanti3578 (PR author)

@zabetak @kasakrisz @amansinha100
I have addressed all the comments and the tests are finally passing.
Please review and merge if everything looks good.

@amansinha100 left a comment:

+1


@zabetak zabetak merged commit 0e287d4 into apache:master May 12, 2025
4 checks passed
@zabetak (Member) commented May 12, 2025

Thanks for the PR @soumyakanti3578 and for the reviews @amansinha100 and @kasakrisz !
