HIVE-28938: Error in LATERAL VIEW with non native tables due to prese… #5798
The nature of the code change in the PR is not limited to lateral views, so I am curious: is this a wider issue for Iceberg tables? Can you test with nested subqueries in the FROM clause, nested regular views, and nested materialized views?
This is indeed a wider issue. All ICEBERG tables were classified as NATIVE tables in
We see NATIVE and NON NATIVE virtual columns in the output:
LGTM, pending tests! Left some nit comments but can be merged even without addressing those.
iceberg/iceberg-handler/src/test/queries/positive/iceberg_multiple_lateral_views.q
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
@@ -151,14 +151,14 @@ STAGE PLANS:
 predicate: UDFToDouble(key) is not null (type: boolean)
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
 Select Operator
- expressions: key (type: int)
+ expressions: UDFToDouble(key) (type: double)
I am curious why the HBase (and Kudu, in another file) column type changed from int to double. The stats above show a data size of 4 bytes, which indicates an integer.
With this PR, we don't see any virtual columns for the HBase table, whereas earlier virtual columns for native tables were getting wrongly added to the HBase tables. This affects the plan in the CBO, especially in RelFieldTrimmer.
With the PR, the plan after RelFieldTrimmer is:
2025-05-09T12:06:15,305 DEBUG [0de9da36-a1b4-4747-8aa3-b84858aed485 main] rules.RelFieldTrimmer: Plan after trimming unused fields
HiveSortLimit(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[20])
  HiveProject(key=[$1], value=[$2])
    HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available])
      HiveProject(EXPR$0=[CAST($0):DOUBLE])
        HiveFilter(condition=[IS NOT NULL(CAST($0):DOUBLE)])
          HiveProject(key=[$0])
            HiveTableScan(table=[[default, hbase_table_1]], table:alias=[hbase_table_1])
      HiveProject(key=[$0], value=[$1], EXPR$0=[CAST($0):DOUBLE])
        HiveFilter(condition=[IS NOT NULL(CAST($0):DOUBLE)])
          HiveProject(key=[$0], value=[$1])
            HiveTableScan(table=[[default, src]], table:alias=[src])
whereas earlier the plan was:
2025-05-09T12:11:50,457 DEBUG [bd84dbdb-d563-43d7-a012-3132221196b4 main] rules.RelFieldTrimmer: Plan after trimming unused fields
HiveSortLimit(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[20])
  HiveProject(key=[$1], value=[$2])
    HiveJoin(condition=[=(CAST($0):DOUBLE, CAST($1):DOUBLE)], joinType=[inner], algorithm=[none], cost=[not available])
      HiveFilter(condition=[IS NOT NULL(CAST($0):DOUBLE)])
        HiveProject(key=[$0])
          HiveTableScan(table=[[default, hbase_table_1]], table:alias=[hbase_table_1])
      HiveFilter(condition=[IS NOT NULL(CAST($0):DOUBLE)])
        HiveProject(key=[$0], value=[$1])
          HiveTableScan(table=[[default, src]], table:alias=[src])
We see that the CASTs are getting pushed down from the Join now.
The new plan with the pushed down cast 'key' to double on both sides of the join looks good to me, although I don't know why the extraneous virtual columns in the past would have prevented the push down since the cast is on a user column. But that's a separate topic.
@zabetak @kasakrisz @amansinha100
+1
Thanks for the PR @soumyakanti3578 and for the reviews @amansinha100 and @kasakrisz!
…nce of incorrect virtual columns in RowResolver
What changes were proposed in this pull request?
CalcitePlanner::genTableLogicalPlan
obtainTableType(Table tabMetaData) as the logic inside it was insufficient for classifying tables
enum TableType containing types NATIVE, DRUID, and JDBC.
isDruidTable and isJdbcTable in Table as an alternative to the enum types. isNonNative is already present in Table.
Why are the changes needed?
Explained in detail in https://issues.apache.org/jira/browse/HIVE-28938, but in short, Iceberg tables were getting misclassified as NATIVE tables, which led to the wrong virtual columns being added to the RowResolver.
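For readers unfamiliar with the classification problem, here is a rough, self-contained sketch of the idea and not the actual patch. It assumes a table counts as non-native whenever it declares a storage handler. TableInfo, nativeVirtualColumnsFor, and the substring matching are hypothetical stand-ins; only the predicate names isNonNative, isDruidTable, and isJdbcTable come from the description above. The virtual column names are an illustrative subset.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TableKindSketch {

    /** Hypothetical stand-in for Hive's Table metadata; not the real class. */
    public static final class TableInfo {
        public final String name;
        public final String storageHandler; // null means no storage handler is declared

        public TableInfo(String name, String storageHandler) {
            this.name = name;
            this.storageHandler = storageHandler;
        }

        // Assumption: a table is non-native exactly when it declares a storage handler.
        public boolean isNonNative() {
            return storageHandler != null && !storageHandler.isEmpty();
        }

        // Assumption: the handler kind is recognized by a substring of its class name.
        public boolean isDruidTable() {
            return isNonNative() && storageHandler.toLowerCase(Locale.ROOT).contains("druid");
        }

        public boolean isJdbcTable() {
            return isNonNative() && storageHandler.toLowerCase(Locale.ROOT).contains("jdbc");
        }
    }

    // Illustrative subset of the virtual columns that apply to native tables.
    public static final List<String> NATIVE_VIRTUAL_COLUMNS =
        Collections.unmodifiableList(Arrays.asList("INPUT__FILE__NAME", "BLOCK__OFFSET__INSIDE__FILE"));

    // Native virtual columns are offered only to native tables. Handler-specific
    // virtual columns of non-native tables are outside the scope of this sketch.
    public static List<String> nativeVirtualColumnsFor(TableInfo table) {
        return table.isNonNative() ? Collections.<String>emptyList() : NATIVE_VIRTUAL_COLUMNS;
    }

    public static void main(String[] args) {
        List<TableInfo> tables = Arrays.asList(
            new TableInfo("src", null),
            new TableInfo("ice_t", "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"),
            new TableInfo("hbase_table_1", "org.apache.hadoop.hive.hbase.HBaseStorageHandler"));
        for (TableInfo t : tables) {
            System.out.println(t.name + " nonNative=" + t.isNonNative()
                + " nativeVirtualColumns=" + nativeVirtualColumnsFor(t));
        }
    }
}
```

A three-valued enum such as NATIVE, DRUID, JDBC has no slot for Iceberg or HBase, so such tables can fall through to NATIVE. Independent predicates avoid that default.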
Does this PR introduce any user-facing change?
No
How was this patch tested?