-
Notifications
You must be signed in to change notification settings - Fork 117
Fix struct type metadata preservation in query results and continuations #3753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix struct type metadata preservation in query results and continuations #3753
Conversation
This commit implements a solution for GitHub issue FoundationDB#3743 where struct type metadata (type names like "STRUCT_1", "STRUCT_2") was getting lost during query execution, especially in continuations. **Problem:** When executing queries that return struct types, ResultSetMetaData would return UUID-based names (like "id...") instead of proper struct type names. This affected: - Star expansion queries (SELECT * FROM table) - Nested star queries (SELECT (*) FROM table) - Direct struct projections (SELECT struct_column FROM table) - Query continuations (EXECUTE CONTINUATION) **Root Cause:** 1. Semantic analysis produces correct DataTypes with struct names ("STRUCT_1", etc.) 2. Cascades planner Type.Record loses struct names during optimization (becomes null) 3. executePhysicalPlan() previously relied only on planner types → UUID generation 4. Continuations had no semantic type info → always generated UUIDs **Solution - Hybrid Approach:** Merge semantic type structure with planner field names: - Field names from planner Type.Record (handles aliases, star expansion, "_0" naming) - Type structure from semantic DataTypes (preserves "STRUCT_1", "STRUCT_2") - Additional enrichment from RecordMetaData descriptors for nested types
Refactored canReadStructTypeName helper to use integer parameters for controlling base query and continuation reruns. Added tests to cover the withExecutionContext method in ContinuedPhysicalQueryPlan, addressing the Teamscale test gap.
…UATION` syntax is self-contained.
…fdb-record-layer into struct_type_metadata_fix
| * @return List of DataTypes preserving struct type names (field names are placeholders) | ||
| */ | ||
| @Nonnull | ||
| private static List<DataType> captureSemanticTypeStructure( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expressions is a comprehension that captures operations on an ordered list of Expressions, please move the method and make it part of it, this enables for example, maintaining and caching the underlying data types. Something like:
Iterable<DataType> getDataTypes() {
return ....
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, you can wrap the data types of the individual Expression item(s) of Expressions with a single StructType that is structurally equivalent to the output of the RelationalExpression's (and corresponding physical RecordQuery* operator's) RecordType.
This I think will streamline things quite nicely down the line, especially when you want to use that later on to parse the resultset metadata (please see my commet there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, this would be much nicer there, thanks for pointing this out!
| final List<DataType> semanticFieldTypes; | ||
| if (resultType instanceof Type.Record) { | ||
| final Type.Record recordType = (Type.Record) resultType; | ||
| semanticFieldTypes = recordType.getFields().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the underlying Cascades type doesn’t preserve internal struct names, then the semantic field types based on them will also lose this information, right?
If this is the case, we might want to include the semantic information in the continuation as well, so that their (re)construction is completely independent of the underlying operator typing system.
Having said that, it seems like a good idea to associate the semantic information in the continuation itself, or put differently, integrate it into the state of the physical plan, which, albeit being useful for state management, could assist streamlining query processing in a world where separate but cooperative runtimes exist for plan generation, optimization, and execution exist for example.
| * @throws RelationalException if type structures don't match | ||
| */ | ||
| @Nonnull | ||
| public static DataType.StructType mergeSemanticTypesWithPlannerNames( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? it seems to me what we need is to perform the following assertion:
Assert.thatUnchecked(plannerType.isStrurtucallyEquivalentTo(DataTypeUtils.toRecordLayerType(semanticType))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer needed, and was replaced by something that takes semantic information and combines it with RecordMetaData information for nested structs.
| final var currentPlanHashMode = OptionsUtils.getCurrentPlanHashMode(options); | ||
| final var dataType = (DataType.StructType) DataTypeUtils.toRelationalType(type); | ||
|
|
||
| final DataType.StructType resultDataType = TypeMetadataEnricher.mergeSemanticTypesWithPlannerNames(type, semanticFieldTypes, fdbRecordStore.getRecordMetaData()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semanticFieldTypes (or more precisely, the StructType made of these) must be structurally equivalent to the resulting Type.Record of the top-level physical operator of the plan. Therefore, I don't think you need to do any merging, perhaps we just want to validate the structural equivalence, and then only use the StructType created with the semanticFieldTypes to construct the result set metadata.
Should we need the underlying Type.Record for some reason, if I am not mistaken, DataTypeUtils.toRecordType can be used to construct the Type.Record with the nested field names correctly. (if this is not the case, we can definitely fix it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, this is no longer using Type.Record, but is now coming from semanticStructType
Good suggestion 👍
Move field name capture to semantic analysis phase and simplify type merging to only enrich nested structs from RecordMetaData.
| * type metadata for result sets. | ||
| */ | ||
| @API(API.Status.EXPERIMENTAL) | ||
| public final class TypeMetadataEnricher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this class? The DataType.StructType already captures all of the named-fields/named-record types correctly, so when a physical plan is received, the Type.Record object of the top-level operator does not provide any extra information that is needed. And since SQL dictates that the final result given to user must match the constructed result set as imperatively defined in the query, both objects must align structurally (the one coming from the physical plan operator, and the one that is the result of the plan generator and semantic analysis).
Having said that, I am not sure why we need this metadata enrichment, and as I mentioned in a previous comment maybe (although not necessary) we can add an assertion that verifies the structural equality of both structures, which is very simple to do (but perhaps expensive to compute).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class does not combine DataType.StructType and Type.Record anymore. But it combines DataType.StructType with RecordMetaData information. DataType.StructType does have access to the structural shape of nested records, but it does not have access to the underlying nested type struct names, so I don't think we can remove this unfortunately.
This commit implements a solution for GitHub issue #3743 where struct type
metadata (type names like "STRUCT_1", "STRUCT_2") was getting lost during
query execution, especially in continuations.
Problem:
When executing queries that return struct types, ResultSetMetaData would return
UUID-based names (like "id...") instead of proper struct type names. This affected:
Root Cause:
Solution - Hybrid Approach:
Merge semantic type structure with RecordMetaData descriptors: