Skip to content

Conversation

@arnaud-lacurie
Copy link
Collaborator

@arnaud-lacurie arnaud-lacurie commented Nov 14, 2025

This commit implements a solution for GitHub issue #3743 where struct type
metadata (type names like "STRUCT_1", "STRUCT_2") was getting lost during
query execution, especially in continuations.

Problem:
When executing queries that return struct types, ResultSetMetaData would return
UUID-based names (like "id...") instead of proper struct type names. This affected:

  • Star expansion queries (SELECT * FROM table)
  • Nested star queries (SELECT (*) FROM table)
  • Direct struct projections (SELECT struct_column FROM table)
  • Query continuations (EXECUTE CONTINUATION)

Root Cause:

  1. Semantic analysis produces correct DataTypes with struct names ("STRUCT_1", etc.)
  2. Cascades planner Type.Record loses struct names during optimization (becomes null)
  3. executePhysicalPlan() previously relied only on planner types → UUID generation
  4. Continuations had no semantic type info → always generated UUIDs

Solution - Hybrid Approach:

Merge semantic type structure with RecordMetaData descriptors:

  • Type structure from semantic DataTypes (preserves "STRUCT_1", "STRUCT_2")
  • Additional enrichment from RecordMetaData descriptors for nested types

This commit implements a solution for GitHub issue FoundationDB#3743 where struct type
metadata (type names like "STRUCT_1", "STRUCT_2") was getting lost during
query execution, especially in continuations.

**Problem:**
When executing queries that return struct types, ResultSetMetaData would return
UUID-based names (like "id...") instead of proper struct type names. This affected:
- Star expansion queries (SELECT * FROM table)
- Nested star queries (SELECT (*) FROM table)
- Direct struct projections (SELECT struct_column FROM table)
- Query continuations (EXECUTE CONTINUATION)

**Root Cause:**
1. Semantic analysis produces correct DataTypes with struct names ("STRUCT_1", etc.)
2. Cascades planner Type.Record loses struct names during optimization (becomes null)
3. executePhysicalPlan() previously relied only on planner types → UUID generation
4. Continuations had no semantic type info → always generated UUIDs

**Solution - Hybrid Approach:**

Merge semantic type structure with planner field names:
- Field names from planner Type.Record (handles aliases, star expansion, "_0" naming)
- Type structure from semantic DataTypes (preserves "STRUCT_1", "STRUCT_2")
- Additional enrichment from RecordMetaData descriptors for nested types
@arnaud-lacurie arnaud-lacurie added the enhancement New feature or request label Nov 14, 2025
Refactored canReadStructTypeName helper to use integer parameters for
controlling base query and continuation reruns. Added tests to cover
the withExecutionContext method in ContinuedPhysicalQueryPlan, addressing
the Teamscale test gap.
@arnaud-lacurie arnaud-lacurie marked this pull request as ready for review November 17, 2025 18:33
@arnaud-lacurie arnaud-lacurie requested a review from hatyo November 28, 2025 09:48
* @return List of DataTypes preserving struct type names (field names are placeholders)
*/
@Nonnull
private static List<DataType> captureSemanticTypeStructure(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expressions is a comprehension that captures operations on an ordered list of Expressions, please move the method and make it part of it, this enables for example, maintaining and caching the underlying data types. Something like:

Iterable<DataType> getDataTypes() {
    return ....
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, you can wrap the data types of the individual Expression item(s) of Expressions with a single StructType that is structurally equivalent to the output of the RelationalExpression's (and corresponding physical RecordQuery* operator's) RecordType.

This I think will streamline things quite nicely down the line, especially when you want to use that later on to parse the resultset metadata (please see my commet there).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, this would be much nicer there, thanks for pointing this out!

final List<DataType> semanticFieldTypes;
if (resultType instanceof Type.Record) {
final Type.Record recordType = (Type.Record) resultType;
semanticFieldTypes = recordType.getFields().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the underlying Cascades type doesn’t preserve internal struct names, then the semantic field types based on them will also lose this information, right?

If this is the case, we might want to include the semantic information in the continuation as well, so that their (re)construction is completely independent of the underlying operator typing system.

Having said that, it seems like a good idea to associate the semantic information in the continuation itself, or put differently, integrate it into the state of the physical plan, which, albeit being useful for state management, could assist streamlining query processing in a world where separate but cooperative runtimes exist for plan generation, optimization, and execution exist for example.

* @throws RelationalException if type structures don't match
*/
@Nonnull
public static DataType.StructType mergeSemanticTypesWithPlannerNames(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? it seems to me what we need is to perform the following assertion:

Assert.thatUnchecked(plannerType.isStrurtucallyEquivalentTo(DataTypeUtils.toRecordLayerType(semanticType))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer needed, and was replaced by something that takes semantic information and combines it with RecordMetaData information for nested structs.

final var currentPlanHashMode = OptionsUtils.getCurrentPlanHashMode(options);
final var dataType = (DataType.StructType) DataTypeUtils.toRelationalType(type);

final DataType.StructType resultDataType = TypeMetadataEnricher.mergeSemanticTypesWithPlannerNames(type, semanticFieldTypes, fdbRecordStore.getRecordMetaData());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semanticFieldTypes (or more precisely, the StructType made of these) must be structurally equivalent to the resulting Type.Record of the top-level physical operator of the plan. Therefore, I don't think you need to do any merging, perhaps we just want to validate the structural equivalence, and then only use the StructType created with the semanticFieldTypes to construct the result set metadata.

Should we need the underlying Type.Record for some reason, if I am not mistaken, DataTypeUtils.toRecordType can be used to construct the Type.Record with the nested field names correctly. (if this is not the case, we can definitely fix it).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, this is no longer using Type.Record, but is now coming from semanticStructType
Good suggestion 👍

Move field name capture to semantic analysis phase and simplify
type merging to only enrich nested structs from RecordMetaData.
@arnaud-lacurie arnaud-lacurie requested a review from hatyo November 28, 2025 20:44
* type metadata for result sets.
*/
@API(API.Status.EXPERIMENTAL)
public final class TypeMetadataEnricher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this class? The DataType.StructType already captures all of the named-fields/named-record types correctly, so when a physical plan is received, the Type.Record object of the top-level operator does not provide any extra information that is needed. And since SQL dictates that the final result given to user must match the constructed result set as imperatively defined in the query, both objects must align structurally (the one coming from the physical plan operator, and the one that is the result of the plan generator and semantic analysis).

Having said that, I am not sure why we need this metadata enrichment, and as I mentioned in a previous comment maybe (although not necessary) we can add an assertion that verifies the structural equality of both structures, which is very simple to do (but perhaps expensive to compute).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class does not combine DataType.StructType and Type.Record anymore. But it combines DataType.StructType with RecordMetaData information. DataType.StructType does have access to the structural shape of nested records, but it does not have access to the underlying nested type struct names, so I don't think we can remove this unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants