@alecgrieser alecgrieser commented Nov 21, 2025

This reintroduces protobuf type and field renaming. The first version was added in #3696 and #3706 and then taken out by #3726. A second version was introduced with #3736 and then removed by #3767. This adds back a variation based on the data from #3736 but updated to be more resilient to unexpected names in existing meta-data objects.

The issue with #3736 was that if the meta-data contained a type whose name was not correctly escaped (e.g., Type__Blah instead of Type__0Blah), then we would fail to match that type during querying. This was a problem even if the type wasn't actually involved in a query because of how matching worked on the FullUnorderedScanExpression, meaning that every query would fail to plan if any type in the meta-data was named that way.
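To make the failure mode concrete, here is a minimal Python sketch. It assumes a single escaping rule inferred from the Type__Blah / Type__0Blah example (a double underscore in a user-visible name is stored as `__0`); the real escaping logic may cover more characters. The names `escape`, `unescape`, and `lookup_by_reescaping` are hypothetical and are not part of the Record Layer API.

```python
# Assumed rule, inferred only from the Type__Blah / Type__0Blah example:
# "__" in a user-visible name is stored as "__0" in the protobuf name.
def escape(user_name: str) -> str:
    """User-visible name -> protobuf-compliant storage name (assumed rule)."""
    return user_name.replace("__", "__0")


def unescape(storage_name: str) -> str:
    """Storage name -> user-visible name (assumed rule)."""
    return storage_name.replace("__0", "__")


def lookup_by_reescaping(user_name, storage_names):
    """Fragile lookup: regenerate the storage name and require an exact hit."""
    candidate = escape(user_name)
    return candidate if candidate in storage_names else None


# A storage name that our own escaping would have produced round-trips.
found = lookup_by_reescaping(unescape("Type__0Blah"), {"Type__0Blah"})

# A hand-written storage name contains "__" without the "0" marker.
# De-escaping leaves it unchanged, re-escaping yields "Type__0Blah",
# and that name is not in the meta-data, so the lookup misses.
missing = lookup_by_reescaping(unescape("Type__Blah"), {"Type__Blah"})
```

In this sketch `found` is the stored name, while `missing` is `None`, which mirrors how a single unexpectedly named type could break matching.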

This makes things more resilient. We now do a bit more work to associate a type with its original name from the protobuf file if one is provided, recording both the user-visible name and the original storage name. The only place where we now generate new protobuf-compliant names is when we construct a Type object. In all other cases, we only go from the storage name to the user-visible name.
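A rough sketch of the "record both names" idea, in Python for brevity. The `TypeNames` class, `build_catalog`, and the single-rule `unescape` are illustrative assumptions, not the actual Type implementation. The point is only that the storage name is kept as read from the meta-data and is never regenerated from the user-visible name.

```python
from dataclasses import dataclass


def unescape(storage_name: str) -> str:
    """Storage name -> user-visible name (assumed rule: "__0" means "__")."""
    return storage_name.replace("__0", "__")


@dataclass(frozen=True)
class TypeNames:
    """Hypothetical pair of names kept together for one type."""
    user_name: str     # what queries refer to
    storage_name: str  # exactly what the protobuf descriptor calls it


def build_catalog(storage_names):
    """Index types by user-visible name, keeping the original storage name.

    This sketch does not handle two storage names that de-escape to the
    same user-visible name.
    """
    catalog = {}
    for storage_name in storage_names:
        names = TypeNames(unescape(storage_name), storage_name)
        catalog[names.user_name] = names
    return catalog


catalog = build_catalog(["Type__Blah", "Other__0Thing", "Plain"])
```

Here `catalog["Type__Blah"].storage_name` is still `"Type__Blah"`, so the hand-written type resolves to its real protobuf name, and `catalog["Other__Thing"].storage_name` is `"Other__0Thing"`.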

We still rely on being able to correctly predict the expected user-visible name by running the de-escaping logic on the storage name. At some point, we may need a more complicated mapping, especially if we want to support more arbitrary names. That is left as future work. I could also see us wanting to do a bit more refactoring to better encapsulate this transformation.

The test modifications made to valid-identifiers.yamsql cover those cases by adding new types with names that would not have been generated by any DDL statement, and then validating that (1) those types do not disrupt correctly constructed queries and (2) the problematic types can themselves be queried.

In addition, this addresses some shortcomings with the match candidates: FieldKeyExpressions (which use the internal names) were sometimes used to generate match candidates that referenced the internal name directly. This change plugs those gaps by converting to the user-visible name when the candidate is constructed. There are additional queries in valid-identifiers.yamsql that are designed to cover those matches.
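The match-candidate gap can be illustrated with a small Python sketch. Everything here is hypothetical (`to_user_name`, `candidate_fields`, `matches`, and the field name `foo__0bar`); it assumes the same single escaping rule as the Type__0Blah example and only shows why a candidate built from internal names cannot match a predicate written against user-visible names.

```python
def to_user_name(internal_name: str) -> str:
    """Internal (storage) field name -> user-visible name (assumed rule)."""
    return internal_name.replace("__0", "__")


def candidate_fields(index_key_fields, translate=True):
    """Field names a match candidate exposes for an index.

    With translate=False the candidate leaks internal names, which is the
    gap described above; with translate=True it exposes user-visible names.
    """
    return [to_user_name(f) if translate else f for f in index_key_fields]


def matches(predicate_field: str, candidate) -> bool:
    """A predicate can use the candidate only if its field name appears."""
    return predicate_field in candidate


# The index key expression is defined over the internal field name.
index_key_fields = ["foo__0bar"]

# The query predicate uses the user-visible name "foo__bar".
leaky = matches("foo__bar", candidate_fields(index_key_fields, translate=False))
fixed = matches("foo__bar", candidate_fields(index_key_fields))
```

In this sketch `leaky` is `False` and `fixed` is `True`: the index becomes usable only once the candidate is expressed in user-visible names.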

alecgrieser and others added 6 commits November 20, 2025 16:11
This adds support for retaining the protobuf names more directly for types and fields. This matters when the user has created a meta-data proto with a naming strategy that differs from the one our own DDL would have generated.

The basic strategy is to:

1. Continue to always apply the `toProtoUtils` method to produce plausible user-generated names but
2. Retain the original protobuf name in the `Type` information and then use that to get the name used to access data in the field
… that code in the RecordMetadataDeserializer with logic in the Type system
- make sure to convert FieldKeyExpression#fieldName (which is internal) to
  user-facing name when constructing match candidates.
- also, add tests for deeply nested (and repeated) structures with non-pb-compliant
  field names, and an index.
… can match the indexes in the same circumstances as cases with non-escaped identifiers
@alecgrieser alecgrieser added the bug fix Change that fixes a bug label Nov 24, 2025
@alecgrieser alecgrieser marked this pull request as ready for review November 24, 2025 15:09
@github-actions

📊 Metrics Diff Analysis Report

Summary

  • New queries: 76
  • Dropped queries: 0
  • Plan changed + metrics changed: 0
  • Plan unchanged + metrics changed: 0
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Plan unchanged + metrics changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/in-predicate.metrics.yaml: 4
  • yaml-tests/src/test/resources/valid-identifiers.metrics.yaml: 72
