feat: Support Ref types in Scan [Avro] #18812
Open
+1,601
−65
Which issue does this PR close?
Rationale for this change
The main issue I was trying to solve: despite the unreachable!() block in the Avro-to-Arrow schema conversion, when an Arrow schema is provided to the reader, the writer_schema from the file can still contain Ref types. Nothing detects this and there is no error handling on that path, so the affected values are silently returned as Null.
This is much more severe than just crashing due to a not-yet-implemented feature.
However, I also used this opportunity to implement Ref support for the schema conversion itself.
What changes are included in this PR?
A new lookup method for the schema, used both when resolving positions and when converting Avro schemas to Arrow schemas. This lets us resolve Ref types under some constraints: namely, Arrow does not allow circular dependencies in a schema, and I'm not sure how that could be supported without drastic changes to the way schemas are composed.
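To illustrate the idea, here is a minimal, self-contained sketch of name-based Ref resolution with cycle rejection. It is not the code in this PR: `AvroSchema`, `ArrowType`, `collect_named`, and `to_arrow` are hypothetical toy names invented for the example, and they do not correspond to the apache_avro or arrow APIs.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for an Avro schema (hypothetical, NOT the apache_avro type).
#[derive(Debug, Clone, PartialEq)]
enum AvroSchema {
    Int,
    String,
    Record { name: String, fields: Vec<(String, AvroSchema)> },
    /// A reference to a named type defined elsewhere in the schema.
    Ref { name: String },
}

/// Toy stand-in for an Arrow data type (hypothetical, NOT the arrow type).
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int32,
    Utf8,
    Struct(Vec<(String, ArrowType)>),
}

/// Walk the schema once and index every named record by its name,
/// so that a later `Ref` can be looked up.
fn collect_named(schema: &AvroSchema, names: &mut HashMap<String, AvroSchema>) {
    if let AvroSchema::Record { name, fields } = schema {
        names.insert(name.clone(), schema.clone());
        for (_, field) in fields {
            collect_named(field, names);
        }
    }
}

/// Convert to the toy Arrow type, expanding each `Ref` through the lookup table.
/// `in_progress` holds the records currently being expanded; meeting one of
/// them again means the schema is circular, which is reported as an error
/// instead of recursing forever or silently producing nulls.
fn to_arrow(
    schema: &AvroSchema,
    names: &HashMap<String, AvroSchema>,
    in_progress: &mut HashSet<String>,
) -> Result<ArrowType, String> {
    match schema {
        AvroSchema::Int => Ok(ArrowType::Int32),
        AvroSchema::String => Ok(ArrowType::Utf8),
        AvroSchema::Ref { name } => {
            let target = names
                .get(name)
                .ok_or_else(|| format!("unknown named type: {name}"))?;
            to_arrow(target, names, in_progress)
        }
        AvroSchema::Record { name, fields } => {
            if !in_progress.insert(name.clone()) {
                return Err(format!("circular reference to {name}"));
            }
            let mut out = Vec::with_capacity(fields.len());
            for (field_name, field) in fields {
                out.push((field_name.clone(), to_arrow(field, names, in_progress)?));
            }
            in_progress.remove(name);
            Ok(ArrowType::Struct(out))
        }
    }
}
```

In this sketch, an unknown name or a self-referencing record surfaces as an `Err` rather than as null values, which is the behavior the rationale above argues for.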
Are these changes tested?
Yes, added as many tests as I could think of.
Are there any user-facing changes?
Not really, should just work out of the box.