Problem
This is a design discussion issue regarding whether DataFusion should adopt case-insensitive field matching when casting between structs.
Current Behavior
DataFusion uses case-sensitive field name matching. For example:
- Field `x` and field `X` are treated as different fields
- A cast from `struct<x int, y int>` to `struct<X int, Y int>` would fail (no name overlap)
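The case-sensitive behavior described above can be modeled with a small sketch. This is not DataFusion's actual cast code; `match_fields_case_sensitive` is a hypothetical helper that treats a struct type as a plain list of field names and reports, for each target field, the index of the source field with exactly the same name.

```rust
/// Hypothetical helper (not a DataFusion API): for each target field name,
/// return the index of the source field with an identical name, if any.
/// The comparison is byte-for-byte, so "x" and "X" never match.
fn match_fields_case_sensitive(source: &[&str], target: &[&str]) -> Vec<Option<usize>> {
    target
        .iter()
        .map(|t| source.iter().position(|s| s == t))
        .collect()
}

/// A cast in this model is only possible when at least one name overlaps.
fn has_name_overlap(source: &[&str], target: &[&str]) -> bool {
    match_fields_case_sensitive(source, target)
        .iter()
        .any(|m| m.is_some())
}
```

Under this model, `has_name_overlap(&["x", "y"], &["X", "Y"])` is `false`, which corresponds to the failing cast in the example above.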
DuckDB Behavior
DuckDB uses case-insensitive field name matching:
- Field `x` and field `X` are treated as the same field
- The same cast would succeed, with fields matched case-insensitively
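For comparison, a case-insensitive variant of the same idea might look like the sketch below. The helper name is hypothetical, and the use of ASCII-only case folding is an assumption made for brevity; it is not a claim about how DuckDB folds identifiers internally.

```rust
/// Hypothetical helper (not a DataFusion or DuckDB API): for each target
/// field name, return the index of the first source field whose name is
/// equal when ASCII case is ignored.
fn match_fields_case_insensitive(source: &[&str], target: &[&str]) -> Vec<Option<usize>> {
    target
        .iter()
        .map(|t| source.iter().position(|s| s.eq_ignore_ascii_case(t)))
        .collect()
}
```

With this matching rule, the source fields `["x", "y"]` and the target fields `["X", "Y"]` pair up as `[Some(0), Some(1)]`, so the cast from the earlier example would have every field matched.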
Motivation for Case-Insensitive Matching
Pros:
- ✅ Aligns with DuckDB — improves compatibility with a major SQL database
- ✅ More forgiving — handles common casing variations (e.g., JSON sources with inconsistent field names)
- ✅ Follows SQL conventions — SQL generally treats unquoted identifiers as case-insensitive
- ✅ User-friendly — reduces friction when working with data from different sources
Arguments for Keeping Case-Sensitive Matching
Pros:
- ✅ Arrow foundation — DataFusion is built on Apache Arrow, which is case-sensitive:
- ✅ Language consistency — matches Rust and JSON conventions (case-sensitive)
- ✅ Prevents ambiguity — avoids edge cases where source has both `x` and `X` (rare but possible)
- ✅ Predictable behavior — case-sensitive matching is more explicit and easier to reason about in programmatic contexts
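The ambiguity concern can be made concrete with a short sketch. Assuming a case-insensitive lookup (ASCII folding only, chosen here for simplicity), a source struct that contains both `x` and `X` yields two candidates for a single target field, and the caller has to decide whether to error or pick one. The `FieldMatch` type and `resolve_case_insensitive` function are hypothetical names used only for illustration.

```rust
/// Hypothetical result type for resolving one target field name.
#[derive(Debug, PartialEq)]
enum FieldMatch {
    NotFound,
    Unique(usize),
    /// More than one source field matches when case is ignored.
    Ambiguous(Vec<usize>),
}

/// Collect every source index whose name equals `target` ignoring ASCII
/// case, then classify the result by the number of candidates found.
fn resolve_case_insensitive(source: &[&str], target: &str) -> FieldMatch {
    let hits: Vec<usize> = source
        .iter()
        .enumerate()
        .filter(|(_, s)| s.eq_ignore_ascii_case(target))
        .map(|(i, _)| i)
        .collect();
    match hits.len() {
        0 => FieldMatch::NotFound,
        1 => FieldMatch::Unique(hits[0]),
        _ => FieldMatch::Ambiguous(hits),
    }
}
```

With exact, case-sensitive matching this situation cannot arise, since `x` and `X` are simply distinct names.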
Question
Should DataFusion follow SQL's case-insensitivity or remain aligned with Arrow's case-sensitive semantics?
Next Steps
This issue is intended to surface the design question and gather community feedback.