Case-insensitive field matching in struct casting #19842

@kosiew

Problem

This issue opens a design discussion on whether DataFusion should match field names case-insensitively when casting between structs.

Current Behavior

DataFusion uses case-sensitive field name matching. For example:

  • Field x and field X are treated as different fields
  • A cast from struct<x int, y int> to struct<X int, Y int> would fail (no name overlap)
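
The failure described above can be modeled with a small sketch. This is plain Python rather than DataFusion code: the function name `match_fields_case_sensitive` is hypothetical, and each struct is reduced to a list of its field names.

```python
def match_fields_case_sensitive(source_fields, target_fields):
    """Map each target field name to the identically spelled source field.

    Hypothetical helper for illustration only; a target field with no
    exact-name match in the source maps to None.
    """
    source = set(source_fields)
    return {name: (name if name in source else None) for name in target_fields}


# struct<x int, y int> cast to struct<X int, Y int>
mapping = match_fields_case_sensitive(["x", "y"], ["X", "Y"])

# Names that found a partner in the source struct.
overlap = [target for target, src in mapping.items() if src is not None]

if not overlap:
    print("cast rejected: no name overlap between source and target")
```

Under exact comparison neither `X` nor `Y` finds a partner, which is the "no name overlap" condition.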

DuckDB Behavior

DuckDB uses case-insensitive field name matching:

  • Field x and field X are treated as the same field
  • The same cast would succeed, with fields matched case-insensitively
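
A minimal sketch of case-insensitive matching, for comparison. It is plain Python and does not reproduce DuckDB's implementation: the function name `match_fields_case_insensitive` is hypothetical, and `str.lower()` is assumed to be an acceptable stand-in for whatever folding rule a real engine applies.

```python
def match_fields_case_insensitive(source_fields, target_fields):
    """Map each target field name to a source field, ignoring case.

    Hypothetical helper for illustration only. If two source names fold
    to the same key, the later one silently wins in this sketch.
    """
    by_folded = {name.lower(): name for name in source_fields}
    return {name: by_folded.get(name.lower()) for name in target_fields}


# struct<x int, y int> cast to struct<X int, Y int>
mapping = match_fields_case_insensitive(["x", "y"], ["X", "Y"])
print(mapping)
```

Here `X` pairs with `x` and `Y` pairs with `y`, so the cast has a complete field mapping.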

Motivation for Case-Insensitive Matching

Pros:

  • Aligns with DuckDB: improves compatibility with a major SQL database
  • More forgiving: handles common casing variations (e.g., JSON sources with inconsistent field names)
  • Follows SQL conventions: SQL generally treats unquoted identifiers as case-insensitive
  • User-friendly: reduces friction when working with data from different sources

Arguments for Keeping Case-Sensitive Matching

Pros:

  • Arrow foundation: DataFusion is built on Apache Arrow, whose field names are case-sensitive
  • Language consistency: matches Rust and JSON conventions (both case-sensitive)
  • Prevents ambiguity: avoids edge cases where the source struct has both an x and an X field (rare but possible)
  • Predictable behavior: case-sensitive matching is more explicit and easier to reason about in programmatic contexts
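
The ambiguity concern can be made concrete. The sketch below is a hypothetical check, not part of DataFusion or Arrow: it groups field names by their lowercased form and reports any group with more than one member, which is the situation a case-insensitive cast would have to reject or resolve by some tie-breaking rule.

```python
from collections import defaultdict


def find_case_collisions(field_names):
    """Return groups of field names that differ only by case.

    Hypothetical helper for illustration only. Keys are the lowercased
    names; values list the original spellings in input order.
    """
    groups = defaultdict(list)
    for name in field_names:
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


# A source struct that is valid under case-sensitive rules
# but ambiguous once case is ignored.
collisions = find_case_collisions(["x", "X", "y"])
print(collisions)
```

A struct with no collisions returns an empty dict, so the same check could gate a case-insensitive fallback if one were ever adopted.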

Question

Should DataFusion follow SQL's case-insensitivity or remain aligned with Arrow's case-sensitive semantics?

Next Steps

This issue is intended to surface the design question and gather community feedback.
