refactor: Address review comments for CSV union schema feature
Addresses all review feedback from PR #17553 to improve the CSV schema
union implementation that allows reading CSV files with different column counts.
Changes based on review:
- Moved unit tests from separate tests.rs to bottom of file_format.rs
- Updated documentation wording from "now supports" to "can handle"
- Removed all println! statements from the integration test
- Added comprehensive assertions for actual row content verification
- Simplified HashSet initialization using HashSet::from([...]) syntax
- Updated truncated_rows config documentation to reflect expanded purpose
- Removed unnecessary min() calculation in column processing loop
- Fixed clippy warnings by using enumerate() instead of range loop
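As an illustration of the two idiom changes listed above (HashSet::from([...]) and enumerate() in place of a range loop), here is a minimal standalone sketch. It is not code from this commit: the helper names `example_wanted` and `matching_indices` and the column names are hypothetical, chosen only to show the pattern.

```rust
use std::collections::HashSet;

/// Hypothetical example set, built in one expression with HashSet::from
/// instead of HashSet::new() followed by repeated insert() calls.
fn example_wanted() -> HashSet<&'static str> {
    HashSet::from(["id", "email"])
}

/// Hypothetical helper: returns the positions of the headers found in `wanted`.
fn matching_indices(headers: &[&str], wanted: &HashSet<&str>) -> Vec<usize> {
    let mut hits = Vec::new();
    // enumerate() yields (index, item) pairs, which is the form clippy
    // suggests instead of `for i in 0..headers.len()` with indexing.
    for (i, name) in headers.iter().enumerate() {
        if wanted.contains(*name) {
            hits.push(i);
        }
    }
    hits
}
```

With these definitions, `matching_indices(&["id", "name", "email"], &example_wanted())` collects the positions of "id" and "email".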
Technical improvements:
- Tests now verify null patterns correctly across the union schema
- Cleaner iteration logic without redundant bounds checking
- Better documentation explaining union schema behavior
The feature continues to work as designed:
- Creates union schema from all CSV files in a directory
- Files with fewer columns have nulls for missing fields
- Requires explicit opt-in via truncated_rows(true)
- Maintains full backward compatibility
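The behavior described above can be sketched in plain Rust, independent of the actual implementation. This is an illustration under stated assumptions, not the code in file_format.rs: it assumes columns are matched by name and that the union keeps first-seen order, and the function names `union_columns` and `align_row` are hypothetical.

```rust
/// Hypothetical: builds the union of column names across all file headers,
/// keeping the order in which each name is first seen.
fn union_columns(files: &[Vec<&str>]) -> Vec<String> {
    let mut union: Vec<String> = Vec::new();
    for header in files {
        for name in header {
            if !union.iter().any(|c| c.as_str() == *name) {
                union.push((*name).to_string());
            }
        }
    }
    union
}

/// Hypothetical: aligns one row to the union schema. Columns that the
/// file's own header lacks become None (a null in the result).
fn align_row(header: &[&str], row: &[&str], union: &[String]) -> Vec<Option<String>> {
    union
        .iter()
        .map(|col| {
            header
                .iter()
                .position(|h| *h == col.as_str())
                .and_then(|i| row.get(i))
                .map(|v| v.to_string())
        })
        .collect()
}
```

For example, given one file with header `id,name` and another with `id,name,email`, the union has three columns, and a row from the first file is padded with a null in the `email` position.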