Skip to content

Conversation

emilk
Copy link
Contributor

@emilk emilk commented Sep 15, 2025

Rationale for this change

I believe that error messages should be as readable as possible. Aim for rustc more than gcc.

Display is the nice, user-facing formatter. Debug is for… well, debugging.

What changes are included in this PR?

Change a bunch of {:?} format string to {}. I'm sure I missed a lot of them, because I know of no way to enforce this without

Are these changes tested?

I assume CI runs cargo test :)

Are there any user-facing changes?

Yes! Error messages should be a bit more readable now, especially once this lands:

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate spark labels Sep 15, 2025
@emilk emilk marked this pull request as ready for review September 15, 2025 09:36
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @emilk -- this is looking great. Once we get the CI to pass I think we can merge it in

@emilk
Copy link
Contributor Author

emilk commented Sep 15, 2025

Should be green now

[Narrator] it was not

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 15, 2025
@emilk emilk force-pushed the emilk/use-display-formatting branch from 3812ed5 to 6ef4be0 Compare September 15, 2025 13:16
@alamb
Copy link
Contributor

alamb commented Sep 15, 2025

I recommend running cargo test locally first to catch all the test errors and fix them in one go, in case you're not already doing so

My personal favorite is cargo nextest run

@emilk
Copy link
Contributor Author

emilk commented Sep 15, 2025

I did, unfortunately cargo test they didn't cover all the test, and cargo test --all-features doesn't compile. Apparently CI does something slightly more complex:

https://github.com/rerun-io/arrow-datafusion/blob/e077920b19867d02b43f836c92bfb61183493ea0/.github/workflows/extended.yml#L109-L119

…and others

@emilk
Copy link
Contributor Author

emilk commented Sep 15, 2025

cargo nextest run passes on my machine now - let's see what CI says

@emilk
Copy link
Contributor Author

emilk commented Sep 17, 2025

Reverted some error messages, and cargo nextest run is still green locally…

@emilk
Copy link
Contributor Author

emilk commented Sep 17, 2025

OMG IS GREEN

@emilk emilk changed the title Use Display formatting of DataType:s in error messagwe Use Display formatting of DataType:s in error messages Sep 17, 2025
Copy link
Contributor

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Emil!

@alamb
Copy link
Contributor

alamb commented Sep 17, 2025

gogogogo!

@alamb alamb added this pull request to the merge queue Sep 17, 2025
@alamb
Copy link
Contributor

alamb commented Sep 17, 2025

Thank you very much @emilk and @timsaucer

Merged via the queue into apache:main with commit 631f9ab Sep 17, 2025
32 checks passed
mbrobbel pushed a commit to apache/arrow-rs that referenced this pull request Sep 23, 2025
This is part of an attempt to improve the error reporting of `arrow-rs`,
`datafusion`, and any other 3rd party crates.

I believe that error messages should be as readable as possible. Aim for
`rustc` more than `gcc`.

Here's an example of how this PR improves some existing error messages:

Before:
> Casting from Map(Field { name: "entries", data_type: Struct([Field {
name: "key", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: "value",
data_type: Interval(DayTime), nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, false) to Map(Field { name:
"entries", data_type: Struct([Field { name: "key", data_type: Utf8,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "value", data_type: Duration(Second), nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }, true) not supported

After:
> Casting from Map(Field { "entries": Struct(key Utf8, value nullable
Interval(DayTime)) }, false) to Map(Field { "entries": Struct(key Utf8,
value Duration(Second)) }, true) not supported

# Which issue does this PR close?
- Closes #7048
- Continues and closes #7051
- Continues #7469
- More improvements coming in
#8291
- Sibling PR: apache/datafusion#17565
- Part of #8351

# Rationale for this change
DataType:s are often shown in error messages. Making these error
messages readable is _very important_.

# What changes are included in this PR?
## ~Unify `Debug` and `Display`~
~The `Display` and `Debug` of `DataType` are now the SAME.~

~Why? Both are frequently used in error messages (both in arrow, and
`datafusion`), and both benefit from being readable yet reversible.~

Reverted based on PR feedback. I will try to improve the `Debug`
formatting in a future PR, with clever use of
https://doc.rust-lang.org/std/fmt/struct.Formatter.html#method.debug_struct

## Improve `Display` of lists
Improve the `Display` formatting of
* `DataType::List`
* `DataType::LargeList`
* `DataType::FixedSizeList`

**Before**: `List(Field { name: \"item\", data_type: Int32, nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} })`
**After**: `List(nullable Int32)`

**Before**: `FixedSizeList(Field { name: \"item\", data_type: Int32,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 5)`
**After**: `FixedSizeList(5 x Int32)`

### Better formatting of `DataType::Struct`
The formatting of `Struct` is now reversible, including nullability and
metadata.

- Continues #7469

### ~Improve `Debug` format of `Field`~
~Best understood with this diff for an existing test:~

<img width="1140" height="499" alt="Screenshot 2025-09-07 at 18 30 44"
src="https://github.com/user-attachments/assets/794b4de9-8459-4ee7-82d2-8f5ae248614c"
/>

**EDIT**: reverted

# Are these changes tested?
Yes - new tests cover them

# Are there any user-facing changes?
`Display/to_string` has changed, and so this is a **BREAKING CHANGE**.

Care has been taken that the formatting contains all necessary
information (i.e. is reversible), though the actual `FromStr`
implementation is still not written (it is missing on `main`, and
missing in this PR - so no change).


----

Let me know if I went to far… or not far enough 😆

---------

Co-authored-by: irenjj <[email protected]>
alamb pushed a commit to apache/arrow-rs that referenced this pull request Sep 24, 2025
# Rationale for this change
Despite us having `Display` implementations for `DataType`, a lot of
error messages still use `Debug`. See for instance:

* apache/datafusion#17565
* #8290

Therefor I want to make sure the `Debug` formatting of `Field` (and, by
extension, `DataType`) is not _utterly horrible_. This PR makes things…
slightly better.

# What changes are included in this PR?
Omits fields of `Field` that have their "default" values.

# Are these changes tested?
Yes, there are new tests.

# Are there any user-facing changes?
Though this changes the `Debug` formatting, I would NOT consider this a
breaking change, because nobody should rely on consistent `Debug`
formatting. See for instance
https://doc.rust-lang.org/std/fmt/trait.Debug.html#stability
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
catalog Related to the catalog crate common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates proto Related to proto crate spark sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants