POC: consistent NaN treatment for pyarrow dtypes #61732

jbrockmendel · 2025-06-28T17:23:26Z

This is the third of several POCs stemming from the discussion in #61618 (see #61708, #61716). The main goal is to gauge how invasive the change would be.

Specifically, this changes pyarrow floating dtypes to treat NaN as distinct from NA in the constructors and in __setitem__ (xref #32265).
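Below is a minimal pure-Python sketch of the distinction. It assumes `None` stands in for pd.NA or a pyarrow null. The helper `null_mask` and its `nan_as_null` flag are hypothetical and are not pandas or pyarrow API. A boolean mask like this is the kind of input `pa.array` accepts through its `mask` argument, where True marks a null slot.

```python
import math

def null_mask(values, nan_as_null=False):
    """Return a list of booleans marking which entries are missing (NA).

    Hypothetical helper. With nan_as_null=False (the float behavior this POC
    proposes), only None counts as missing and NaN stays an ordinary value.
    With nan_as_null=True, NaN is folded into the missing mask as well.
    """
    mask = []
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        mask.append(v is None or (nan_as_null and is_nan))
    return mask

data = [1.0, float("nan"), None, 2.5]
print(null_mask(data))                    # [False, False, True, False]
print(null_mask(data, nan_as_null=True))  # [False, True, True, False]
```

For comparison, pyarrow itself keeps NaN as a float value by default and only folds it into nulls when `pa.array(..., from_pandas=True)` is used.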

Notes:

- This treats NaN as close enough to NA when a user explicitly asks for a pyarrow integer dtype. I think this is the right API, but I won't check the box until there is consensus.
- I still have 113 failing tests locally. Most are in the json, sql, or test_EA_types tests (the last covers CSV round-tripping).
- Finding the mask to pass to pa.array needs optimization.
- The kludge in NDFrame.where is ugly and fragile.
- I need to double-check the new expected result in the rank test. Maybe rewrite the test to use NA instead of NaN?
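The integer-dtype decision in the first note can be sketched in pure Python, again with `None` standing in for NA. `to_nullable_int` is a hypothetical helper and not the POC's actual code. Rejecting non-integral floats is an added assumption about how a lossy cast might be handled.

```python
import math

def to_nullable_int(values):
    """Convert floats to ints, treating both NaN and None as missing (None).

    Hypothetical helper illustrating "NaN is close enough to NA" when the
    target is an integer dtype.
    """
    out = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            out.append(None)  # NaN and NA both become missing
        elif isinstance(v, float) and not v.is_integer():
            # Assumption: refuse lossy casts (e.g. 1.5 or inf) instead of truncating
            raise ValueError(f"cannot losslessly cast {v!r} to an integer")
        else:
            out.append(int(v))
    return out

print(to_nullable_int([1.0, float("nan"), None, 3.0]))  # [1, None, None, 3]
```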

jbrockmendel added 2 commits June 28, 2025 10:07

3fad33f POC: consistent NaN treatment for pyarrow dtypes
f1e8ba0 comment
