BUG: pd.api.types.infer_dtype on scalar input #61081
I've tried looking into the function definition in lib.pyx but couldn't find anything obvious that would explain the cause.
take
This goes back to da0523a#diff-40a8d0cc4a6796116f539b1e47d5f56dfc1d061e9563d2c927dbc8bd178df8f3. The docstring added there looks incorrect: the function clearly never attempted to support scalars, although the code `if not isinstance(value, list): value = list(value)` is very easy to misread as wrapping a scalar in a list (it does not). I think we should fix the documentation here.
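A minimal sketch of what the `list(value)` line does with scalar input, assuming a recent pandas install. `try_infer` is a hypothetical helper that only records the exception type, since the exact error message may differ between versions. Note that a Python string is itself iterable, so it is split into characters rather than wrapped, which can make a scalar string look as if it were supported.

```python
import pandas as pd

def try_infer(value):
    """Hypothetical helper: return infer_dtype's label, or the exception type name."""
    try:
        return pd.api.types.infer_dtype(value)
    except TypeError as exc:
        return type(exc).__name__

# list(1) raises TypeError, so a bare int is rejected.
print(try_infer(1))        # TypeError
# list("abc") splits the string into characters instead of wrapping it.
print(list("abc"))         # ['a', 'b', 'c']
print(try_infer("abc"))    # string
# Wrapping the scalar yourself gives a per-value answer.
print(try_infer([1]))      # integer
```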
@rhshadrach Alright. I initially thought that replacing `list(value)` with `[value]` would let the function support scalars as well. What do you think? If you prefer, we could update the tests and documentation to reflect scalar support.
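A rough sketch of how the proposed `[value]` change differs from `list(value)` for iterable inputs that are not lists, such as tuples. `infer_dtype_wrapping` is a hypothetical wrapper around the real `pd.api.types.infer_dtype`, not how pandas implements the function; it only mimics changing the non-list branch, and leaves lists and objects with a `dtype` attribute to pandas.

```python
import pandas as pd

def infer_dtype_wrapping(value):
    """Hypothetical variant: wrap non-list, non-array input as [value]
    instead of expanding it with list(value)."""
    if isinstance(value, list) or hasattr(value, "dtype"):
        return pd.api.types.infer_dtype(value)
    return pd.api.types.infer_dtype([value])

pair = (1, 2)
# Current behaviour: list((1, 2)) -> [1, 2], so the elements are inspected.
print(pd.api.types.infer_dtype(pair))   # integer
# Proposed behaviour: [(1, 2)] holds one tuple object, so the label changes.
print(infer_dtype_wrapping(pair))
# Scalars, on the other hand, would now be accepted.
print(infer_dtype_wrapping(1))          # integer
```

The trade-off is that any caller currently passing tuples, generators or other non-list iterables would get a different answer.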
All right. Too bad; I was hoping I could use `df.map(pd.api.types.infer_dtype).unique()` or `df.map(pd.api.types.infer_dtype).value_counts()` to get a sense of the data type mix in dirty columns across a wide data set all at once. If vectorisation is not an option, I'll have to do it with a for loop.
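For reference, a column-at-a-time loop can be sketched as below, using a small hypothetical DataFrame. Calling `infer_dtype` on a whole column returns a single label per column (for dirty object columns, one of the `mixed...` labels), which summarises each column but does not show the per-value mix.

```python
import pandas as pd

# Hypothetical wide data set with clean and dirty columns.
df = pd.DataFrame({
    "ints": [1, 2, 3],
    "floats": [1.5, 2.5, 3.5],
    "dirty": [1.0, "1,0", 10],
})

# One inferred label per column; each column is array-like, so this is supported.
column_types = {col: pd.api.types.infer_dtype(df[col]) for col in df.columns}
print(column_types)
```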
@gnotisauton - using
is a workaround.
No - that would break existing use cases when
Just in case anyone finds this thread and wants to use this workaround: it doesn't work out of the box for imported data, because after a CSV round trip the values in a mixed column come back as strings. For example, with
`df = pd.DataFrame({'a': [1.0, '1,0', 10]})`
`df.map(lambda v: pd.api.types.infer_dtype([v]))` works (the inferred types are appropriate), but after exporting and re-importing the DataFrame
`df.to_csv(some_path, index=False)`
`df = pd.read_csv(some_path)`
`df.map(lambda v: pd.api.types.infer_dtype([v]))` does not (the inferred types are all `'string'`).
Not sure I understand. I get `floating`, `string`, `integer` for the example you posted. Is that not what is desired? Edit: I think I see: your issue is that when you read from CSV, you get all strings. That is the correct answer - the data in that DataFrame is a string, and
Context
I was trying to identify data types in columns with mixed data types:
Pandas version checks
Reproducible Example
Issue Description
Expected Behavior
According to the documentation, pd.api.types.infer_dtype() should accept scalar input.
Installed Versions