BUG: Series.combine_first loss of precision #60128
Comments
Thanks for the report, confirmed on main. PRs to fix are welcome!
It's because the dtype changes: int64->float64->int64. To combine with 'b', NaN needs to be added to 'a' (combine_first, core/frame.py:8785). IMHO, this is general practice, and converting the 'float64' back to 'int64' does not seem natural. So, I think making the result 'float64' could be a solution. WDYT? Just for reference: if NaN exists from the start, it could be handled as 'int': a = pd.Series([1666880195890293744, 5, pd.NA]).to_frame()
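The round trip described in this comment can be reproduced with `reindex` alone. This is a minimal sketch, not the code path in core/frame.py; the variable names are illustrative, and the nullable `Int64` variant is included only as a contrast.

```python
import pandas as pd

big = 1666880195890293744
a = pd.Series([big, 5])  # dtype: int64

# Growing the index inserts NaN, which forces an upcast to float64.
widened = a.reindex([0, 1, 2])
round_trip = int(widened.iloc[0])  # value after int64 -> float64 -> int

# With the nullable Int64 dtype the new slot holds pd.NA instead of NaN,
# so the existing values stay integers.
nullable = a.astype("Int64").reindex([0, 1, 2])

print(widened.dtype, round_trip)
print(nullable.dtype, int(nullable.iloc[0]))
```

The float64 path returns 1666880195890293760, while the `Int64` path keeps the original value.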
It seems to me we should be able to carry out this operation without passing through floats.
The cause of 'passing through floats' is that it tries to insert NaN while extending 'a' from length 2 to length 3. In this case, should we insert other values (like 0) to keep the int64 type?
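One way to read this suggestion is to pad with a placeholder value and use an explicit membership mask to decide which side wins, so that no NaN is ever needed. The sketch below is a hypothetical helper (`combine_first_int` is not a pandas function). It assumes both Series share the same integer dtype, have unique indexes, and contain no missing values.

```python
import numpy as np
import pandas as pd


def combine_first_int(a: pd.Series, b: pd.Series) -> pd.Series:
    """Hypothetical helper: prefer values from `a`, fall back to `b`,
    without ever converting to float64."""
    index = a.index.union(b.index)
    in_a = index.isin(a.index)  # True where `a` supplies the value

    # Pad with 0 instead of NaN so both sides stay int64.
    a_filled = a.reindex(index, fill_value=0)
    b_filled = b.reindex(index, fill_value=0)

    # The padded zeros from `a` are never selected because of the mask.
    values = np.where(in_a, a_filled.to_numpy(), b_filled.to_numpy())
    return pd.Series(values, index=index, dtype=a.dtype)


a = pd.Series([1666880195890293744, 5])
b = pd.Series([1, 2, 3])
result = combine_first_int(a, b)
print(result)
```

The placeholder 0 never reaches the output because the mask, not the value, records which positions were missing.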
I am merely a user of Pandas, and the underlying code is far over my head, so maybe I should not be commenting here, but I wonder whether it would be possible to use here whatever solution was used to solve issue #39051?
- Issue: There were int64->float64->int64 conversions
- Fix: Carry out the operation without passing through float64
The above patch could solve this issue (when all the columns are 'int64'), but could not cover a mixed case like the one below:
Currently, NDFrame::align accepts only a single fill_value as a parameter.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I tried this on Pandas version 2.2.2 and I see that there is a loss of precision. This could be related to issue #51777.
Expected Behavior
1666880195890293744 should not get changed to 1666880195890293760
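The size of the change follows from float64 resolution at this magnitude. The sketch below only illustrates the arithmetic with NumPy and is not the original reproducer. The value lies between 2**60 and 2**61, where adjacent float64 values are 256 apart, so it is rounded to the nearest multiple of 256.

```python
import numpy as np

original = 1666880195890293744
as_float = np.float64(original)

# Gap between adjacent float64 values at this magnitude.
gap = int(np.spacing(as_float))
rounded = int(as_float)

print(2**60 <= original < 2**61)  # True
print(gap)                        # 256
print(rounded)                    # 1666880195890293760
print(rounded - original)         # 16
```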
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.10.9.final.0
python-bits : 64
OS : Darwin
OS-release : 24.0.0
Version : Darwin Kernel Version 24.0.0: Tue Sep 24 23:37:36 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 1.26.4
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 67.3.2
pip : 24.2
Cython : 0.29.33
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.0
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.3
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.10.0
sqlalchemy : None
tables : 3.8.0
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None