_as_pandas() converts 'NA' to nan - consider adding optional keep_default_na=False argument #154

NanisTe · 2020-07-31T16:38:59Z

The _as_pandas() function in result_set.py converts the string 'NA' in a query result to NaN.

Consider adding an optional keep_default_na argument to the pd.read_csv() call inside _as_pandas() so that this behaviour can be controlled.

This is related to issues #118 and #120.
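For reference, the pandas behaviour in question can be reproduced without Athena. The snippet below is only a minimal sketch: the CSV text and column names are made up for illustration, and it calls pd.read_csv() directly rather than going through PyAthena.

```python
import io

import pandas as pd

# Made-up sample data: one literal "NA" string, one empty field, one normal value.
csv_text = "id,code\n1,NA\n2,\n3,US\n"

# Default parsing: the literal string "NA" and the empty field both become NaN.
default_df = pd.read_csv(io.StringIO(csv_text))

# With keep_default_na=False and na_values=[""], only the empty field becomes NaN
# and the literal string "NA" is kept as-is.
strict_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(default_df["code"].isna().tolist())
print(strict_df["code"].isna().tolist())
```

Note that with keep_default_na=False and no na_values at all, the empty field would be read as an empty string instead of NaN.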

laughingman7743 · 2020-08-02T04:31:26Z

How about allowing the keep_default_na and na_values options to be passed to the execute method, as follows?
c379656

from pyathena import connect
from pyathena.pandas_cursor import PandasCursor

cursor = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                 region_name='us-west-2',
                 cursor_class=PandasCursor).cursor()

df = cursor.execute("SELECT * FROM many_rows", keep_default_na=False, na_values=[""]).as_pandas()

NanisTe · 2020-08-03T08:09:17Z

Is that an already working solution or do you suggest to implement something like this?
I would keep it in as_pandas() since it is much more related to that than to the query execution itself.

laughingman7743 · 2020-08-03T08:53:49Z

> Is that an already working solution or do you suggest to implement something like this?

It is already implemented in the branch for the following pull request:
#120

> I would keep it in as_pandas() since it is much more related to that than to the query execution itself.

The current implementation loads the CSV automatically as soon as the query has been executed, so calling as_pandas() does not load the CSV. The _as_pandas() method is called in the constructor of the result set object.
I don't want to make any major changes to this implementation.
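The design constraint described above can be sketched roughly as follows. This is a simplified illustration, not PyAthena's actual code: EagerResultSet is a hypothetical class, it reads from an in-memory string instead of S3, and its defaults simply mirror those of pd.read_csv().

```python
import io

import pandas as pd


class EagerResultSet:
    """Hypothetical stand-in for a result set that parses its CSV eagerly."""

    def __init__(self, csv_text, keep_default_na=True, na_values=None):
        self._keep_default_na = keep_default_na
        self._na_values = na_values
        # The CSV is parsed here, in the constructor, so the parsing options
        # have to be known before the object exists.
        self._df = self._as_pandas(csv_text)

    def _as_pandas(self, csv_text):
        return pd.read_csv(
            io.StringIO(csv_text),
            keep_default_na=self._keep_default_na,
            na_values=self._na_values,
        )

    def as_pandas(self):
        # Only returns the already-parsed frame; options passed at this
        # point would come too late to affect NA handling.
        return self._df


result = EagerResultSet("code\nNA\nUS\n", keep_default_na=False, na_values=[""])
df = result.as_pandas()
print(df["code"].tolist())
```

Because parsing has already happened by the time as_pandas() is called, passing the options at execution time avoids restructuring when the CSV is loaded.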

laughingman7743 mentioned this issue Aug 2, 2020

Fix empty & null string conversion with PandasCursor (refs #118) #120

Merged

laughingman7743 closed this as completed Aug 10, 2020
