Can write but cannot read parquet table if pyarrow is installed but not pandas #17972

@braingram

Description

In an environment with astropy and pyarrow installed (but not pandas), tables can be written to parquet files but not read back from them.

Expected behavior

I expected to be able to read the written table without installing pandas.

How to Reproduce

Create a new environment, `pip install astropy pyarrow` (without pandas), and run:

```python
from astropy.table import Table
t = Table({"a": [1, 2, 3]})
t.write("foo.parq")
Table.read("foo.parq")
```
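To reproduce the failure in an environment where pandas *is* installed, one option (not part of the original report) is to hide pandas from the import system: a `None` entry in `sys.modules` makes `import pandas` raise `ModuleNotFoundError`, the same exception type seen in the traceback. Modules that have already imported pandas keep their own reference to it, so this has to run at the very start of a fresh interpreter, before astropy or pyarrow are imported. The context manager name `hidden_module` is hypothetical.

```python
import contextlib
import sys


@contextlib.contextmanager
def hidden_module(name):
    """Temporarily make ``import <name>`` raise ModuleNotFoundError.

    Hypothetical helper: a None entry in sys.modules tells the import
    system the module is unavailable. The previous entry is restored
    on exit (or removed, if there was none).
    """
    missing = object()
    previous = sys.modules.get(name, missing)
    sys.modules[name] = None
    try:
        yield
    finally:
        if previous is missing:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = previous
```

For example, indent the four reproducer lines above under `with hidden_module("pandas"):` at the top of a new script.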

Error:

```
Traceback (most recent call last):
  File "/Users/bgraham/projects/250402_astropy_pyarrow_parquet_bug/foo.py", line 4, in <module>
    Table.read("foo.parq")
  File "/Users/bgraham/.pyenv/versions/astropy_pyarrow_parquet_bug/lib/python3.12/site-packages/astropy/table/connect.py", line 62, in __call__
    out = self.registry.read(cls, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bgraham/.pyenv/versions/astropy_pyarrow_parquet_bug/lib/python3.12/site-packages/astropy/io/registry/core.py", line 221, in read
    data = reader(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bgraham/.pyenv/versions/astropy_pyarrow_parquet_bug/lib/python3.12/site-packages/astropy/io/misc/parquet.py", line 217, in read_table_parquet
    dtype.append(value_type.to_pandas_dtype())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/types.pxi", line 406, in pyarrow.lib.DataType.to_pandas_dtype
  File "pyarrow/types.pxi", line 186, in pyarrow.lib._to_pandas_dtype
  File "pyarrow/pandas-shim.pxi", line 168, in pyarrow.lib._PandasAPIShim.is_v1
  File "pyarrow/pandas-shim.pxi", line 114, in pyarrow.lib._PandasAPIShim._check_import
  File "pyarrow/pandas-shim.pxi", line 55, in pyarrow.lib._PandasAPIShim._import_pandas
  File "pyarrow/pandas-shim.pxi", line 50, in pyarrow.lib._PandasAPIShim._import_pandas
ModuleNotFoundError: No module named 'pandas'
```

Versions

```python
import astropy
try:
    astropy.system_info()
except AttributeError:
    import platform; print(platform.platform())
    import sys; print("Python", sys.version)
    import astropy; print("astropy", astropy.__version__)
    import numpy; print("Numpy", numpy.__version__)
    import erfa; print("pyerfa", erfa.__version__)
    try:
        import scipy
        print("Scipy", scipy.__version__)
    except ImportError:
        print("Scipy not installed")
    try:
        import matplotlib
        print("Matplotlib", matplotlib.__version__)
    except ImportError:
        print("Matplotlib not installed")
```

```
platform
--------
platform.platform() = 'macOS-14.5-arm64-arm-64bit'
platform.version() = 'Darwin Kernel Version 23.5.0: Wed May  1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000'
platform.python_version() = '3.12.4'

packages
--------
astropy              7.0.1
numpy                2.2.4
scipy                --
matplotlib           --
pandas               --
pyerfa               2.0.1.5
```

Also installed: pyarrow 19.0.1 (pandas is not installed).
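To confirm which of the two optional dependencies a given environment actually provides, a standard-library-only check such as the following can be run. It is a sketch: the helper name is hypothetical, and it assumes the import name matches the distribution name, which holds for pyarrow and pandas. `importlib.util.find_spec` locates a top-level module without importing it, so the check itself does not trigger the import being tested.

```python
import importlib.metadata
import importlib.util


def optional_dependency_report(names=("pyarrow", "pandas")):
    """Map each top-level module name to its installed version.

    Hypothetical helper: the value is None if the module cannot be
    found, or "unknown" if it is importable but has no distribution
    metadata under that name (e.g. standard-library modules).
    """
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "unknown"
    return report
```

For the environment in this report it would be expected to show pyarrow's version (19.0.1) and `None` for pandas.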
