Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve async handling for s3fs #2901

Open
gbip opened this issue Mar 10, 2025 · 2 comments
Open

Improve async handling for s3fs #2901

gbip opened this issue Mar 10, 2025 · 2 comments
Labels
bug Potential issues with the zarr-python library

Comments

@gbip
Copy link

gbip commented Mar 10, 2025

Zarr version

v3.X

Numcodecs version

v0.15.1

Python Version

3.12

Operating System

Linux

Installation

pip into venv

Description

When opening a zarr using s3fs, some errors related to aiohttp objects are raised when leaving the Python interpreter. From what I understand, it seems that cleaning asynchronous objects should be performed by the filestorage implementation, as in #2674 (comment).

However it is unclear to me if this should be performed by zarr_python or the s3fs implementation, as discussed here : fsspec/s3fs#943.

What do you think ?

Steps to reproduce

import xarray
import fsspec
import zarr

s3_endpoint = "http://localhost:19900"

fs = fsspec.filesystem(
    "s3",
    key="ACCESS",
    secret="S3CRET",
    endpoint_url=s3_endpoint,
    asynchronous=True,
)
store = zarr.storage.FsspecStore(fs, path="/mybucket/myzarr.zarr")
dataset = xarray.open_zarr(store, zarr_format=3)

Should raise the following error :

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fe2071f6c60>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x7fe2071ca6f0>, 24766.009516954), (<aiohttp.client_proto.ResponseHandler object at 0x7fe2071cac30>, 24766.015895832)])']
connector: <aiohttp.connector.TCPConnector object at 0x7fe2071ab800>

Additional output

No response

@gbip gbip added the bug Potential issues with the zarr-python library label Mar 10, 2025
@TomAugspurger
Copy link
Contributor

What happens if your application code closes the filesystem (does s3fs offer an async with context? I don't remember)?

async with fs:
    store = zarr.storage.FsspecStore(fs, path="/mybucket/myzarr.zarr")
    dataset = xarray.open_zarr(store, zarr_format=3)

The tricky bit here is that zarr-python doesn't really know who owns the filesystem passed into FsspecStore. We don't necessarily want to close it, since the user might be using it outside of zarr.

Maybe we could make FsspecStore (and the related stores) async context managers and have then close the store upon __aexit__?

@rabernat
Copy link
Contributor

s3fs actually has a custom finalizer that should close these sessions in some cases

https://github.com/fsspec/s3fs/blob/01b9c4b838b81375093ae1d78562edf6bdc616ea/s3fs/core.py#L575-L578

Although it looks like that is not used for asynchronous filesystems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

3 participants