Conversation
|
@kylebarron as another alternative, could you broaden the type for |
|
Also here's an example of the current version in use, following up from your question today. VirtualiZarr uses |
Of course, yes, that's possible to do. But the problem is that it starts to fundamentally break down our abstractions. Knowing the length of an object is obviously extremely valuable information. Right now that length is always known because object stores always know that. If we changed the type to |
I think we should probably use synchronous HTTP clients whenever the end code is synchronous, and vice versa async clients with async usage. Creating a new aiohttp client session for every request incurs a lot of overhead and we shouldn't allow that. For the AiohttpStore, I think I'd opt towards requiring the end-user to pass in a valid async with aiohttp.ClientSession() as session:
store = AiohttpStore(
session,
base_url,
headers={"Authorization": f"Bearer {token}"},
)
registry = ObjectStoreRegistry({base_url: store})Then
But alternatively for end users who want to make synchronous requests, they can use a |
Downstream applications can raise an Error if size is none and continue to rely on it. It's a tradeoff between optimizing for object storage and allowing extensibility to other protocols. The value of obspec-utils is largely in the extensibility, at least for my use case which is out of region earth data usage, which is why I'd prefer the option that lets an http based store implement the Head protocol. |
I don't think that really works. Downstream applications wouldn't know when they can raise an error and when they couldn't. Which parts of obspec they can disregard. I feel like if we're going to go the "raise an exception" route, it would be better to do it in the AiohttpStore itself. |
alright, sounds like the "raise an exception" route is the best compromise to not entirely remove HeadAsync and GetAsync operations from AiohttpStore. Separately, we'll implement a sync requests store. |
Change list
AiohttpStore should not implement synchronous methods
As an async backend, the AiohttpStore should not be implementing synchronous methods, because that muddles for downstream users which requests have overhead and which requests don't.
AiohttpStore cannot implement
GetAsyncI hit this and then had a 20 minute conversation about it with @vincentsarago and @geospatial-jeff 😅.
The problem is: object stores guarantee more information than the HTTP spec. If we declare obspec to be a specification modeling object stores, then we cannot satisfy the spec with generic HTTP responses.
The existing aiohttp implementation worked by materializing the HTTP result into a single buffer. That is not an option;
GetAsyncis intended to represent streaming HTTP requests. Downloading the entire request up-front will provide poor memory characteristics for any library that depends on aiohttp.I tried to model an
AiohttpGetResultAsyncholding aresponse: aiohttp.ClientResponse. But HTTP responses do not always have a content length defined. This is in contrast to object stores that guarantee the content length will always exist, even for streamed responses.AiohttpStore cannot implement
HeadAsyncFor the same reason as above, since the HTTP
content-lengthis not defined,Alternatively, we could error at runtime
Since
content-lengthis defined much of the time, but not all, it would be possible to just error at runtime whencontent-lengthdoesn't exist. This is the approach taken byobject_store: apache/arrow-rs-object-store#340AiohttpStore implements
GetRangeAsyncandGetRangesAsyncSo with the above discussion, the aiohttp store implements just the range request functionality.