Convert blob to use iter_coroutine and _http#48
Conversation
Add shared HTTP infrastructure and relocate iter_coroutine into its own module so downstream consumers can adopt it incrementally later.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
This PR successfully migrates the Vercel Blob SDK to an async-first architecture using iter_coroutine and unified HTTP transports. The refactoring consolidates duplicate sync/async implementations into a shared async core (src/vercel/blob/_core.py) while maintaining stable public APIs through thin sync wrappers.
Changes:
- Introduced
_SyncBlobOpsClientand_AsyncBlobOpsClientclasses with shared_BaseBlobOpsClientbase containing core blob operations (put, delete, head, list, iter, copy, create_folder) - Refactored multipart uploads to use runtime-specific executors:
_SyncMultipartUploadRuntime(ThreadPoolExecutor) and_AsyncMultipartUploadRuntime(asyncio tasks) with shared orchestration logic - Added
RawBodytransport wrapper for pass-through streaming bodies, enabling consistent request building viabuild_request + sendpattern with support forstreamandfollow_redirectsoptions - Documented the iter-coroutine + base/runtime migration pattern in AGENTS.md for future refactoring efforts
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/vercel/blob/_core.py |
New 925-line async-first core with _BlobRequestClient, _BaseBlobOpsClient, and concrete sync/async clients; includes error mapping, request execution, and all blob operations |
src/vercel/blob/ops.py |
Refactored public APIs to delegate to client classes via _run_sync_blob_operation wrapper and async context managers; get and download_file use direct transport access for storage URLs |
src/vercel/blob/api.py |
Simplified request_api and request_api_async to thin wrappers over request_api_core |
src/vercel/blob/multipart/uploader.py |
Extracted runtime-specific upload logic into _SyncMultipartUploadRuntime and _AsyncMultipartUploadRuntime classes with shared helpers |
src/vercel/blob/multipart/core.py |
Introduced _BaseMultipartClient with shared multipart operations, _SyncMultipartClient and _AsyncMultipartClient implementations |
src/vercel/blob/multipart/api.py |
Refactored to use client classes instead of direct function calls, consolidated validation/normalization helpers |
src/vercel/_http/transport.py |
Added RawBody dataclass and updated transport send methods to support follow_redirects and stream parameters via build_request + send pattern |
src/vercel/_http/__init__.py |
Exported RawBody for use in blob operations |
tests/test_sync_async_parity.py |
Added iterator type parity test for iter_objects / iter_objects_async |
tests/integration/test_http_transport_raw_body.py |
New test file validating RawBody behavior for sync/async iterables |
tests/integration/test_blob_sync_async.py |
Added comprehensive integration tests for multipart flows, get/download_file operations with progress callbacks, and pagination with limit-aware batching |
tests/integration/test_blob_multipart_auto_upload.py |
New test file covering auto multipart upload flows for sync/async runtimes, manual multipart operations, and unknown-total progress handling |
AGENTS.md |
Documented iter-coroutine + base/runtime migration pattern with core principles, recommended structure, guardrails, and testing expectations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/vercel/blob/ops.py
Outdated
| except Exception: | ||
| try: | ||
| if os.path.exists(tmp): | ||
| os.remove(tmp) | ||
| finally: | ||
| if response is not None: | ||
| response.close() | ||
| transport.close() | ||
| raise | ||
| else: | ||
| if response is not None: | ||
| response.close() | ||
| transport.close() | ||
| return dst |
There was a problem hiding this comment.
Resource cleanup logic has a critical issue: when an exception occurs, the code tries to clean up resources in the finally block of the inner try, but then raises the exception. This means the else block (lines 638-641) will never execute, leaving resources unclosed on the success path. The transport and response should be closed in a finally block that executes regardless of whether an exception occurred or not. Consider restructuring to ensure resources are always cleaned up.
src/vercel/blob/ops.py
Outdated
| except Exception: | ||
| try: | ||
| if os.path.exists(tmp): | ||
| os.remove(tmp) | ||
| finally: | ||
| if response is not None: | ||
| await response.aclose() | ||
| await transport.aclose() | ||
| raise | ||
| else: | ||
| if response is not None: | ||
| await response.aclose() | ||
| await transport.aclose() | ||
| return dst |
There was a problem hiding this comment.
Resource cleanup logic has the same critical issue as in the sync version: when an exception occurs, resources are cleaned up in the finally block of the inner try, then the exception is raised. This means the else block (lines 710-713) will never execute, leaving resources unclosed on the success path. The transport and response should be closed in a finally block that executes regardless of whether an exception occurred.
|
|
||
| def create_sync_multipart_upload_runtime() -> _SyncMultipartUploadRuntime: | ||
| return _SyncMultipartUploadRuntime() | ||
|
|
There was a problem hiding this comment.
Missing blank line between function definitions. PEP 8 requires two blank lines before a top-level function definition. There should be a blank line between line 305 and 306.
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
Instead of the very permissive decoding we had before, use explicit decoding behavior with corresponding specific exceptions. This is _kind of_ a breaking change since before if we failed to parse JSON, we returned a str that would fail later with a KeyError or some other runtime error, and now we fail earlier with a more specific error.
Not a full-collapse since the public API seems to really want to be called functionally, but moved the client instantiation "up" a bit to avoid making a client and transport on _every_ separate part. Didn't go with a ContextVar thing due to concerns around the behavior of sync in worker threads. Note: This is somewhat a breaking change if anyone was ever importing directly from the internal `vercel.blob.multipart.core` module. I didn't bother re-wrapping since this really seems like an internal implementation detail even though it was publically accessible.
# Conflicts: # src/vercel/blob/multipart/api.py # src/vercel/blob/multipart/uploader.py # src/vercel/blob/ops.py
418ca9a to
b877b55
Compare
Delete _parse_last_modified, _build_cache_bypass_url, _build_get_result, and _resolve_blob_url which were dead code after the iter-coroutine refactor moved all get/download paths to _core.py. Remove their now-unused imports and update tests to import _parse_last_modified from _core instead.
get_async() in ops.py used default_timeout=120.0 while all other call sites (get(), BlobClient.get(), AsyncBlobClient.get()) used 30.0. Align to 30.0 everywhere.
|
@scotttrinh , should we move to the |
These methods cannot delegate to a Base through iter_coroutine since they require different internal usage of generators to iterate through pages and objects. Instead of defining the async version on the base and then having to override in the sync version, just completely inline this into each separate client.
Add explicit return types to BlobClient.create_multipart_uploader (-> MultipartUploader) and AsyncBlobClient.create_multipart_uploader (-> AsyncMultipartUploader).
Since b877b55, AsyncBlobClient holds a persistent httpx transport bound to one event loop. The example was calling asyncio.run() twice with the same client, causing a "Event loop is closed" error on the second call. Combine both async examples into a single asyncio.run().
I'm ok with that. The "public" API is pretty broad right now, but all of these new |
This PR migrates Blob to an async-first shared core and keeps the public sync/async APIs stable via thin wrappers.
What changed
src/vercel/blob/_core.pyas the shared implementation for:put,delete,head,list,iter,copy,create_folder)src/vercel/blob/ops.py,src/vercel/blob/api.py, andsrc/vercel/blob/multipart/api.pyto delegate to shared clients:_SyncBlobOpsClient/_SyncMultipartClientwith a singleiter_coroutine(...)boundary_AsyncBlobOpsClient/_AsyncMultipartClientwith directawaitvercel._http:RawBodysupport for pass-through streaming bodiesbuild_request + send, enabling consistentstreamandfollow_redirectshandlingasynciotasksHigh-level call flow
put(...)/put_async(...)->
blob.opswrapper->
_SyncBlobOpsClient._put_blob/_AsyncBlobOpsClient._put_blob-> single-part:
_request_api(...)-> multipart: create upload -> upload parts via runtime -> complete upload
->
request_api_core(...)->
_httptransport sendget(...)/download_file(...)(and async variants)-> resolve path via
headwhen needed->
_httptransportGET(optionally streamed)-> return bytes / write file with progress callbacks
Validation
RawBodybehavioriter_objectsvsiter_objects_async).