Convert blob to use iter_coroutine and _http #48

Open
scotttrinh wants to merge 35 commits into main from iter-coroutine-blob

Conversation

@scotttrinh
Collaborator

This PR migrates Blob to an async-first shared core and keeps the public sync/async APIs stable via thin wrappers.

What changed

  • Added src/vercel/blob/_core.py as the shared implementation for:
    • request execution (headers, retries, error mapping, progress, response decoding)
    • blob operations (put, delete, head, list, iter, copy, create_folder)
  • Refactored public APIs in src/vercel/blob/ops.py, src/vercel/blob/api.py, and src/vercel/blob/multipart/api.py to delegate to shared clients:
    • sync: _SyncBlobOpsClient / _SyncMultipartClient with a single iter_coroutine(...) boundary
    • async: _AsyncBlobOpsClient / _AsyncMultipartClient with direct await
  • Unified transport usage through vercel._http:
    • added RawBody support for pass-through streaming bodies
    • switched transport send path to build_request + send, enabling consistent stream and follow_redirects handling
  • Reworked multipart upload orchestration into runtime-specific executors:
    • sync runtime uses threadpool
    • async runtime uses asyncio tasks
    • shared validation/result shaping/order guarantees

High-level call flow

  • put(...) / put_async(...)
    -> blob.ops wrapper
    -> _SyncBlobOpsClient._put_blob / _AsyncBlobOpsClient._put_blob
    -> single-part: _request_api(...)
    -> multipart: create upload -> upload parts via runtime -> complete upload
    -> request_api_core(...)
    -> _http transport send
  • get(...) / download_file(...) (and async variants)
    -> resolve path via head when needed
    -> _http transport GET (optionally streamed)
    -> return bytes / write file with progress callbacks
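
The multipart branch of the flow above (upload parts via a runtime, then complete) might look roughly like the sketch below. It assumes parts are numbered from 1 and that results are returned in part order regardless of completion order. All function names and the `etag` values are hypothetical placeholders, not the PR's `_SyncMultipartUploadRuntime` or `_AsyncMultipartUploadRuntime`.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Part = Dict[str, object]


def split_parts(data: bytes, part_size: int) -> List[Tuple[int, bytes]]:
    """Shared helper: number parts from 1 and slice the payload."""
    return [
        (offset // part_size + 1, data[offset:offset + part_size])
        for offset in range(0, len(data), part_size)
    ]


def shape_results(results: List[Part]) -> List[Part]:
    """Shared helper: return parts in part-number order."""
    return sorted(results, key=lambda part: part["partNumber"])


def upload_part_sync(part_number: int, chunk: bytes) -> Part:
    # Placeholder for a blocking HTTP upload of one part.
    return {"partNumber": part_number, "etag": f"etag-{len(chunk)}"}


async def upload_part_async(part_number: int, chunk: bytes) -> Part:
    await asyncio.sleep(0)  # placeholder for an awaited HTTP upload
    return {"partNumber": part_number, "etag": f"etag-{len(chunk)}"}


def upload_all_sync(data: bytes, part_size: int, max_workers: int = 4) -> List[Part]:
    # Sync runtime: fan parts out to a thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_part_sync, number, chunk)
            for number, chunk in split_parts(data, part_size)
        ]
        return shape_results([future.result() for future in futures])


async def upload_all_async(data: bytes, part_size: int) -> List[Part]:
    # Async runtime: fan parts out to asyncio tasks.
    tasks = [
        asyncio.create_task(upload_part_async(number, chunk))
        for number, chunk in split_parts(data, part_size)
    ]
    return shape_results(list(await asyncio.gather(*tasks)))
```

Keeping `split_parts` and `shape_results` outside both runtimes is what lets the two paths be compared directly in a parity test.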

Validation

  • Added integration coverage for:
    • sync/async multipart runtime flow parity
    • sync/async blob read + download flows (including progress callbacks)
    • transport RawBody behavior
  • Added parity type check for iterator return types (iter_objects vs iter_objects_async).
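
A parity check on iterator return types could be written along these lines. This is a sketch only: the `BlobItem` class and the two stub functions are hypothetical stand-ins for `iter_objects` and `iter_objects_async`, and the real test in `tests/test_sync_async_parity.py` may be structured differently.

```python
from collections.abc import AsyncIterator, Iterator
from typing import get_args, get_origin, get_type_hints


class BlobItem:
    """Hypothetical item type yielded by both iterators."""


def iter_objects() -> Iterator[BlobItem]:
    yield BlobItem()


async def iter_objects_async() -> AsyncIterator[BlobItem]:
    yield BlobItem()


def iterator_parity(sync_fn, async_fn) -> bool:
    """True when sync returns Iterator[T] and async returns AsyncIterator[T] for the same T."""
    sync_hint = get_type_hints(sync_fn)["return"]
    async_hint = get_type_hints(async_fn)["return"]
    return (
        get_origin(sync_hint) is Iterator
        and get_origin(async_hint) is AsyncIterator
        and get_args(sync_hint) == get_args(async_hint)
    )
```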

scotttrinh requested review from a team and Copilot on February 10, 2026 21:39
@vercel
vercel bot commented Feb 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| vercel-py | Ready | Preview | Feb 23, 2026 9:39pm |

Copilot AI left a comment

Pull request overview

This PR successfully migrates the Vercel Blob SDK to an async-first architecture using iter_coroutine and unified HTTP transports. The refactoring consolidates duplicate sync/async implementations into a shared async core (src/vercel/blob/_core.py) while maintaining stable public APIs through thin sync wrappers.

Changes:

  • Introduced _SyncBlobOpsClient and _AsyncBlobOpsClient classes with shared _BaseBlobOpsClient base containing core blob operations (put, delete, head, list, iter, copy, create_folder)
  • Refactored multipart uploads to use runtime-specific executors: _SyncMultipartUploadRuntime (ThreadPoolExecutor) and _AsyncMultipartUploadRuntime (asyncio tasks) with shared orchestration logic
  • Added RawBody transport wrapper for pass-through streaming bodies, enabling consistent request building via build_request + send pattern with support for stream and follow_redirects options
  • Documented the iter-coroutine + base/runtime migration pattern in AGENTS.md for future refactoring efforts
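
To make the "pass-through" role of RawBody concrete, here is a small sketch. The field name `content` and the `prepare_body` helper are assumptions made for illustration; the actual dataclass in `src/vercel/_http/transport.py` may have a different shape.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple, Union


@dataclass(frozen=True)
class RawBody:
    """Hypothetical shape: marks content that must be sent exactly as given."""

    content: Union[bytes, Iterable[bytes]]


def prepare_body(body: Any) -> Tuple[Any, Dict[str, str]]:
    """Return (content, extra_headers) for a request body."""
    if isinstance(body, RawBody):
        # Pass-through: no encoding and no buffering, so iterables stay streamable.
        return body.content, {}
    # Anything else is treated as a JSON payload in this sketch.
    encoded = json.dumps(body).encode("utf-8")
    return encoded, {"content-type": "application/json"}
```

The point of the wrapper type in this sketch is that the transport can tell "already-encoded bytes or a stream" apart from "a value to serialize" without inspecting the payload itself.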

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/vercel/blob/_core.py | New 925-line async-first core with _BlobRequestClient, _BaseBlobOpsClient, and concrete sync/async clients; includes error mapping, request execution, and all blob operations |
| src/vercel/blob/ops.py | Refactored public APIs to delegate to client classes via _run_sync_blob_operation wrapper and async context managers; get and download_file use direct transport access for storage URLs |
| src/vercel/blob/api.py | Simplified request_api and request_api_async to thin wrappers over request_api_core |
| src/vercel/blob/multipart/uploader.py | Extracted runtime-specific upload logic into _SyncMultipartUploadRuntime and _AsyncMultipartUploadRuntime classes with shared helpers |
| src/vercel/blob/multipart/core.py | Introduced _BaseMultipartClient with shared multipart operations, _SyncMultipartClient and _AsyncMultipartClient implementations |
| src/vercel/blob/multipart/api.py | Refactored to use client classes instead of direct function calls, consolidated validation/normalization helpers |
| src/vercel/_http/transport.py | Added RawBody dataclass and updated transport send methods to support follow_redirects and stream parameters via build_request + send pattern |
| src/vercel/_http/__init__.py | Exported RawBody for use in blob operations |
| tests/test_sync_async_parity.py | Added iterator type parity test for iter_objects / iter_objects_async |
| tests/integration/test_http_transport_raw_body.py | New test file validating RawBody behavior for sync/async iterables |
| tests/integration/test_blob_sync_async.py | Added comprehensive integration tests for multipart flows, get/download_file operations with progress callbacks, and pagination with limit-aware batching |
| tests/integration/test_blob_multipart_auto_upload.py | New test file covering auto multipart upload flows for sync/async runtimes, manual multipart operations, and unknown-total progress handling |
| AGENTS.md | Documented iter-coroutine + base/runtime migration pattern with core principles, recommended structure, guardrails, and testing expectations |

Comment on lines 629 to 642
    except Exception:
        try:
            if os.path.exists(tmp):
                os.remove(tmp)
        finally:
            if response is not None:
                response.close()
            transport.close()
        raise
    else:
        if response is not None:
            response.close()
        transport.close()
    return dst
Copilot AI Feb 10, 2026

Resource cleanup logic has a critical issue: when an exception occurs, the code tries to clean up resources in the finally block of the inner try, but then raises the exception. This means the else block (lines 638-641) will never execute, leaving resources unclosed on the success path. The transport and response should be closed in a finally block that executes regardless of whether an exception occurred or not. Consider restructuring to ensure resources are always cleaned up.
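
One way to express the restructuring suggested here is a single `finally` that closes the response and transport on every path, with the `except` branch limited to removing the temp file. The sketch below is not the code from `_core.py`: `download_to`, `FakeResource`, and the `.part` suffix are hypothetical, and the chunk iterable stands in for a streamed response body.

```python
import os
import tempfile
from typing import Iterable, Optional


class FakeResource:
    """Hypothetical stand-in for a response or transport with a close() method."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def download_to(
    dst: str,
    chunks: Iterable[bytes],
    response: Optional[FakeResource],
    transport: FakeResource,
) -> str:
    tmp = dst + ".part"  # assumed temp-file naming
    try:
        with open(tmp, "wb") as handle:
            for chunk in chunks:
                handle.write(chunk)
        os.replace(tmp, dst)  # publish the file only once it is complete
        return dst
    except Exception:
        # Failure path: discard the partial file, then re-raise.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    finally:
        # Runs on success and on failure, so cleanup lives in one place.
        if response is not None:
            response.close()
        transport.close()


# Usage: both resources are closed after a successful download.
with tempfile.TemporaryDirectory() as directory:
    response, transport = FakeResource(), FakeResource()
    download_to(os.path.join(directory, "blob.bin"), [b"ab", b"cd"], response, transport)
    closed_on_success = response.closed and transport.closed
```

With this shape the close calls are written once rather than duplicated across the `except` and `else` branches.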

Comment on lines 701 to 714
    except Exception:
        try:
            if os.path.exists(tmp):
                os.remove(tmp)
        finally:
            if response is not None:
                await response.aclose()
            await transport.aclose()
        raise
    else:
        if response is not None:
            await response.aclose()
        await transport.aclose()
    return dst
Copilot AI Feb 10, 2026

Resource cleanup logic has the same critical issue as in the sync version: when an exception occurs, resources are cleaned up in the finally block of the inner try, then the exception is raised. This means the else block (lines 710-713) will never execute, leaving resources unclosed on the success path. The transport and response should be closed in a finally block that executes regardless of whether an exception occurred.


def create_sync_multipart_upload_runtime() -> _SyncMultipartUploadRuntime:
    return _SyncMultipartUploadRuntime()

Copilot AI Feb 10, 2026

Missing blank line between function definitions. PEP 8 requires two blank lines before a top-level function definition. There should be a blank line between line 305 and 306.

scotttrinh changed the base branch from base-iter-coroutine to main on February 10, 2026 22:13
@socket-security
socket-security bot commented Feb 10, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | respx@0.22.0 | 100 | 100 | 100 | 100 | 100 |

Instead of the very permissive decoding we had before, use explicit
decoding behavior with corresponding specific exceptions. This is _kind
of_ a breaking change since before if we failed to parse JSON, we
returned a str that would fail later with a KeyError or some other
runtime error, and now we fail earlier with a more specific error.
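
The shift from permissive to explicit decoding could look roughly like the following sketch. The exception name `BlobResponseDecodeError` and the `decode_json_body` helper are hypothetical; the PR's actual exception types are not shown in this conversation.

```python
import json
from typing import Any, Dict


class BlobResponseDecodeError(Exception):
    """Hypothetical exception for a response body that cannot be decoded."""


def decode_json_body(body: bytes) -> Dict[str, Any]:
    """Decode a JSON object body, failing early with a specific error."""
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise BlobResponseDecodeError("response body is not valid UTF-8") from exc
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Previously a failure like this would surface later, e.g. as a KeyError.
        raise BlobResponseDecodeError("response body is not valid JSON") from exc
    if not isinstance(data, dict):
        raise BlobResponseDecodeError("response body is not a JSON object")
    return data
```

Callers then handle one well-named error at the decoding boundary instead of an unrelated runtime error somewhere downstream.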

Not a full-collapse since the public API seems to really want to be
called functionally, but moved the client instantiation "up" a bit to
avoid making a client and transport on _every_ separate part.

Didn't go with a ContextVar thing due to concerns around the behavior of
sync in worker threads.

Note: This is somewhat of a breaking change if anyone was ever importing
directly from the internal `vercel.blob.multipart.core` module. I didn't
bother re-wrapping since this really seems like an internal
implementation detail even though it was publicly accessible.

# Conflicts:
#	src/vercel/blob/multipart/api.py
#	src/vercel/blob/multipart/uploader.py
#	src/vercel/blob/ops.py

Delete _parse_last_modified, _build_cache_bypass_url,
_build_get_result, and _resolve_blob_url which were dead code
after the iter-coroutine refactor moved all get/download paths
to _core.py. Remove their now-unused imports and update tests
to import _parse_last_modified from _core instead.

get_async() in ops.py used default_timeout=120.0 while all other
call sites (get(), BlobClient.get(), AsyncBlobClient.get()) used
30.0. Align to 30.0 everywhere.
@elprans
elprans commented Feb 23, 2026

@scotttrinh, should we move to the _internal package approach to cleanly delineate internals from public API?

These methods cannot delegate to a Base through iter_coroutine since
they require different internal usage of generators to iterate through
pages and objects. Instead of defining the async version on the base and
then having to override in the sync version, just completely inline this
into each separate client.
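
The reason the iterators resist sharing can be seen in a small sketch: the sync version must be a plain generator and the async version an async generator, so the pagination loop is written twice. The `PAGES` table, the page-fetch helpers, and the cursor values are hypothetical and stand in for the real list endpoint.

```python
import asyncio
from typing import AsyncIterator, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]

# Hypothetical paginated listing: cursor -> (items, next_cursor).
PAGES: Dict[Optional[str], Page] = {
    None: (["a.txt", "b.txt"], "cursor-1"),
    "cursor-1": (["c.txt"], None),
}


def list_page(cursor: Optional[str]) -> Page:
    return PAGES[cursor]


async def list_page_async(cursor: Optional[str]) -> Page:
    await asyncio.sleep(0)  # placeholder for an awaited HTTP call
    return PAGES[cursor]


def iter_objects() -> Iterator[str]:
    # Sync client: a plain generator that can use `yield from`.
    cursor: Optional[str] = None
    while True:
        items, cursor = list_page(cursor)
        yield from items
        if cursor is None:
            return


async def iter_objects_async() -> AsyncIterator[str]:
    # Async client: an async generator, where `yield from` is not allowed.
    cursor: Optional[str] = None
    while True:
        items, cursor = await list_page_async(cursor)
        for item in items:
            yield item
        if cursor is None:
            return


async def collect() -> List[str]:
    return [item async for item in iter_objects_async()]
```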

Add explicit return types to BlobClient.create_multipart_uploader
(-> MultipartUploader) and AsyncBlobClient.create_multipart_uploader
(-> AsyncMultipartUploader).

Since b877b55, AsyncBlobClient holds a persistent httpx
transport bound to one event loop. The example was calling
asyncio.run() twice with the same client, causing an
"Event loop is closed" error on the second call. Combine
both async examples into a single asyncio.run().
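
The event-loop constraint can be reproduced with a small stand-in. `LoopBoundClient` below is hypothetical and only mimics a client whose transport is tied to the first loop it runs on; it raises its own `RuntimeError` rather than httpx's "Event loop is closed" error.

```python
import asyncio
from typing import List, Optional


class LoopBoundClient:
    """Hypothetical client whose transport is bound to the first event loop it sees."""

    def __init__(self) -> None:
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    async def get(self, path: str) -> str:
        loop = asyncio.get_running_loop()
        if self._loop is None:
            self._loop = loop
        elif self._loop is not loop:
            raise RuntimeError("client is bound to a different event loop")
        await asyncio.sleep(0)  # placeholder for real async I/O
        return f"contents of {path}"


async def main(client: LoopBoundClient) -> List[str]:
    # Both calls share one event loop because they run under one asyncio.run().
    first = await client.get("a.txt")
    second = await client.get("b.txt")
    return [first, second]


results = asyncio.run(main(LoopBoundClient()))
```

Each `asyncio.run()` call creates and then closes a fresh event loop, so reusing one client across two separate `asyncio.run()` calls fails in this sketch on the second call.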
@scotttrinh
Collaborator Author

@elprans

> @scotttrinh, should we move to the _internal package approach to cleanly delineate internals from public API?

I'm ok with that. The "public" API is pretty broad right now, but all of these new _ modules could move into an _internal package, sure! What does that look like? Would each submodule (like vercel.blob and vercel.blob.multipart) have an _internal module within it?


4 participants