Skip to content

fix: adapt SDK storage layer to crawlee v4 StorageClient interface#595

Open
B4nan wants to merge 8 commits intofix/event-manager-v4-adaptfrom
fix/storage-client-v4-adapt
Open

fix: adapt SDK storage layer to crawlee v4 StorageClient interface#595
B4nan wants to merge 8 commits intofix/event-manager-v4-adaptfrom
fix/storage-client-v4-adapt

Conversation

@B4nan
Copy link
Copy Markdown
Member

@B4nan B4nan commented Apr 30, 2026

Summary

Crawlee v4 reshaped the StorageClient interface, removed the cached storageObject from KeyValueStore, and made getPublicUrl async. The SDK still targeted the v3 shape and no longer compiled. This PR adapts:

  • New ApifyStorageClient adapter wrapping apify-client's legacy dataset()/keyValueStore()/requestQueue() accessors and exposing the factory methods (createDatasetClient / createKeyValueStoreClient / createRequestQueueClient) crawlee now requires. Name → ID resolution goes through each collection's getOrCreate(name).
  • Actor.init and _openStorage now wrap this.apifyClient in the adapter before handing it to crawlee/StorageManager.
  • KeyValueStore.getPublicUrl is now async; the per-store urlSigningSecretKey is fetched on demand via the (private) client.getMetadata() rather than the removed storageObject cache. URL-signing behaviour in platform mode is preserved.
  • Actor.openRequestQueue reads totalRequestCount via client.getMetadata() (the old client.get() is gone).
  • StorageManager.openStorage signature is now (class, id?, client?) — dropped the trailing this.config argument.

Known gaps tracked in this PR

apify-client's DatasetClient / KeyValueStoreClient / RequestQueueClient don't yet implement v4-added members like getMetadata and getRecordPublicUrl. The adapter casts through with as unknown as ... for now — the proper fix is to bring apify-client's resource client shapes into structural alignment with @crawlee/types, which is out of scope here.

The KeyValueStoreInfo type in @crawlee/types also doesn't declare urlSigningSecretKey, so the SDK uses a local intersection type to access it. Worth surfacing upstream eventually.

Stacking

Depends on #583 (config redesign). Rebases cleanly onto v4 once that lands.

B4nan added 8 commits April 30, 2026 21:14
Crawlee v4 reshaped its `StorageClient` interface (async factory
methods that accept `id` *or* `name`), removed the cached
`storageObject` from `KeyValueStore`, and made `getPublicUrl` async.
The existing SDK code targeted the v3 shape and no longer compiles.

Changes:
- New `ApifyStorageClient` adapter wraps `apify-client`'s legacy
  `dataset()/keyValueStore()/requestQueue()` accessors and exposes
  the `createDatasetClient/createKeyValueStoreClient/createRequestQueueClient`
  factories crawlee now expects. Names are resolved to IDs via the
  collection `getOrCreate(name)` calls. apify-client's resource
  clients don't yet implement v4-only members like `getMetadata` /
  `getRecordPublicUrl`; the adapter casts through with a TODO
  comment so the structural alignment can land separately upstream.
- `Actor.init` and `_openStorage` now wrap `this.apifyClient` in
  `ApifyStorageClient` before handing it to crawlee.
- `KeyValueStore.getPublicUrl` is now async; the per-store
  `urlSigningSecretKey` is fetched on demand via the (private)
  `client.getMetadata()` instead of the removed `storageObject`
  cache. URL-signing behaviour for platform-mode reads is preserved.
- `Actor.openRequestQueue` reads `totalRequestCount` via the new
  `client.getMetadata()` (the old `client.get()` was dropped).
- `StorageManager.openStorage` is now `(class, id?, client?)` —
  removed the trailing `this.config` argument.

Stacked on #583 (config redesign); rebases onto v4 once that lands.
Replace the removed `StorageManager.clearCache()` and
`Configuration.useStorageClient()` with `serviceLocator.reset()`
plus `serviceLocator.setStorageClient()`.
…apter

- `openRequestQueue should open storage`: mock client uses
  `getMetadata()` (the v3 `get()` was dropped on
  RequestQueueClient).
- Both Storage API tests assert that StorageManager.openStorage is
  called with an ApifyStorageClient (matched structurally) instead of
  the raw ApifyClient — the SDK now wraps it for crawlee v4.
Actor.init() calls Configuration.storage.enterWith(this.config), which
sticks the resolved config onto the current async context and persists
across tests on Node 22 (but not Node 24+). The cached value short-
circuits Configuration.getGlobalConfig() so subsequent tests never see
the env vars they just set.

Reset the AsyncLocalStorage value alongside the other singletons in
the test emulator so addWebhook (and friends) see ACTOR_RUN_ID etc.
…eset

`Actor.init()` calls `Configuration.storage.enterWith(this.config)`,
which sets the AsyncLocalStorage value on whichever async context the
test runner happened to be on. `enterWith(undefined)` from a child
async branch (vitest's beforeEach) doesn't unwind that — on Node 22
the test body re-enters a sibling context where the original
`enterWith` is still in effect, so `getStore()` still returns the
stale Configuration even after our reset.

Swapping the entire `AsyncLocalStorage` instance for a fresh one
guarantees `getStore()` returns `undefined` for every async branch
that follows, fixing the addWebhook test failures on Node 22.
@B4nan B4nan force-pushed the fix/storage-client-v4-adapt branch from fba9532 to 777f5d6 Compare April 30, 2026 19:14
@B4nan B4nan changed the base branch from v4 to fix/event-manager-v4-adapt May 6, 2026 09:26
Copy link
Copy Markdown
Member

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few ideas, but lgtm overall. Thanks!

@janbuchar might have more ideas as the original author

Comment on lines +28 to +32
const id =
options?.id ??
(options?.name
? (await this.client.datasets().getOrCreate(options.name)).id
: undefined);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if we implement storageExists, Crawlee should be able to resolve the identifiers on its own(?)

* implementation (which produces a `file://` URL or returns `undefined`).
*/
override getPublicUrl(key: string): string {
override async getPublicUrl(key: string): Promise<string | undefined> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this change closes this issue - I'm not sure if the renaming was a requirement, or a way to make this BC.

Comment on lines +40 to +42
// `client` is `private` on `CoreKeyValueStore`; bypass the visibility
// check to fetch the per-store secret. There is no public crawlee API
// surface for this yet — track upstream exposure as a follow-up.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, please create the issue in Crawlee (I see that Dataset and RequestProvider already both have client public, so it's weirdly asymmetrical now).

@janbuchar
Copy link
Copy Markdown
Contributor

@B4nan did you check parity with apify-sdk-python? I assume that there will be gaps in the request queue implementation, which is fine at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants