fix: adapt SDK storage layer to crawlee v4 StorageClient interface#595
fix: adapt SDK storage layer to crawlee v4 StorageClient interface#595B4nan wants to merge 8 commits intofix/event-manager-v4-adaptfrom
Conversation
Crawlee v4 reshaped its `StorageClient` interface (async factory methods that accept `id` *or* `name`), removed the cached `storageObject` from `KeyValueStore`, and made `getPublicUrl` async. The existing SDK code targeted the v3 shape and no longer compiles. Changes: - New `ApifyStorageClient` adapter wraps `apify-client`'s legacy `dataset()/keyValueStore()/requestQueue()` accessors and exposes the `createDatasetClient/createKeyValueStoreClient/createRequestQueueClient` factories crawlee now expects. Names are resolved to IDs via the collection `getOrCreate(name)` calls. apify-client's resource clients don't yet implement v4-only members like `getMetadata` / `getRecordPublicUrl`; the adapter casts through with a TODO comment so the structural alignment can land separately upstream. - `Actor.init` and `_openStorage` now wrap `this.apifyClient` in `ApifyStorageClient` before handing it to crawlee. - `KeyValueStore.getPublicUrl` is now async; the per-store `urlSigningSecretKey` is fetched on demand via the (private) `client.getMetadata()` instead of the removed `storageObject` cache. URL-signing behaviour for platform-mode reads is preserved. - `Actor.openRequestQueue` reads `totalRequestCount` via the new `client.getMetadata()` (the old `client.get()` was dropped). - `StorageManager.openStorage` is now `(class, id?, client?)` — removed the trailing `this.config` argument. Stacked on #583 (config redesign); rebases onto v4 once that lands.
Replace the removed `StorageManager.clearCache()` and `Configuration.useStorageClient()` with `serviceLocator.reset()` plus `serviceLocator.setStorageClient()`.
…n emulator init/destroy
…apter - `openRequestQueue should open storage`: mock client uses `getMetadata()` (the v3 `get()` was dropped on RequestQueueClient). - Both Storage API tests assert that StorageManager.openStorage is called with an ApifyStorageClient (matched structurally) instead of the raw ApifyClient — the SDK now wraps it for crawlee v4.
Actor.init() calls Configuration.storage.enterWith(this.config), which sticks the resolved config onto the current async context and persists across tests on Node 22 (but not Node 24+). The cached value short- circuits Configuration.getGlobalConfig() so subsequent tests never see the env vars they just set. Reset the AsyncLocalStorage value alongside the other singletons in the test emulator so addWebhook (and friends) see ACTOR_RUN_ID etc.
…eset `Actor.init()` calls `Configuration.storage.enterWith(this.config)`, which sets the AsyncLocalStorage value on whichever async context the test runner happened to be on. `enterWith(undefined)` from a child async branch (vitest's beforeEach) doesn't unwind that — on Node 22 the test body re-enters a sibling context where the original `enterWith` is still in effect, so `getStore()` still returns the stale Configuration even after our reset. Swapping the entire `AsyncLocalStorage` instance for a fresh one guarantees `getStore()` returns `undefined` for every async branch that follows, fixing the addWebhook test failures on Node 22.
fba9532 to
777f5d6
Compare
barjin
left a comment
There was a problem hiding this comment.
A few ideas, but lgtm overall. Thanks!
@janbuchar might have more ideas as the original author
| const id = | ||
| options?.id ?? | ||
| (options?.name | ||
| ? (await this.client.datasets().getOrCreate(options.name)).id | ||
| : undefined); |
There was a problem hiding this comment.
Note that if we implement storageExists, Crawlee should be able to resolve the identifiers on its own(?)
| * implementation (which produces a `file://` URL or returns `undefined`). | ||
| */ | ||
| override getPublicUrl(key: string): string { | ||
| override async getPublicUrl(key: string): Promise<string | undefined> { |
There was a problem hiding this comment.
Maybe this change closes this issue - I'm not sure if the renaming was a requirement, or a way to make this BC.
| // `client` is `private` on `CoreKeyValueStore`; bypass the visibility | ||
| // check to fetch the per-store secret. There is no public crawlee API | ||
| // surface for this yet — track upstream exposure as a follow-up. |
There was a problem hiding this comment.
Good point, please create the issue in Crawlee (I see that Dataset and RequestProvider already both have client public, so it's weirdly asymmetrical now).
|
@B4nan did you check parity with apify-sdk-python? I assume that there will be gaps in the request queue implementation, which is fine at this point. |
Summary
Crawlee v4 reshaped the
StorageClientinterface, removed the cachedstorageObjectfromKeyValueStore, and madegetPublicUrlasync. The SDK still targeted the v3 shape and no longer compiled. This PR adapts:ApifyStorageClientadapter wrappingapify-client's legacydataset()/keyValueStore()/requestQueue()accessors and exposing the factory methods (createDatasetClient/createKeyValueStoreClient/createRequestQueueClient) crawlee now requires. Name → ID resolution goes through each collection'sgetOrCreate(name).Actor.initand_openStoragenow wrapthis.apifyClientin the adapter before handing it to crawlee/StorageManager.KeyValueStore.getPublicUrlis nowasync; the per-storeurlSigningSecretKeyis fetched on demand via the (private)client.getMetadata()rather than the removedstorageObjectcache. URL-signing behaviour in platform mode is preserved.Actor.openRequestQueuereadstotalRequestCountviaclient.getMetadata()(the oldclient.get()is gone).StorageManager.openStoragesignature is now(class, id?, client?)— dropped the trailingthis.configargument.Known gaps tracked in this PR
apify-client'sDatasetClient/KeyValueStoreClient/RequestQueueClientdon't yet implement v4-added members likegetMetadataandgetRecordPublicUrl. The adapter casts through withas unknown as ...for now — the proper fix is to bringapify-client's resource client shapes into structural alignment with@crawlee/types, which is out of scope here.The
KeyValueStoreInfotype in@crawlee/typesalso doesn't declareurlSigningSecretKey, so the SDK uses a local intersection type to access it. Worth surfacing upstream eventually.Stacking
Depends on #583 (config redesign). Rebases cleanly onto v4 once that lands.