-
Notifications
You must be signed in to change notification settings - Fork 63
fix: adapt SDK storage layer to crawlee v4 StorageClient interface #595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: fix/event-manager-v4-adapt
Are you sure you want to change the base?
Changes from all commits
1132adb
14a75c2
8407f2f
04885e6
a099922
aa740ef
40365e6
e56ccdf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| import type { | ||
| CreateDatasetClientOptions, | ||
| CreateKeyValueStoreClientOptions, | ||
| CreateRequestQueueClientOptions, | ||
| DatasetClient, | ||
| KeyValueStoreClient, | ||
| RequestQueueClient, | ||
| StorageClient, | ||
| } from '@crawlee/types'; | ||
| import type { ApifyClient } from 'apify-client'; | ||
|
|
||
| /** | ||
| * Bridges `apify-client`'s synchronous resource accessors (`dataset(id)`, | ||
| * `keyValueStore(id)`, `requestQueue(id, options?)`) to crawlee v4's | ||
| * `StorageClient` interface (async factory methods accepting either an `id` | ||
| * or a `name`). | ||
| * | ||
| * When only a `name` is provided, we resolve it to a concrete ID via the | ||
| * collection client's `getOrCreate(name)` — matching the behaviour the SDK | ||
| * relied on in v3 when storages were opened by name. | ||
| */ | ||
| export class ApifyStorageClient implements StorageClient { | ||
| constructor(private readonly client: ApifyClient) {} | ||
|
|
||
| async createDatasetClient( | ||
| options?: CreateDatasetClientOptions, | ||
| ): Promise<DatasetClient> { | ||
| const id = | ||
| options?.id ?? | ||
| (options?.name | ||
| ? (await this.client.datasets().getOrCreate(options.name)).id | ||
| : undefined); | ||
| // apify-client's resource clients overlap with `@crawlee/types`' shapes | ||
| // but don't yet implement the v4-added members (`getMetadata`, | ||
| // `getRecordPublicUrl`). Cast through for now; a follow-up should | ||
| // bring apify-client into structural alignment. | ||
| return this.client.dataset(id ?? '') as unknown as DatasetClient; | ||
| } | ||
|
|
||
| async createKeyValueStoreClient( | ||
| options?: CreateKeyValueStoreClientOptions, | ||
| ): Promise<KeyValueStoreClient> { | ||
| const id = | ||
| options?.id ?? | ||
| (options?.name | ||
| ? (await this.client.keyValueStores().getOrCreate(options.name)) | ||
| .id | ||
| : undefined); | ||
| return this.client.keyValueStore( | ||
| id ?? '', | ||
| ) as unknown as KeyValueStoreClient; | ||
| } | ||
|
|
||
| async createRequestQueueClient( | ||
| options?: CreateRequestQueueClientOptions, | ||
| ): Promise<RequestQueueClient> { | ||
| const id = | ||
| options?.id ?? | ||
| (options?.name | ||
| ? (await this.client.requestQueues().getOrCreate(options.name)) | ||
| .id | ||
| : undefined); | ||
| return this.client.requestQueue( | ||
| id ?? '', | ||
| options?.clientKey ? { clientKey: options.clientKey } : undefined, | ||
| ) as unknown as RequestQueueClient; | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,18 @@ | ||
| import type { StorageManagerOptions } from '@crawlee/core'; | ||
| import { KeyValueStore as CoreKeyValueStore } from '@crawlee/core'; | ||
| import type { KeyValueStoreInfo } from '@crawlee/types'; | ||
|
|
||
| import { createHmacSignature } from '@apify/utilities'; | ||
|
|
||
| import type { Configuration } from './configuration.js'; | ||
|
|
||
| // @ts-ignore newer crawlee versions already declare this method in core | ||
| const { getPublicUrl } = CoreKeyValueStore.prototype; | ||
| // crawlee v4 dropped the `storageObject` cache from `KeyValueStore`, so the | ||
| // per-store `urlSigningSecretKey` (which is part of the platform's metadata | ||
| // response but not declared on `@crawlee/types`' `KeyValueStoreInfo`) has to | ||
| // be fetched on demand and accessed through a structural-typed augmentation. | ||
| type ApifyKeyValueStoreInfo = KeyValueStoreInfo & { | ||
| urlSigningSecretKey?: string; | ||
| }; | ||
|
|
||
| /** | ||
| * @inheritDoc | ||
|
|
@@ -15,24 +21,35 @@ export class KeyValueStore extends CoreKeyValueStore { | |
| /** | ||
| * Returns a URL for the given key that may be used to publicly | ||
| * access the value in the remote key-value store. | ||
| * | ||
| * On the Apify platform the URL is signed with the store's | ||
| * `urlSigningSecretKey` so that anyone with the URL can read the record | ||
| * without authentication. Locally we delegate to crawlee's default | ||
| * implementation (which produces a `file://` URL or returns `undefined`). | ||
| */ | ||
| override getPublicUrl(key: string): string { | ||
| override async getPublicUrl(key: string): Promise<string | undefined> { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe this change closes this issue - I'm not sure if the renaming was a requirement, or a way to make this BC. |
||
| const config = this.config as Configuration; | ||
| if (!config.isAtHome && getPublicUrl) { | ||
| return getPublicUrl.call(this, key); | ||
| if (!config.isAtHome) { | ||
| return super.getPublicUrl(key); | ||
| } | ||
|
|
||
| const publicUrl = new URL( | ||
| `${config.apiPublicBaseUrl}/v2/key-value-stores/${this.id}/records/${key}`, | ||
| ); | ||
|
|
||
| if (this.storageObject?.urlSigningSecretKey) { | ||
| // `client` is `private` on `CoreKeyValueStore`; bypass the visibility | ||
| // check to fetch the per-store secret. There is no public crawlee API | ||
| // surface for this yet — track upstream exposure as a follow-up. | ||
|
Comment on lines
+40
to
+42
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, please create the issue in Crawlee (I see that |
||
| const metadata = (await ( | ||
| this as unknown as { | ||
| client: { getMetadata(): Promise<KeyValueStoreInfo> }; | ||
| } | ||
| ).client.getMetadata()) as ApifyKeyValueStoreInfo; | ||
|
|
||
| if (metadata?.urlSigningSecretKey) { | ||
| publicUrl.searchParams.append( | ||
| 'signature', | ||
| createHmacSignature( | ||
| this.storageObject.urlSigningSecretKey as string, | ||
| key, | ||
| ), | ||
| createHmacSignature(metadata.urlSigningSecretKey, key), | ||
| ); | ||
| } | ||
|
|
||
|
|
@@ -49,6 +66,3 @@ export class KeyValueStore extends CoreKeyValueStore { | |
| return super.open(storeIdOrName, options) as unknown as KeyValueStore; | ||
| } | ||
| } | ||
|
|
||
| // @ts-ignore newer crawlee versions already declare this method in core | ||
| CoreKeyValueStore.prototype.getPublicUrl = KeyValueStore.prototype.getPublicUrl; | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if we implement
storageExists, Crawlee should be able to resolve the identifiers on its own(?)