chore: bundle v4 catch-up — config redesign + EventManager + StorageClient + ProxyConfiguration#597
Refactor the SDK Configuration class to match the new crawlee core
Configuration redesign:
- Subclass core Configuration using `protected static override fields`
- Direct property access (`config.token`) instead of `config.get('token')`
- Immutable: values set via constructor, no `set()` method
- Priority: constructor options > env vars > schema defaults
- isAtHome conditional defaults moved into field definitions
- Use serviceLocator instead of config.useStorageClient/getEventManager
- Import z, coerceNumber, coerceBoolean from @crawlee/core (no direct zod dep)
- Update all .get()/.set() call sites in actor.ts, charging.ts, etc.
- Update tests to use property access
Depends on crawlee PR: apify/crawlee#3474
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
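The resolution order and immutability described in the list above can be illustrated with a minimal, self-contained sketch. It does not use the real crawlee `Configuration` or its field-definition helpers; `SketchConfiguration`, `SketchOptions`, the `logLevel` field and its defaults are hypothetical stand-ins, and the environment is passed in explicitly so the example has no dependencies.

```typescript
// Hypothetical sketch: field names, env-var handling and defaults are illustrative,
// not the actual crawlee or SDK implementation.
type Env = Record<string, string | undefined>;

interface SketchOptions {
    token?: string;
    inputKey?: string;
    logLevel?: string;
}

class SketchConfiguration {
    readonly token: string | undefined;
    readonly inputKey: string;
    readonly logLevel: string;

    constructor(options: SketchOptions = {}, env: Env = {}) {
        // A conditional default that depends on where the code runs (assumed flag name).
        const isAtHome = env['APIFY_IS_AT_HOME'] === '1';

        // Priority: constructor options > env vars > schema defaults.
        this.token = options.token ?? env['APIFY_TOKEN'];
        this.inputKey = options.inputKey ?? env['ACTOR_INPUT_KEY'] ?? env['APIFY_INPUT_KEY'] ?? 'INPUT';
        this.logLevel = options.logLevel ?? env['APIFY_LOG_LEVEL'] ?? (isAtHome ? 'INFO' : 'DEBUG');

        // No set(): the instance is frozen once every field is resolved.
        Object.freeze(this);
    }
}

const config = new SketchConfiguration({ token: 'from-constructor' }, { APIFY_TOKEN: 'from-env' });
console.log(config.token); // direct property access instead of config.get('token')
```

Because values are resolved once in the constructor, anything that changes the environment afterwards has to build a new instance to observe the change.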
- Import `z` from `zod` directly (no longer re-exported from crawlee core)
- Define `coerceNumber` locally (no longer exported from crawlee core)
- Add constructor override to accept `ApifyConfigurationInput`
- Import `ConfigurationOptions` from SDK configuration instead of core
- Fix test that mutated env vars after init (immutable config)
Depends on: apify/crawlee#3080
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore the destructuring of `storageDir` and spread of remaining `storageClientOptions` into the `ApifyClient` constructor so that arbitrary client options configured via `storageClientOptions` continue to reach the client.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gration
- Reuse `coerceNumber` from `@crawlee/core` instead of defining a local copy; otherwise `FieldsOutput<typeof apifyConfigFields>` produces a structurally distinct (but equivalent) `availableMemoryRatio` type that breaks declaration-merging with crawlee's `Configuration`.
- Drop the dead `storageClientOptions`/`storageDir` destructuring in `Actor.newClient()`; neither key exists in the redesigned Configuration, and `options` already covers the override path.
The remaining build errors (proxy/storage/event drift) are unrelated to the config redesign and tracked in separate follow-up PRs against the v4 branch.
Crawlee v4's `EventManager` constructor now requires `EventManagerOptions` (just `persistStateIntervalMillis`), and the base class no longer carries a `config` field, so the previous `override readonly config` pattern is no longer valid.
- Drop the `override` and store `config` as an own readonly property.
- Forward `persistStateIntervalMillis` to `super()`.
- Add a `fromConfig()` factory mirroring `LocalEventManager.fromConfig()` so the SDK plays nicely with the new ServiceLocator-driven init path.
Stacked on #583 (config redesign); rebases onto v4 once that lands.
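A minimal sketch of the constructor shape this commit describes, using local stand-ins rather than the real crawlee classes. `BaseEventManager`, `SketchConfig` and `SketchPlatformEventManager` are hypothetical names; only `persistStateIntervalMillis`, `config` and `fromConfig()` come from the description above.

```typescript
// Stand-in for crawlee's options type; the shape is assumed from the commit message.
interface EventManagerOptions {
    persistStateIntervalMillis: number;
}

// Stand-in base class: it takes options and no longer declares a `config` field.
class BaseEventManager {
    readonly persistStateIntervalMillis: number;

    constructor(options: EventManagerOptions) {
        this.persistStateIntervalMillis = options.persistStateIntervalMillis;
    }
}

// Hypothetical slice of the SDK configuration that the manager needs.
interface SketchConfig {
    persistStateIntervalMillis: number;
    actorEventsWsUrl?: string;
}

class SketchPlatformEventManager extends BaseEventManager {
    // Own property, not an `override`: there is nothing on the base class to override.
    readonly config: SketchConfig;

    constructor(config: SketchConfig) {
        // Forward the one option the base class requires.
        super({ persistStateIntervalMillis: config.persistStateIntervalMillis });
        this.config = config;
    }

    // Factory used by a locator-driven init path to build the manager from config.
    static fromConfig(config: SketchConfig): SketchPlatformEventManager {
        return new SketchPlatformEventManager(config);
    }
}

const manager = SketchPlatformEventManager.fromConfig({ persistStateIntervalMillis: 60000 });
console.log(manager.persistStateIntervalMillis);
```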
Crawlee v4 reshaped its `StorageClient` interface (async factory methods that accept `id` *or* `name`), removed the cached `storageObject` from `KeyValueStore`, and made `getPublicUrl` async. The existing SDK code targeted the v3 shape and no longer compiles.
Changes:
- New `ApifyStorageClient` adapter wraps `apify-client`'s legacy `dataset()/keyValueStore()/requestQueue()` accessors and exposes the `createDatasetClient/createKeyValueStoreClient/createRequestQueueClient` factories crawlee now expects. Names are resolved to IDs via the collection `getOrCreate(name)` calls. apify-client's resource clients don't yet implement v4-only members like `getMetadata` / `getRecordPublicUrl`; the adapter casts through with a TODO comment so the structural alignment can land separately upstream.
- `Actor.init` and `_openStorage` now wrap `this.apifyClient` in `ApifyStorageClient` before handing it to crawlee.
- `KeyValueStore.getPublicUrl` is now async; the per-store `urlSigningSecretKey` is fetched on demand via the (private) `client.getMetadata()` instead of the removed `storageObject` cache. URL-signing behaviour for platform-mode reads is preserved.
- `Actor.openRequestQueue` reads `totalRequestCount` via the new `client.getMetadata()` (the old `client.get()` was dropped).
- `StorageManager.openStorage` is now `(class, id?, client?)`; the trailing `this.config` argument was removed.
Stacked on #583 (config redesign); rebases onto v4 once that lands.
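The adapter idea can be sketched for a single storage type. This is not the real `ApifyStorageClient`: `LegacyClient` is a hand-written stand-in that only models the `dataset(id)` accessor and the `datasets().getOrCreate(name)` collection call, and `SketchStorageClient` is a hypothetical name. The option shape (`id` or `name`) is assumed from the description above.

```typescript
// Minimal stand-ins for the legacy client surface the adapter wraps.
interface LegacyDatasetClient {
    readonly id: string;
}

interface LegacyClient {
    dataset(id: string): LegacyDatasetClient;
    datasets(): { getOrCreate(name?: string): Promise<{ id: string }> };
}

class SketchStorageClient {
    private readonly client: LegacyClient;

    constructor(client: LegacyClient) {
        this.client = client;
    }

    // Async factory accepting `id` or `name`; a name is first resolved to an ID.
    async createDatasetClient(options: { id?: string; name?: string } = {}): Promise<LegacyDatasetClient> {
        const id = options.id ?? (await this.client.datasets().getOrCreate(options.name)).id;
        return this.client.dataset(id);
    }
}

// In-memory fake so the sketch runs without network access.
const resolvedNames: string[] = [];
const fakeClient: LegacyClient = {
    dataset: (id) => ({ id }),
    datasets: () => ({
        getOrCreate: async (name) => {
            resolvedNames.push(name ?? '(default)');
            return { id: `id-for-${name ?? 'default'}` };
        },
    }),
};

const adapter = new SketchStorageClient(fakeClient);
adapter.createDatasetClient({ name: 'my-results' }).then((dataset) => console.log(dataset.id));
```

Passing an explicit `id` skips the `getOrCreate` round trip entirely, which is why the factory checks `id` first.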
Crawlee v4 reshaped `ProxyConfiguration`:
- `newProxyInfo` and `newUrl` now take a single `TieredProxyOptions` argument; the previous `(sessionId, options)` pair is gone.
- The protected `_handleCustomUrl(sessionId)` helper was removed; the `_callNewUrlFunction` and `_handleTieredUrl` helpers now take options only.
- `ProxyInfo` (in `@crawlee/types`) no longer carries `sessionId`.
Changes:
- `newProxyInfo` and `newUrl` accept `string | number | TieredProxyOptions | undefined` so existing SDK callers that pass a raw `sessionId` keep working, while the override remains compatible with crawlee's v4 signature. A small `parseSessionIdOrOptions` helper discriminates and pulls `sessionId` from `options.request` when no explicit one is given.
- Inlined custom-URL session stickiness via a new private `getSessionIndex(sessionId)` (replacing the removed `_handleCustomUrl`), keyed on `usedProxyUrls` like the base class.
- Re-declared `sessionId?: string` on the SDK's `ProxyInfo` interface so users can still read `proxyInfo.sessionId` (v3 carried it on the base type).
- Re-imported `ProxyInfo` from `@crawlee/types` (no longer re-exported from `@crawlee/core`).
- Tightened a `proxyUrls.some(url => url.includes(...))` access for the new `(string | null)[]` array shape.
Stacked on #583 (config redesign); rebases onto v4 once that lands.
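One way the argument discrimination described above could look is sketched here. This is not the SDK's actual `parseSessionIdOrOptions`; `SketchProxyOptions` is a hypothetical stand-in for `TieredProxyOptions`, and the assumption that the request object exposes a `sessionId` property is made for illustration only.

```typescript
// Hypothetical stand-in for the v4 options object; only the parts used here are modelled.
interface SketchProxyOptions {
    request?: { sessionId?: string };
}

interface ParsedArgs {
    sessionId: string | undefined;
    options: SketchProxyOptions;
}

// Accepts either a raw session ID (legacy call shape) or an options object (v4 call shape).
function sketchParseSessionIdOrOptions(arg?: string | number | SketchProxyOptions): ParsedArgs {
    if (arg === undefined) {
        return { sessionId: undefined, options: {} };
    }
    if (typeof arg === 'string' || typeof arg === 'number') {
        // Legacy callers pass the session ID directly; normalize numbers to strings.
        return { sessionId: String(arg), options: {} };
    }
    // v4 callers pass options; fall back to the session attached to the request, if any.
    return { sessionId: arg.request?.sessionId, options: arg };
}

console.log(sketchParseSessionIdOrOptions('session_1').sessionId);
console.log(sketchParseSessionIdOrOptions({ request: { sessionId: 'from_request' } }).sessionId);
```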
…ent cases Crawlee v4's Configuration resolves env vars eagerly at construction, so the existing 'Actor.newClient() reads environment variables correctly' test reads stale values once a prior test or import-time side effect has already created the singleton. Reset both before each case.
`Configuration.useEventManager()` was removed in crawlee v4. Install the platform event manager via the global service locator instead, and reset between tests so each case can register a fresh manager without hitting `ServiceConflictError`.
Replace the removed `StorageManager.clearCache()` and `Configuration.useStorageClient()` with `serviceLocator.reset()` plus `serviceLocator.setStorageClient()`.
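The reason a reset between tests is needed can be shown with a toy locator. This is a hypothetical sketch, not crawlee's `serviceLocator`: the class, its error type and the exact conflict rule (a different instance registered twice) are assumptions made for illustration.

```typescript
// Local stand-in for the conflict error mentioned above.
class SketchConflictError extends Error {}

class SketchServiceLocator {
    private eventManager: object | undefined;
    private storageClient: object | undefined;

    setEventManager(manager: object): void {
        // Registering a second, different manager is treated as a conflict.
        if (this.eventManager !== undefined && this.eventManager !== manager) {
            throw new SketchConflictError('An event manager is already registered');
        }
        this.eventManager = manager;
    }

    setStorageClient(client: object): void {
        if (this.storageClient !== undefined && this.storageClient !== client) {
            throw new SketchConflictError('A storage client is already registered');
        }
        this.storageClient = client;
    }

    getStorageClient(): object | undefined {
        return this.storageClient;
    }

    // Clears every registered service so the next test starts from a blank slate.
    reset(): void {
        this.eventManager = undefined;
        this.storageClient = undefined;
    }
}

const locator = new SketchServiceLocator();
locator.setEventManager({ name: 'manager for test 1' });
locator.reset(); // what a beforeEach hook would do
locator.setEventManager({ name: 'manager for test 2' }); // no conflict after the reset
```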
Crawlee v4's `Configuration` is eager — `actorEventsWsUrl` is read once at construction, so a global config that pre-existed the `beforeEach` would never see the websocket URL we set, and `events.init()` would silently never connect. Move the env-var setup above `Configuration.getGlobalConfig()` and reset the SDK's static singleton so each test rebuilds a fresh config.
The SDK's `Configuration` keeps its own static singleton separate from crawlee's serviceLocator. Resetting only the locator wasn't enough — `Configuration.getGlobalConfig()` still handed back the stale cached SDK config (which was built before the test set `APIFY_TOKEN`).
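The stale-singleton problem described in the last two commits can be reproduced in a few lines. `EagerConfig` is a hypothetical stand-in for an eagerly resolved configuration with a static cached instance; the environment is passed in explicitly so the sketch has no dependencies.

```typescript
type Env = Record<string, string | undefined>;

class EagerConfig {
    private static globalConfig: EagerConfig | undefined;

    readonly token: string | undefined;

    constructor(env: Env) {
        // Eager: the value is read once, at construction time.
        this.token = env['APIFY_TOKEN'];
    }

    static getGlobalConfig(env: Env): EagerConfig {
        if (EagerConfig.globalConfig === undefined) {
            EagerConfig.globalConfig = new EagerConfig(env);
        }
        return EagerConfig.globalConfig;
    }

    // Test-only helper: drop the cached instance so the next call rebuilds it.
    static resetGlobalConfig(): void {
        EagerConfig.globalConfig = undefined;
    }
}

const env: Env = {};
EagerConfig.getGlobalConfig(env);            // singleton created before the test sets anything
env['APIFY_TOKEN'] = 'token-set-by-test';
const stale = EagerConfig.getGlobalConfig(env); // still the instance built from the old env

EagerConfig.resetGlobalConfig();
const rebuilt = EagerConfig.getGlobalConfig(env); // rebuilt, so it sees the new value
console.log(stale.token, rebuilt.token);
```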
…n emulator init/destroy
…apter
- `openRequestQueue should open storage`: mock client uses `getMetadata()` (the v3 `get()` was dropped on RequestQueueClient).
- Both Storage API tests assert that StorageManager.openStorage is called with an ApifyStorageClient (matched structurally) instead of the raw ApifyClient, because the SDK now wraps it for crawlee v4.
- Reword "empty string maxTotalChargeUsd" assertion: under Option A the empty env var is now treated as unset, so `config.maxTotalChargeUsd` is `undefined` (charging manager still defaults to Infinity).
- Actor.getInput tests now build a fresh Actor *after* setting the env vars they exercise; eager config resolution means a single module-scoped TestingActor would carry stale values.
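The "empty env var is unset" behaviour from the first bullet can be sketched as follows. Both function names are hypothetical, and the fallback to `Infinity` mirrors only what the commit message states about the charging manager.

```typescript
// Parses an optional numeric env var; an empty (or whitespace-only) value counts as unset.
function readOptionalNumber(raw: string | undefined): number | undefined {
    if (raw === undefined || raw.trim() === '') {
        return undefined;
    }
    const value = Number(raw);
    if (Number.isNaN(value)) {
        throw new Error(`Expected a number, got "${raw}"`);
    }
    return value;
}

// The consumer, not the config, decides what "unset" means: here, no spending limit.
function effectiveMaxTotalChargeUsd(configValue: number | undefined): number {
    return configValue ?? Infinity;
}

const fromEmptyEnv = readOptionalNumber('');
console.log(fromEmptyEnv, effectiveMaxTotalChargeUsd(fromEmptyEnv));
```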
- Custom URL rotation: post-increment the round-robin index so the
first sessionless call returns proxyUrls[0] (was off-by-one).
- Surface `username` on the returned ProxyInfo by parsing it out of
the resolved URL — v3 carried it via `super.newProxyInfo`.
- parseSessionIdOrOptions now rejects non-plain objects (e.g. Date,
Array) so `newUrl(new Date())` throws as users expect.
test: `newUrl({})` is no longer 'invalid' — empty TieredProxyOptions
is a legal v4 call shape; documented the carve-out.
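The off-by-one in the rotation fix comes down to whether the index is advanced before or after it is read. The sketch below is a simplified, hypothetical rotator (not the SDK's `ProxyConfiguration`) that post-increments the index and keeps sessions sticky through a session-to-URL map.

```typescript
class SketchUrlRotator {
    private readonly proxyUrls: string[];
    private nextIndex = 0;
    // Remembers which URL each session was given, so repeat calls stay sticky.
    private readonly usedProxyUrls = new Map<string, string>();

    constructor(proxyUrls: string[]) {
        if (proxyUrls.length === 0) {
            throw new Error('At least one proxy URL is required');
        }
        this.proxyUrls = proxyUrls;
    }

    newUrl(sessionId?: string): string {
        if (sessionId !== undefined) {
            const sticky = this.usedProxyUrls.get(sessionId);
            if (sticky !== undefined) return sticky;
        }
        // Post-increment: read the current index first, then advance it,
        // so the very first call returns proxyUrls[0].
        const url = this.proxyUrls[this.nextIndex++ % this.proxyUrls.length];
        if (sessionId !== undefined) {
            this.usedProxyUrls.set(sessionId, url);
        }
        return url;
    }
}

const rotator = new SketchUrlRotator(['http://proxy-a.example:8000', 'http://proxy-b.example:8000']);
console.log(rotator.newUrl()); // first sessionless call uses the first URL
```

A pre-increment (`++this.nextIndex`) in the same expression would skip the first URL on the first call, which is the behaviour the commit corrects.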
…oxyInfo shape
- newUrl/newProxyInfo accept an optional second `legacyOptions`
argument so existing callers that pass `(sessionId, {request})`
keep working under the v4 shape too.
- Returned ProxyInfo omits Apify-only fields (groups, countryCode)
when not using Apify Proxy and only includes `proxyTier` when
defined — matches v3's strict-deep-equal expectations.
…nfiguration tests
- ProxyInfo.username is now the decoded form (`user@name` rather than `user%40name`), matching v3 behaviour and the test expectations.
- Added a beforeEach to the `Actor.createProxyConfiguration()` describe that resets serviceLocator + Configuration.globalConfig + Actor._instance so each test sees the env vars it sets.
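The decoded-username behaviour in the first bullet follows from how the WHATWG `URL` class reports credentials: it keeps them percent-encoded, so they need an explicit decode. A minimal sketch, with a hypothetical helper name and a made-up proxy host:

```typescript
// Extracts the username from a proxy URL and decodes percent-escapes such as %40.
function usernameFromProxyUrl(proxyUrl: string): string {
    // `URL#username` stays percent-encoded, hence the explicit decode.
    return decodeURIComponent(new URL(proxyUrl).username);
}

console.log(usernameFromProxyUrl('http://user%40name:secret@proxy.example.com:8000'));
```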
Crawlee's Configuration uses crawleeConfigFields and only knows about `CRAWLEE_INPUT_KEY`. The SDK extension adds `ACTOR_INPUT_KEY` / `APIFY_INPUT_KEY` env-var aliases, which the test relies on. Importing Configuration from 'apify' makes `new Configuration()` inside buildActor() resolve those env vars correctly.
`@crawlee/linkedom@4.0.0-beta.49`'s `linkedom-crawler.js` imports `cheerio` without declaring it as a dependency. Locally this works when a parent directory has cheerio installed; CI's fresh install fails. Adding it directly here keeps tests green until the upstream package fixes the missing dep declaration.
Actor.init() calls Configuration.storage.enterWith(this.config), which sticks the resolved config onto the current async context and persists across tests on Node 22 (but not Node 24+). The cached value short-circuits Configuration.getGlobalConfig() so subsequent tests never see the env vars they just set. Reset the AsyncLocalStorage value alongside the other singletons in the test emulator so addWebhook (and friends) see ACTOR_RUN_ID etc.
B4nan added a commit to apify/crawlee that referenced this pull request Apr 30, 2026
## Summary
`packages/linkedom-crawler/src/internals/linkedom-crawler.ts` imports `cheerio` (`import * as cheerio from 'cheerio'`) but `@crawlee/linkedom`'s `package.json` doesn't list it as a dependency. It works inside the monorepo because cheerio is hoisted to the workspace root via other packages (`@crawlee/cheerio`, `@crawlee/utils`, `@crawlee/http`, …), so Node always finds it.
**Downstream installs that depend only on `crawlee`** (which re-exports `@crawlee/linkedom`) **and don't pull any cheerio-using sibling** fail at runtime:
```
Error: Cannot find package 'cheerio' imported from .../node_modules/@crawlee/linkedom/internals/linkedom-crawler.js
```
This bit the apify-sdk-js v4 catch-up PRs (apify/apify-sdk-js#597) on a clean CI install. Without this fix, every consumer has to ship a `cheerio` dev-dep workaround.
The fix is one-line: declare `cheerio: "^1.0.0"` (matching what `@crawlee/cheerio` already pins).
`@crawlee/linkedom@4.0.0-beta.51` now declares cheerio as a direct dependency (apify/crawlee#3620), so the SDK no longer has to ship its own cheerio devDep to mask the missing declaration.
crawlee v4 (apify/crawlee#3599, beta.51) removed `tieredProxyUrls`, `tieredProxyConfig`, `_handleTieredUrl`, and `proxyTier` from `ProxyConfiguration` / `ProxyInfo`. The SDK's wrapper used to thread those through to the base class; with the upstream API gone, that plumbing has to go too.
- Remove the `tieredProxyConfig` field from the SDK's `ProxyConfigurationOptions`.
- Drop the constructor branch that forwarded `tieredProxyUrls` / `tieredProxyConfig` to the base class and the now-unreachable `_generateTieredProxyUrls` helper.
- Drop the `tieredProxyUrls` short-circuit and `proxyTier` field from `newUrl` / `newProxyInfo`.
- Drop the corresponding test groups in `proxy_configuration.test.ts`.
…eset `Actor.init()` calls `Configuration.storage.enterWith(this.config)`, which sets the AsyncLocalStorage value on whichever async context the test runner happened to be on. `enterWith(undefined)` from a child async branch (vitest's beforeEach) doesn't unwind that — on Node 22 the test body re-enters a sibling context where the original `enterWith` is still in effect, so `getStore()` still returns the stale Configuration even after our reset. Swapping the entire `AsyncLocalStorage` instance for a fresh one guarantees `getStore()` returns `undefined` for every async branch that follows, fixing the addWebhook test failures on Node 22.
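The instance swap described above can be sketched with Node's `AsyncLocalStorage` directly. `SketchConfigHolder` and `resetForTests()` are hypothetical names; the sketch only demonstrates that a freshly created instance has no store, whatever was entered on the old one.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface SketchConfig {
    runId: string;
}

class SketchConfigHolder {
    // Mutable on purpose: tests replace the whole instance instead of clearing its value.
    static storage = new AsyncLocalStorage<SketchConfig>();

    static current(): SketchConfig | undefined {
        return SketchConfigHolder.storage.getStore();
    }

    // A new instance has no store in any async context, so nothing stale can leak through.
    static resetForTests(): void {
        SketchConfigHolder.storage = new AsyncLocalStorage<SketchConfig>();
    }
}

// Simulates what init does: pin a config onto the current async context.
SketchConfigHolder.storage.enterWith({ runId: 'stale-run' });
const beforeReset = SketchConfigHolder.current();

SketchConfigHolder.resetForTests();
const afterReset = SketchConfigHolder.current();
console.log(beforeReset?.runId, afterReset);
```

Any code that captured a reference to the old instance would still see the old store, so the swap only works if every reader goes through the static field.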
…o chore/v4-catchup-bundle
What this is
A non-mergeable demo branch that bundles all four v4 catch-up PRs so reviewers can see green CI on the combined state.
- `PlatformEventManager` constructor adapt
- `StorageClient` adapter + `KeyValueStore.getPublicUrl` async URL signing
- `ProxyConfiguration` v4 API
Pinned at `crawlee@^4.0.0-beta.51`.
Recommended merge order: #583 → #594 → #595 → #596
The four focused PRs are rebased into a linear stack:
Each downstream branch already contains its predecessors as ancestors, so the merge order matters mostly for review/CI clarity:
Verified: a sequential merge of #583 → #594 → #595 → #596 into `origin/v4` produces zero conflicts at each step. Locally the resulting state passes 75/75 active tests on Node 22 and Node 24. The tree is functionally equivalent to this bundle (sole diff is a stale ~69-line `node_modules/@crawlee/linkedom/node_modules/cheerio` block left in this bundle's lockfile from the cheerio-workaround era; a pure regen artifact, not real divergence).
Do not merge this PR
Merge the four focused PRs above instead. This branch will be deleted once they land.