fix: adapt SDK ProxyConfiguration to crawlee v4 API #596
Open
B4nan wants to merge 8 commits into fix/storage-client-v4-adapt
Crawlee v4 reshaped `ProxyConfiguration`:
- `newProxyInfo` and `newUrl` now take a single `TieredProxyOptions` argument; the previous `(sessionId, options)` pair is gone.
- The protected `_handleCustomUrl(sessionId)` helper was removed; the `_callNewUrlFunction` and `_handleTieredUrl` helpers now take options only.
- `ProxyInfo` (in `@crawlee/types`) no longer carries `sessionId`.

Changes:
- `newProxyInfo` and `newUrl` accept `string | number | TieredProxyOptions | undefined` so existing SDK callers that pass a raw `sessionId` keep working, while the override remains compatible with crawlee's v4 signature. A small `parseSessionIdOrOptions` helper discriminates and pulls `sessionId` from `options.request` when no explicit one is given.
- Inlined custom-URL session stickiness via a new private `getSessionIndex(sessionId)` (replacing the removed `_handleCustomUrl`), keyed on `usedProxyUrls` like the base class.
- Re-declared `sessionId?: string` on the SDK's `ProxyInfo` interface so users can still read `proxyInfo.sessionId` (v3 carried it on the base type).
- Re-imported `ProxyInfo` from `@crawlee/types` (no longer re-exported from `@crawlee/core`).
- Tightened a `proxyUrls.some(url => url.includes(...))` access for the new `(string | null)[]` array shape.

Stacked on #583 (config redesign); rebases onto v4 once that lands.
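For reviewers, the argument discrimination described for `parseSessionIdOrOptions` could look roughly like the sketch below. This is not the code from the PR: `OptionsLike` is a hypothetical stand-in for crawlee's `TieredProxyOptions`, the `request.sessionId` field shape is an assumption, and coercing a numeric session ID to a string is also assumed.

```typescript
// Hypothetical stand-in for crawlee's TieredProxyOptions; only the field
// this sketch reads is modelled, and its shape is an assumption.
interface OptionsLike {
    request?: { sessionId?: string };
}

type SessionIdOrOptions = string | number | OptionsLike | undefined;

interface ParsedArgs {
    sessionId?: string;
    options?: OptionsLike;
}

// Accepts either a raw session ID (legacy call shape) or an options object
// (v4 call shape) and normalizes both into one structure.
function parseSessionIdOrOptions(arg: SessionIdOrOptions): ParsedArgs {
    if (arg === undefined) return {};
    if (typeof arg === 'string' || typeof arg === 'number') {
        // Assumption: numeric IDs are coerced to strings.
        return { sessionId: String(arg) };
    }
    // No explicit session ID was given, so fall back to the one on the request.
    return { sessionId: arg.request?.sessionId, options: arg };
}
```

Under these assumptions, `parseSessionIdOrOptions('abc')` and `parseSessionIdOrOptions({ request: { sessionId: 'abc' } })` both resolve to the session ID `'abc'`.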
- Custom URL rotation: post-increment the round-robin index so the
first sessionless call returns proxyUrls[0] (was off-by-one).
- Surface `username` on the returned ProxyInfo by parsing it out of
the resolved URL — v3 carried it via `super.newProxyInfo`.
- parseSessionIdOrOptions now rejects non-plain objects (e.g. Date,
Array) so `newUrl(new Date())` throws as users expect.
test: `newUrl({})` is no longer 'invalid' — empty TieredProxyOptions
is a legal v4 call shape; documented the carve-out.
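The off-by-one fix in the rotation bullet can be illustrated with a small sketch. It assumes a plain list of custom proxy URLs and a session-to-URL map; the class and member names (`RoundRobinPicker`, `nextIndex`, `pick`) are hypothetical and not taken from the SDK.

```typescript
// Hypothetical round-robin picker; not the SDK's implementation.
class RoundRobinPicker {
    private nextIndex = 0;
    private readonly proxyUrls: string[];
    // Remembers which URL a session was given so repeat calls stay sticky.
    private readonly usedProxyUrls = new Map<string, string>();

    constructor(proxyUrls: string[]) {
        if (proxyUrls.length === 0) throw new Error('proxyUrls must not be empty');
        this.proxyUrls = proxyUrls;
    }

    pick(sessionId?: string): string {
        if (sessionId !== undefined) {
            const known = this.usedProxyUrls.get(sessionId);
            if (known !== undefined) return known;
        }
        // Post-increment: read the current index first, then advance it,
        // so the very first call yields proxyUrls[0].
        const url = this.proxyUrls[this.nextIndex++ % this.proxyUrls.length];
        if (sessionId !== undefined) this.usedProxyUrls.set(sessionId, url);
        return url;
    }
}
```

With a pre-increment (`++this.nextIndex`) the first sessionless call would skip `proxyUrls[0]`, which is the off-by-one the commit describes.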
…oxyInfo shape
- newUrl/newProxyInfo accept an optional second `legacyOptions`
argument so existing callers that pass `(sessionId, {request})`
keep working under the v4 shape too.
- Returned ProxyInfo omits Apify-only fields (groups, countryCode)
when not using Apify Proxy and only includes `proxyTier` when
defined — matches v3's strict-deep-equal expectations.
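One way to keep a returned object compatible with strict deep-equal assertions is to add optional keys only when they apply, rather than setting them to `undefined`. The sketch below shows the idea for the Apify-only fields; `ProxyInfoSketch`, `BuildInput`, and `buildProxyInfo` are hypothetical names, and the real `ProxyInfo` has more fields than shown here.

```typescript
// Hypothetical, reduced shape; the SDK's ProxyInfo carries more fields.
interface ProxyInfoSketch {
    url: string;
    sessionId?: string;
    groups?: string[];
    countryCode?: string;
}

interface BuildInput {
    url: string;
    usesApifyProxy: boolean;
    sessionId?: string;
    groups?: string[];
    countryCode?: string;
}

function buildProxyInfo(input: BuildInput): ProxyInfoSketch {
    const info: ProxyInfoSketch = { url: input.url };
    // A key that is present with value `undefined` fails a strict deep-equal
    // against an object that lacks the key, so keys are added conditionally.
    if (input.sessionId !== undefined) info.sessionId = input.sessionId;
    if (input.usesApifyProxy) {
        info.groups = input.groups ?? [];
        if (input.countryCode !== undefined) info.countryCode = input.countryCode;
    }
    return info;
}
```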
…nfiguration tests
- ProxyInfo.username is now the decoded form (`user@name` rather than `user%40name`), matching v3 behaviour and the test expectations.
- Added a beforeEach to the `Actor.createProxyConfiguration()` describe that resets serviceLocator + Configuration.globalConfig + Actor._instance so each test sees the env vars it sets.
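The decoded-username behaviour can be reproduced with the standard WHATWG `URL` class, which keeps credentials percent-encoded. This is a minimal sketch; `extractUsername` is a hypothetical helper name, and the SDK may parse the URL differently.

```typescript
// Hypothetical helper: returns the decoded username from a proxy URL,
// or undefined when the URL carries no credentials.
function extractUsername(proxyUrl: string): string | undefined {
    // URL#username stays percent-encoded (e.g. "user%40name").
    const { username } = new URL(proxyUrl);
    return username === '' ? undefined : decodeURIComponent(username);
}

// Example with a made-up host: the encoded "@" in the username is decoded.
const decoded = extractUsername('http://user%40name:secret@proxy.example.com:8000');
console.log(decoded); // user@name
```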
crawlee v4 (apify/crawlee#3599, beta.51) removed `tieredProxyUrls`, `tieredProxyConfig`, `_handleTieredUrl`, and `proxyTier` from `ProxyConfiguration` / `ProxyInfo`. The SDK's wrapper used to thread those through to the base class; with the upstream API gone, that plumbing has to go too.
- Remove the `tieredProxyConfig` field from the SDK's `ProxyConfigurationOptions`.
- Drop the constructor branch that forwarded `tieredProxyUrls` / `tieredProxyConfig` to the base class and the now-unreachable `_generateTieredProxyUrls` helper.
- Drop the `tieredProxyUrls` short-circuit and `proxyTier` field from `newUrl` / `newProxyInfo`.
- Drop the corresponding test groups in `proxy_configuration.test.ts`.
b490925 to
4f718b5
Summary
Crawlee v4 reshaped `ProxyConfiguration`:
- `newProxyInfo` / `newUrl` now take a single `TieredProxyOptions` argument; the `(sessionId, options)` pair is gone.
- The protected `_handleCustomUrl(sessionId)` helper was removed. `_callNewUrlFunction` / `_handleTieredUrl` take options only.
- `ProxyInfo` (in `@crawlee/types`) no longer carries `sessionId`.

This PR adapts the SDK's override:
- `newProxyInfo` and `newUrl` accept `string | number | TieredProxyOptions | undefined`: existing SDK callers that pass a raw `sessionId` keep working, and the override is also compatible with crawlee's v4 single-options signature. A small `parseSessionIdOrOptions` helper discriminates and pulls `sessionId` from `options.request` when no explicit one is given.
- Custom-URL session stickiness is inlined via a new private `getSessionIndex(sessionId)` (replacing the removed `_handleCustomUrl`), keyed on the inherited `usedProxyUrls` map.
- `sessionId?: string` is re-declared on the SDK's `ProxyInfo` interface so users can keep reading `proxyInfo.sessionId`.
- `ProxyInfo` is now imported from `@crawlee/types` (no longer re-exported from `@crawlee/core`).
- Tightened a `.some(url => url.includes(...))` access for the new `(string | null)[]` shape.

Stacking
Depends on #583 (config redesign). Rebases cleanly onto v4 once that lands.