
fix: adapt SDK ProxyConfiguration to crawlee v4 API #596

Open
B4nan wants to merge 8 commits into fix/storage-client-v4-adapt from fix/proxy-configuration-v4-adapt


B4nan (Member) commented Apr 30, 2026

Summary

Crawlee v4 reshaped ProxyConfiguration:

  • newProxyInfo / newUrl now take a single TieredProxyOptions argument; the (sessionId, options) pair is gone.
  • The protected _handleCustomUrl(sessionId) helper was removed.
  • _callNewUrlFunction / _handleTieredUrl take options only.
  • ProxyInfo (in @crawlee/types) no longer carries sessionId.

This PR adapts the SDK's override:

  • newProxyInfo and newUrl accept string | number | TieredProxyOptions | undefined — existing SDK callers that pass a raw sessionId keep working, and the override is also compatible with crawlee's v4 single-options signature. A small parseSessionIdOrOptions helper discriminates and pulls sessionId from options.request when no explicit one is given.
  • Inlined custom-URL session stickiness as getSessionIndex(sessionId) (replacing the removed _handleCustomUrl), keyed on the inherited usedProxyUrls map.
  • Re-declared sessionId?: string on the SDK's ProxyInfo interface so users can keep reading proxyInfo.sessionId.
  • ProxyInfo is now imported from @crawlee/types (no longer re-exported from @crawlee/core).
  • Tightened a .some(url => url.includes(...)) for the new (string | null)[] shape.
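The argument discrimination described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual helper: `RequestLike`, `OptionsLike`, and `ParsedArgs` are hypothetical stand-ins for crawlee's types, and normalizing numeric session IDs to strings is an assumption.

```typescript
// Hypothetical stand-ins for crawlee's types; the real TieredProxyOptions may differ.
interface RequestLike {
    sessionId?: string;
}

interface OptionsLike {
    request?: RequestLike;
}

interface ParsedArgs {
    sessionId?: string;
    options?: OptionsLike;
}

// Accepts either a raw session ID (legacy SDK call shape) or an options object (v4 shape).
function parseSessionIdOrOptionsSketch(arg?: string | number | OptionsLike): ParsedArgs {
    if (arg === undefined) return {};

    // Legacy shape: a bare session ID. Numbers are normalized to strings (an assumption).
    if (typeof arg === 'string' || typeof arg === 'number') {
        return { sessionId: String(arg) };
    }

    // v4 shape: fall back to the session ID carried on the request, if any.
    const sessionId = arg.request?.sessionId;
    return sessionId === undefined ? { options: arg } : { sessionId, options: arg };
}
```

Under these assumptions, `parseSessionIdOrOptionsSketch('abc')` and `parseSessionIdOrOptionsSketch({ request: { sessionId: 'abc' } })` resolve to the same session ID, which is what lets both call shapes share one code path.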

Stacking

Depends on #583 (config redesign). Rebases cleanly onto v4 once that lands.

B4nan added 8 commits April 30, 2026 21:14
Crawlee v4 reshaped `ProxyConfiguration`:
- `newProxyInfo` and `newUrl` now take a single `TieredProxyOptions`
  argument; the previous `(sessionId, options)` pair is gone.
- The protected `_handleCustomUrl(sessionId)` helper was removed; the
  `_callNewUrlFunction` and `_handleTieredUrl` helpers now take options
  only.
- `ProxyInfo` (in `@crawlee/types`) no longer carries `sessionId`.

Changes:
- `newProxyInfo` and `newUrl` accept `string | number |
  TieredProxyOptions | undefined` so existing SDK callers that pass a
  raw `sessionId` keep working, while the override remains compatible
  with crawlee's v4 signature. A small `parseSessionIdOrOptions`
  helper discriminates and pulls `sessionId` from `options.request`
  when no explicit one is given.
- Inlined custom-URL session stickiness via a new private
  `getSessionIndex(sessionId)` (replacing the removed
  `_handleCustomUrl`), keyed on `usedProxyUrls` like the base class.
- Re-declared `sessionId?: string` on the SDK's `ProxyInfo` interface
  so users can still read `proxyInfo.sessionId` (v3 carried it on the
  base type).
- Re-imported `ProxyInfo` from `@crawlee/types` (no longer re-exported
  from `@crawlee/core`).
- Tightened a `proxyUrls.some(url => url.includes(...))` access for
  the new `(string | null)[]` array shape.
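Session stickiness over a list of custom proxy URLs can be illustrated with a small standalone class. This is a sketch under assumptions, not the SDK's code: `StickyRotatorSketch` and `pick` are hypothetical names, and the sketch stores the chosen URL per session, whereas the PR's private `getSessionIndex` may store or return an index instead.

```typescript
// Hypothetical illustration of round-robin rotation with per-session stickiness.
class StickyRotatorSketch {
    private readonly proxyUrls: readonly string[];
    // Remembers which URL each session was given, so repeat calls stay on it.
    private readonly usedProxyUrls = new Map<string, string>();
    private nextIndex = 0;

    constructor(proxyUrls: readonly string[]) {
        if (proxyUrls.length === 0) throw new Error('At least one proxy URL is required');
        this.proxyUrls = proxyUrls;
    }

    pick(sessionId?: string): string {
        // No session: plain round-robin.
        if (sessionId === undefined) return this.rotate();

        // Known session: reuse the URL assigned earlier.
        const known = this.usedProxyUrls.get(sessionId);
        if (known !== undefined) return known;

        // New session: take the next URL and remember it.
        const url = this.rotate();
        this.usedProxyUrls.set(sessionId, url);
        return url;
    }

    private rotate(): string {
        // Read first, then advance, so the very first call returns proxyUrls[0].
        const url = this.proxyUrls[this.nextIndex % this.proxyUrls.length];
        this.nextIndex += 1;
        return url;
    }
}
```

With two URLs, a sessionless call followed by two calls for the same session would return the first URL once and the second URL twice, since the session keeps whatever it was first assigned.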

Stacked on #583 (config redesign); rebases onto v4 once that lands.

- Custom URL rotation: post-increment the round-robin index so the
  first sessionless call returns proxyUrls[0] (was off-by-one).
- Surface `username` on the returned ProxyInfo by parsing it out of
  the resolved URL — v3 carried it via `super.newProxyInfo`.
- parseSessionIdOrOptions now rejects non-plain objects (e.g. Date,
  Array) so `newUrl(new Date())` throws as users expect.
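One common way to reject non-plain objects is a prototype check, sketched below. The function names and the use of `TypeError` are assumptions for illustration; the SDK's actual validation and error type may differ.

```typescript
// Returns true only for object literals and null-prototype objects.
function isPlainObjectSketch(value: unknown): boolean {
    if (typeof value !== 'object' || value === null) return false;
    const proto = Object.getPrototypeOf(value);
    return proto === Object.prototype || proto === null;
}

// Hypothetical guard: allows undefined, a session ID, or a plain options object.
function assertSessionIdOrOptionsSketch(arg: unknown): void {
    if (arg === undefined || typeof arg === 'string' || typeof arg === 'number') return;
    if (!isPlainObjectSketch(arg)) {
        // Date, Array, Map, class instances, and so on end up here.
        throw new TypeError('Expected a session ID or a plain options object');
    }
}
```

In this sketch an empty object literal passes the guard, while `new Date()` and `[]` are rejected because their prototypes are `Date.prototype` and `Array.prototype` rather than `Object.prototype`.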

test: `newUrl({})` is no longer 'invalid' — empty TieredProxyOptions
is a legal v4 call shape; documented the carve-out.

…oxyInfo shape

- newUrl/newProxyInfo accept an optional second `legacyOptions`
  argument so existing callers that pass `(sessionId, {request})`
  keep working under the v4 shape too.
- Returned ProxyInfo omits Apify-only fields (groups, countryCode)
  when not using Apify Proxy and only includes `proxyTier` when
  defined — matches v3's strict-deep-equal expectations.
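Strict deep-equality assertions, such as Node's `assert.deepStrictEqual`, treat a key whose value is `undefined` differently from a key that is absent, which is why the fields have to be omitted rather than set to `undefined`. A minimal sketch of building the object that way follows; `ProxyInfoSketch`, `BuildInput`, and `usesApifyProxy` are hypothetical names, and the real `ProxyInfo` has more fields.

```typescript
// Hypothetical, trimmed-down shape; the real ProxyInfo carries more fields.
interface ProxyInfoSketch {
    url: string;
    groups?: string[];
    countryCode?: string;
}

interface BuildInput {
    url: string;
    usesApifyProxy: boolean;
    groups?: string[];
    countryCode?: string;
}

function buildProxyInfoSketch(input: BuildInput): ProxyInfoSketch {
    const info: ProxyInfoSketch = { url: input.url };

    // Apify-only fields are attached only when Apify Proxy is in use,
    // and only when they actually have a value, so no key is ever `undefined`.
    if (input.usesApifyProxy) {
        if (input.groups !== undefined) info.groups = input.groups;
        if (input.countryCode !== undefined) info.countryCode = input.countryCode;
    }

    return info;
}
```

For a custom proxy URL the result has `url` as its only key, so it compares equal to a bare `{ url }` expectation even under strict comparison.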

…nfiguration tests

- ProxyInfo.username is now the decoded form (`user@name` rather
  than `user%40name`), matching v3 behaviour and the test
  expectations.
- Added a beforeEach to the `Actor.createProxyConfiguration()`
  describe that resets serviceLocator + Configuration.globalConfig +
  Actor._instance so each test sees the env vars it sets.
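The username decoding can be sketched with the standard `URL` class, whose `username` getter returns the percent-encoded form. `extractUsernameSketch` is a hypothetical helper name, and returning `undefined` for URLs without credentials is an assumption.

```typescript
// Parses the username out of a proxy URL and returns it in decoded form.
function extractUsernameSketch(proxyUrl: string): string | undefined {
    // URL.username stays percent-encoded, e.g. "user%40name".
    const encoded = new URL(proxyUrl).username;
    if (encoded === '') return undefined;
    return decodeURIComponent(encoded);
}
```

For example, a URL whose credentials are written as `user%40name:secret` would yield `user@name` from this helper.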

crawlee v4 (apify/crawlee#3599, beta.51) removed `tieredProxyUrls`,
`tieredProxyConfig`, `_handleTieredUrl`, and `proxyTier` from
`ProxyConfiguration` / `ProxyInfo`. The SDK's wrapper used to thread
those through to the base class; with the upstream API gone, that
plumbing has to go too.

- Remove the `tieredProxyConfig` field from the SDK's
  `ProxyConfigurationOptions`.
- Drop the constructor branch that forwarded `tieredProxyUrls` /
  `tieredProxyConfig` to the base class and the now-unreachable
  `_generateTieredProxyUrls` helper.
- Drop the `tieredProxyUrls` short-circuit and `proxyTier` field
  from `newUrl` / `newProxyInfo`.
- Drop the corresponding test groups in `proxy_configuration.test.ts`.
B4nan force-pushed the fix/proxy-configuration-v4-adapt branch from b490925 to 4f718b5 on April 30, 2026 19:15
B4nan changed the base branch from v4 to fix/storage-client-v4-adapt on May 6, 2026 09:26
barjin self-requested a review on May 7, 2026 06:43

2 participants