fix(engine): honor producer BeginFrame disable env #1711
Closed
miguel-heygen wants to merge 1 commit into
Closing this as the wrong root-fix direction. The investigation evidence showed the prod compatibility env is stale/dead, but James clarified that BeginFrame is the preferred Linux path and that this env flag is intentionally not the approach we want to use. Continuing investigation at the BeginFrame/browser lifecycle layer instead.
Summary
Fixes the producer/browser-crash root cause by making the engine honor the existing production env knob:
- `PRODUCER_ENABLE_BEGIN_FRAME=false|off|0` now resolves to `forceScreenshot=true`
- `PRODUCER_FORCE_SCREENSHOT=true` remains the direct screenshot-mode knob
- `resolveConfig({ forceScreenshot })` overrides still win over env defaults

Production evidence
The failing Temporal activities surfaced as browser/session failures:
- `Navigation timeout of 60000 ms exceeded`
- `Navigating frame was detached`
- `Protocol error (Runtime.evaluate): Target closed`

Root finding from prod:

- `temporal-hyperframes-producer-worker-sidecar-configmap` already sets `PRODUCER_ENABLE_BEGIN_FRAME=false`
- `PRODUCER_ENABLE_BEGIN_FRAME=false` in PID 1 env
- `PRODUCER_FORCE_SCREENSHOT`; it did not read `PRODUCER_ENABLE_BEGIN_FRAME`
- `captureMode:"beginframe"` and `forceScreenshot:false`
- `chrome-headless` children

So prod was configured to avoid BeginFrame, but the engine ignored that compatibility env and kept taking the BeginFrame path.
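
The precedence listed in the Summary can be sketched roughly as follows. This is a minimal illustration, not the actual code in `packages/engine/src/config.ts`: the function name `resolveConfigSketch`, the `env` parameter (injected here instead of reading the process environment), the case-insensitive matching, and treating only the literal `true` as enabling `PRODUCER_FORCE_SCREENSHOT` are all assumptions made for the example.

```typescript
// Hypothetical sketch of the env precedence described in this PR.
// Names and parsing details are assumptions, not the engine's real code.
type Env = Record<string, string | undefined>;

interface ConfigOverrides {
  forceScreenshot?: boolean;
}

interface ResolvedConfig {
  forceScreenshot: boolean;
}

// Values of PRODUCER_ENABLE_BEGIN_FRAME that the PR treats as "disabled".
const BEGIN_FRAME_DISABLED_VALUES = new Set(["false", "off", "0"]);

function envDisablesBeginFrame(env: Env): boolean {
  const raw = env.PRODUCER_ENABLE_BEGIN_FRAME;
  if (raw === undefined) return false;
  // Assumption: whitespace and letter case are ignored.
  return BEGIN_FRAME_DISABLED_VALUES.has(raw.trim().toLowerCase());
}

function resolveConfigSketch(
  overrides: ConfigOverrides = {},
  env: Env = {},
): ResolvedConfig {
  // 1. An explicit override always wins over env defaults.
  if (overrides.forceScreenshot !== undefined) {
    return { forceScreenshot: overrides.forceScreenshot };
  }
  // 2. The direct screenshot-mode knob (assumed to accept only "true").
  if (env.PRODUCER_FORCE_SCREENSHOT === "true") {
    return { forceScreenshot: true };
  }
  // 3. The compatibility env: disabling BeginFrame implies screenshot mode.
  return { forceScreenshot: envDisablesBeginFrame(env) };
}
```

Under these assumptions, `resolveConfigSketch({}, { PRODUCER_ENABLE_BEGIN_FRAME: "false" })` yields `forceScreenshot: true`, while passing `{ forceScreenshot: false }` as an override keeps it `false` regardless of the env.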
Verification
- `bun test packages/engine/src/config.test.ts packages/engine/src/services/frameCapture-transientErrors.test.ts packages/producer/src/services/render/stages/probeStage.test.ts`: 58 pass, 0 fail
- `bunx oxfmt --check packages/engine/src/config.ts packages/engine/src/config.test.ts`
- `bunx oxlint packages/engine/src/config.ts packages/engine/src/config.test.ts`
- `git diff --check`
- `PRODUCER_ENABLE_BEGIN_FRAME=false` now returns `resolveConfig().forceScreenshot === true`
- `PRODUCER_ENABLE_BEGIN_FRAME=true` keeps `forceScreenshot === false`

I also pulled the exact failed production artifacts and rendered all three locally through the built CLI with `PRODUCER_ENABLE_BEGIN_FRAME=false`; all selected screenshot mode and completed:

- `nav-timeout-23c1`: rendered successfully
- `frame-detached-1a80`: rendered successfully
- `target-closed-40c8`: rendered successfully

Deploy / rerun plan
After a producer sidecar image with this patch is deployed, reset or rerun the failed Temporal workflows to verify they no longer hit the browser crash path. I did not reset them before this deploy because current prod would still run the old sidecar image and old config behavior.