fix(engine): support AMD AMF GPU encoding#1082
Conversation
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
Fallow audit reportFound 62 findings. Dead code (1)
Duplication (57, showing 50)
Showing 50 of 57 findings. Run fallow locally or inspect the CI output for the full report. Health (4)
Generated by fallow. |
jrusso1020
left a comment
There was a problem hiding this comment.
Read the full diff + traced buildEncoderArgs / buildStreamingArgs end-to-end. Two-step detection (parse compile-list → probe each candidate) is a strict upgrade over the prior first-match-wins — the exact "AMD Windows machine with NVENC compiled in but no NVIDIA hardware" scenario the PR exists to fix.
Pinned invariants worth keeping
getCompiledGpuEncodersreturns candidates inGPU_ENCODER_CANDIDATESpriority order (nvenc → videotoolbox → vaapi → qsv → amf). The test ingpuEncoder.test.tsasserts the exact returned array; if anyone reorders the candidate array, that test breaks. AMD goes last — correct, since on a mixed Windows host (NVIDIA + AMD discrete + Intel iGPU) the existing nvenc/qsv-first preference is preserved.selectUsableGpuEncodertest pins fallthrough from compiled-but-unusable NVENC → AMF. That's the load-bearing assertion for this PR; keep it as-is, don't refactor into a generic parameterized case.- Intra-file parity between
chunkEncoder.tsandstreamingEncoder.tsis symmetric — AMF case in the rate-control switch, AMF added to the H.264 B-frame guard, parallel tests in both files. The pre-disclosed Fallow duplication red is exactly because both files duplicate this logic; the parity fix did the right thing by touching both at once.
Non-blocking observations
-
AMF rate-control = CQP, not CRF-equivalent. The branch picks
-rc cqp -qp_i N -qp_p Nfor the no-bitrate case. This is consistent with how the other GPU paths handle quality (-cqon nvenc,-qpon vaapi,-global_qualityon qsv) — i.e. fixed-quantizer, not perceptual rate-control. If the Onee/Donato thread today is steering toward CRF-like behavior across the board, AMF won't be CRF-aligned after this PR even though it's now supported. AMF has-rc qvbr(quality-defined VBR, closer to CRF semantics) — would be the migration target for the broader CRF direction. Not for this PR; just noting AMF inherits the same fixed-QP shape as the other GPU paths. -
No preset/speed mapping for AMF. The switch only emits rate-control args — no
-presetor-qualitypush. Compare:nvenc/qsv: explicit-presetmapped viamapPresetForGpuEncodervideotoolbox/vaapi/amf: no preset arg at all
Side effect:
--quality draft(which maps topreset="ultrafast") on an AMD box uses FFmpeg'sh264_amfdefault of-quality quality(the slowest of the three AMF modes). Users who chose--quality draftfor speed will be confused about why their AMF encode is slow. Same gap exists for videotoolbox/vaapi today, so not a regression — but the AMF "balanced" / "speed" / "quality" trio is well-defined and an obvious follow-up for the next AMF iteration. -
Probe is silent on failure. When
canUseGpuEncoder(...)returns false, there's no log of why — stderr is"ignore". A future "I have an AMD GPU and it's falling through to CPU" debug session has no signal. Consider aprocess.env.HYPERFRAMES_DEBUG_GPU_PROBE=1toggle that flips stderr to inherited and logs the chosen encoder + each probe's exit code. Cheap; saves a long debug loop the first time a real AMD user hits a probe-but-fails path. -
Probe exercises 1 frame at 16x16. Says "encoder can encode at all." Doesn't validate B-frame support, color-space tagging, HDR transfer, or real-frame parameters. Real failures from those will surface at first chunk-encode time — acceptable for a baseline detection check, just noting the fidelity ceiling.
-
No real AMF hardware run — author pre-disclosed in PR body. First AMD Windows user is the canary; structural correctness covered by unit tests, real-render covered by the locally-available videotoolbox path. Fine for the kind of change this is, but worth a Slack call-out if/when an AMD-Windows internal test machine becomes available.
-
Fallow audit red — pre-disclosed in PR body as inherited
chunkEncoder.ts↔streamingEncoder.tsduplication, not introduced. ✓ Acknowledged appropriately; not blocking.
Verdict
Structurally sound. The "compile-list + probe" detection is a real upgrade. AMF arg generation matches the FFmpeg h264_amf / hevc_amf contract. Test coverage pins the AMD-Windows fallthrough scenario.
CI is mid-flight (regression shards + Windows tests in progress) — assuming those land green, James/Miguel to merge.
— Rames Jusso (hyperframes)
vanceingalls
left a comment
There was a problem hiding this comment.
Strengths
getCompiledGpuEncoders+canUseGpuEncoderprobe (gpuEncoder.ts:29-149) is a meaningful correctness improvement beyond just adding AMF — it kills the "compiled means usable" false-positive class for all backends, including NVENC on machines where the driver is absent. That was the actual bug behind #1079.- B-frame guard applied symmetrically to AMF in both
chunkEncoder.ts:160andstreamingEncoder.ts:243. No asymmetry between the two encoder paths. - AMF included in the
mapPresetForGpuEncoderpass-through parameterized test — author thought about the preset surface, not just the codec name.
Important
1. void selectUsableGpuEncoder(...).then(resolve) — no rejection handler (gpuEncoder.ts:59)
If selectUsableGpuEncoder rejects (it can't today, but the contract isn't enforced), detectGpuEncoder's outer Promise hangs forever instead of resolving to null. getCachedGpuEncoder (and every render call behind it) would stall indefinitely. One line fix:
void selectUsableGpuEncoder(candidates, canUseGpuEncoder)
.then(resolve)
.catch(() => resolve(null));2. SIGTERM-only on probe timeout — no SIGKILL escalation (gpuEncoder.ts:121)
ffmpeg.kill("SIGTERM") works for clean exits; a hung or unresponsive driver (exactly the failure mode this PR guards against) may ignore SIGTERM. finish(false) is called immediately so the JS side continues correctly, but the child process stays alive. On a render workload that spawns multiple probes, zombie ffmpeg processes accumulate. Recommend SIGKILL after ~1 s:
const kill = setTimeout(() => ffmpeg.kill("SIGKILL"), 1000);
// in finish():
clearTimeout(kill);3. First-call latency is now serial × 5 s per candidate — up to ~25 s worst case (gpuEncoder.ts:42-60)
Before this PR: detection was one ffmpeg -encoders call, instant. After: one ffmpeg -encoders call plus up to 5 serial probe processes (5 s timeout each). On a Windows AMD machine with NVENC compiled in but unavailable (common), NVENC probe times out before AMF is tried — 5 s added to the first --gpu render. Worst case (5 candidates, only the last usable): ~25 s.
getCachedGpuEncoder memoizes, so this is one-time, not per-chunk. That contains the blast radius, but it's still a UX regression for first render on Windows. Consider parallelizing probes with Promise.all across candidates and picking the first winner by priority order, or at least capping the per-probe timeout lower (2 s is plenty — a usable driver responds fast).
4. Hardcoded VAAPI device /dev/dri/renderD128 in probe (gpuEncoder.ts:102)
Pre-PR behavior: selected VAAPI immediately based on compiled list, then failed at render if the device was wrong — also wrong. Post-PR behavior: better on the common case, but on multi-GPU Linux where the usable VAAPI device is renderD129, the probe reports VAAPI unusable and falls through to the next candidate. Silently breaks VAAPI for those setups.
This is pre-existing behavior + a net improvement, not a regression introduced by this PR. Fine to track as a follow-up issue rather than blocking.
Nit
GPU_ENCODER_CANDIDATESpriority order (gpuEncoder.ts:19-25): on a Windows hybrid Intel iGPU + AMD discrete, QSV probes successfully first and AMF is never tried, even though AMD discrete is usually faster. The PR solves the reported bug. Reordering for perf is a follow-up consideration.
Note on Fallow
The fallow/code-duplication flood is pre-existing duplication across chunkEncoder.ts and streamingEncoder.ts that this PR mirrors for AMF, not new structural debt it introduces. Author calls this out in the PR body and explicitly scoped out the broad refactor. Acknowledged.
fallow/high-crap-score on buildEncoderArgs (CRAP 68, cyclomatic 57) and buildStreamingArgs (CRAP 58.4): these are pre-existing and grow with every encoder added. Not a blocker for this PR, but the complexity ceiling for these functions is real — worth a dedicated refactor before the next encoder lands.
Audited: packages/engine/src/utils/gpuEncoder.ts (end-to-end), packages/engine/src/services/chunkEncoder.ts (new AMF paths), packages/engine/src/services/streamingEncoder.ts (new AMF paths), all test files touched.
Verdict: APPROVE
Reasoning: The detection refactor is a correctness fix that benefits all GPU backends, not just AMF. No correctness blockers. The important findings (rejection-unhandled promise, SIGTERM-only kill, serial probe latency) are real concerns worth fixing but don't break the primary case — a Windows AMD box with only AMF usable — which is what the PR ships.
— Vai
157ce9e to
a53aad5
Compare
Problem
hyperframes render --gpuonly detected and mapped NVENC, VideoToolbox, VAAPI, and QSV. On Windows machines with AMD GPUs, FFmpeg exposesh264_amf/hevc_amf, but HyperFrames had no AMF encoder type or argument path.Closes #1079
What this fixes
h264_amfandhevc_amfencoder names.Root cause
GPU detection used FFmpeg's compiled encoder list as the selection truth and stopped at the first known encoder. Some FFmpeg Windows builds can list encoders that are compiled in but unusable on the current hardware, so a machine with AMD AMF support could still be routed toward an unusable NVENC path. The encoder name mapper also fell back to CPU encoders for unknown backends, so AMF could never be selected intentionally.
This changes detection into a two-step flow: parse compiled hardware encoder candidates, then probe each candidate with a tiny FFmpeg encode and choose the first usable backend.
Verification
Local checks
bun run build:hyperframes-runtimebun run --cwd packages/engine test -- src/utils/gpuEncoder.test.ts src/services/chunkEncoder.test.ts src/services/streamingEncoder.test.tsbun run --cwd packages/engine typecheckbunx oxlint packages/engine/src/utils/gpuEncoder.ts packages/engine/src/utils/gpuEncoder.test.ts packages/engine/src/services/chunkEncoder.ts packages/engine/src/services/chunkEncoder.test.ts packages/engine/src/services/streamingEncoder.ts packages/engine/src/services/streamingEncoder.test.tsbunx oxfmt --check packages/engine/src/utils/gpuEncoder.ts packages/engine/src/utils/gpuEncoder.test.ts packages/engine/src/services/chunkEncoder.ts packages/engine/src/services/chunkEncoder.test.ts packages/engine/src/services/streamingEncoder.ts packages/engine/src/services/streamingEncoder.test.ts packages/cli/src/docs/rendering.md docs/packages/engine.mdx docs/guides/rendering.mdx docs/packages/cli.mdx docs/guides/hdr.mdx docs/packages/producer.mdxgit diff --checkffprobeon the rendered MP4 confirmed H.264, 1920x1080, 10fps, 15s.ffmpeg -v error -i <rendered mp4> -f null -decoded the render without errors.Browser verification
registry/examples/kinetic-typewith local GPU encoding:bun run --cwd packages/cli dev -- render /Users/miguel07code/.codex/worktrees/0da0/hyperframes-oss/registry/examples/kinetic-type --gpu --quality draft --workers 1 --fps 10 --output /tmp/hyperframes-issue-1079-artifacts/issue-1079-amf-gpu-render.mp4detectGpuEncoder()returnedvideotoolboxon this Mac, so the end-to-end render exercised the shared GPU encode path locally.agent-browseropened the rendered MP4 and verified browser playback advanced with:readyState=4currentTime=1.460343duration=15.020996videoWidth=1920videoHeight=1080paused=false/tmp/hyperframes-issue-1079-artifacts/issue-1079-browser-playback.png/tmp/hyperframes-issue-1079-artifacts/issue-1079-browser-playback.webmNotes
I could not exercise real AMF hardware on this Mac. AMF-specific behavior is covered by unit tests for detection fallthrough, encoder naming, and FFmpeg argument generation; the real render/browser proof used the locally available VideoToolbox backend.
The pre-commit hook's
fallowstep still flags inherited duplication acrosschunkEncoder.tsandstreamingEncoder.tswhen both files are touched, plus an existing unused type re-export. I did not include a broad encoder refactor in this issue fix.