Avner/async forward pass #4

Open
avnermay wants to merge 7 commits into avner/main from avner/async-forward-pass

Conversation

@avnermay
Collaborator

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

avnermay and others added 7 commits February 12, 2026 07:18
Introduce a new speculative decoding algorithm (ASYNC_SPEC) where the draft
model runs on a dedicated GPU in a separate process, communicating via a
multiprocessing Pipe. The key innovation is a tree cache on the draft GPU
that pre-populates speculative continuations in the background.
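
A minimal sketch of the request/response loop described above, using only the Python standard library. It is not the code in this PR: the fields on SpecRequest and SpecResponse, the helpers toy_draft, handle_request, prefetch, draft_loop and start_draft_process, and the cache layout are all assumptions made for illustration. The toy cache stores only the single continuation in which every draft token is accepted and ignores any recovery token, and it fills the cache between requests rather than truly concurrently.

```python
import multiprocessing as mp
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SpecRequest:
    # Hypothetical fields; the real handshake.py may carry different data.
    prefix: Tuple[int, ...]  # token ids accepted so far
    num_draft: int           # number of tokens to speculate


@dataclass
class SpecResponse:
    draft_tokens: List[int]
    cache_hit: bool


CacheKey = Tuple[Tuple[int, ...], int]


def toy_draft(prefix: Tuple[int, ...], n: int) -> List[int]:
    """Stand-in for the draft model: a deterministic next-token rule."""
    last = prefix[-1] if prefix else 0
    out = []
    for _ in range(n):
        last = (last * 31 + 7) % 1000
        out.append(last)
    return out


def handle_request(tree_cache: Dict[CacheKey, List[int]], req: SpecRequest) -> SpecResponse:
    """Answer from the cache when possible, otherwise draft on demand."""
    cached = tree_cache.pop((req.prefix, req.num_draft), None)
    if cached is not None:
        return SpecResponse(cached, True)
    return SpecResponse(toy_draft(req.prefix, req.num_draft), False)


def prefetch(tree_cache: Dict[CacheKey, List[int]], req: SpecRequest, resp: SpecResponse) -> None:
    """Pre-populate the continuation that follows if every draft token is accepted."""
    nxt = req.prefix + tuple(resp.draft_tokens)
    tree_cache[(nxt, req.num_draft)] = toy_draft(nxt, req.num_draft)


def draft_loop(conn) -> None:
    """Runs in the draft process; a None message shuts it down."""
    tree_cache: Dict[CacheKey, List[int]] = {}
    while True:
        req = conn.recv()
        if req is None:
            break
        resp = handle_request(tree_cache, req)
        conn.send(resp)
        # The caller is now busy verifying, so use the gap to fill the cache.
        prefetch(tree_cache, req, resp)


def start_draft_process():
    """Spawn the draft process and return (process, caller-side connection)."""
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=draft_loop, args=(child_conn,), daemon=True)
    proc.start()
    return proc, parent_conn
```

On platforms that use the spawn start method, call start_draft_process() from under an `if __name__ == "__main__":` guard, then exchange messages with `conn.send(SpecRequest(...))` and `conn.recv()`.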

The verify step reuses SGLang's existing EagleVerifyInput infrastructure
(topk=1 chain verification), so acceptance/rejection logic and KV cache
management are identical to standard speculative decoding.
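
For readers unfamiliar with topk=1 chain verification, the greedy case can be illustrated as below. This is a generic sketch, not SGLang's EagleVerifyInput code: the function name greedy_chain_verify and its list-based inputs are hypothetical, and it assumes the target model has already produced its greedy token at every position of the draft chain.

```python
from typing import List, Tuple


def greedy_chain_verify(draft_tokens: List[int], target_argmax: List[int]) -> Tuple[int, int]:
    """Return (accepted_len, next_token) for a single draft chain.

    target_argmax[i] is the target model's greedy token given the prefix
    plus draft_tokens[:i], so it has one more entry than draft_tokens.
    """
    if len(target_argmax) != len(draft_tokens) + 1:
        raise ValueError("target_argmax must have len(draft_tokens) + 1 entries")

    accepted = 0
    for drafted, expected in zip(draft_tokens, target_argmax):
        if drafted != expected:
            break  # first mismatch ends the accepted prefix
        accepted += 1

    # The target's own token at the first unverified position is always kept:
    # a correction after a mismatch, or a bonus token after full acceptance.
    return accepted, target_argmax[accepted]
```

Every call therefore yields at least one new token even when no draft token is accepted. The stochastic variant mentioned below replaces the equality test with a probabilistic accept/reject step and is not shown here.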

New files:
- async_spec/handshake.py: IPC protocol (SpecRequest/SpecResponse)
- async_spec/verify.py: Verification algorithm (greedy + stochastic)
- async_spec/tree_utils.py: Tree construction helpers
- async_spec/async_draft_runner.py: Dedicated-GPU draft process with tree cache
- async_spec/async_spec_worker.py: Scheduler-side orchestrator

Modified files:
- spec_info.py: ASYNC_SPEC enum + is_async_spec() + create_worker()
- server_args.py: 6 new CLI args + validation
- schedule_batch.py: recovery_token_id, last_spec_step_accepted_len on Req
- scheduler.py: _init_async_spec_worker() to spawn draft process
- cuda_graph_runner.py: Capture TARGET_VERIFY graphs for ASYNC_SPEC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
avnermay changed the base branch from avner/ssd to avner/main on February 13, 2026 18:56

1 participant