SSD V1 by avnermay · Pull Request #5 · togethercomputer/sglang-private

avnermay · 2026-02-17T23:50:31Z

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

- Format your code according to the "Format code with pre-commit" guide.
- Add unit tests according to the "Run and add unit tests" guide.
- Update documentation according to the "Write documentations" guide.
- Provide accuracy and speed benchmark results according to the "Test the accuracy" and "Benchmark the speed" guides.
- Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

arnica-github-connector · 2026-03-13T16:35:33Z

+        self.speculative_algorithm = SpeculativeAlgorithm.from_string(
+            server_args.speculative_algorithm
+        )


Static Code Analysis Risk: Together python jinja2 ssti

User-controlled input is used as a Jinja2 template string (Server-Side Template Injection). Jinja2 templates can execute arbitrary Python code via class/mro traversal (CWE-94). Load templates from trusted static sources only; pass user data as render() variables, never as the template itself.

Severity: High 🚨
Status: Open 🔴

References:

https://cwe.mitre.org/data/definitions/94

https://portswigger.net/web-security/server-side-template-injection

https://jinja.palletsprojects.com/en/3.1.x/api/#jinja2.Template

Suggested reviewers 🧐: @avnermay

More details:

🌻 View in Arnica

If you see an issue, please contact Shasheen in the #security-engineering Slack channel.

Take action by replying with an [arnica] command 💬

Actions

Use [arnica] or [a] to interact with the Arnica bot to acknowledge or dismiss code risks.

To acknowledge the finding as a valid code risk: [arnica] ack <acknowledge additional details>

To dismiss the risk with a reason: [arnica] dismiss <fp|accept|capacity> <dismissal reason>

Examples

[arnica] ack This is a valid risk and I'm looking into it

[arnica] dismiss fp Dismissed - Risk Not Accurate: (i.e. False Positive)

[arnica] dismiss accept Dismiss - Risk Accepted: Allow the risk to exist in the system

[arnica] dismiss capacity Dismiss - No Capacity: This will need to wait for a future sprint

arnica-github-connector · 2026-03-13T16:35:36Z

            extra_max_context_len = 4
            if self.server_args.speculative_num_draft_tokens is not None:
-                extra_max_context_len += self.server_args.speculative_num_draft_tokens
+                if SpeculativeAlgorithm.from_string(self.server_args.speculative_algorithm).is_async():


Static Code Analysis Risk: Together python jinja2 ssti

User-controlled input is used as a Jinja2 template string (Server-Side Template Injection). Jinja2 templates can execute arbitrary Python code via class/mro traversal (CWE-94). Load templates from trusted static sources only; pass user data as render() variables, never as the template itself.

Severity: High 🚨
Status: Open 🔴

References:

https://cwe.mitre.org/data/definitions/94

https://portswigger.net/web-security/server-side-template-injection

https://jinja.palletsprojects.com/en/3.1.x/api/#jinja2.Template

Suggested reviewers 🧐: @avnermay

arnica-github-connector · 2026-03-13T16:35:39Z

                )

    def _handle_speculative_decoding(self):
+        speculative_algorithm = SpeculativeAlgorithm.from_string(self.speculative_algorithm)


Static Code Analysis Risk: Together python jinja2 ssti

User-controlled input is used as a Jinja2 template string (Server-Side Template Injection). Jinja2 templates can execute arbitrary Python code via class/mro traversal (CWE-94). Load templates from trusted static sources only; pass user data as render() variables, never as the template itself.

Severity: High 🚨
Status: Open 🔴

References:

https://cwe.mitre.org/data/definitions/94

https://portswigger.net/web-security/server-side-template-injection

https://jinja.palletsprojects.com/en/3.1.x/api/#jinja2.Template

Suggested reviewers 🧐: @avnermay
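
All three jinja2 findings above anchor on calls to `SpeculativeAlgorithm.from_string(...)`. When triaging them, it may help to contrast a template-compiling `from_string` with an enum-style parser of the same name. The sketch below assumes the flagged method is a plain name-to-member lookup, which the hunks shown here do not confirm; `Algo` and its members are hypothetical stand-ins, not the class from this PR.

```python
from enum import Enum, auto


class Algo(Enum):
    """Hypothetical stand-in for SpeculativeAlgorithm; members are illustrative."""

    NONE = auto()
    EAGLE = auto()
    ASYNC_EAGLE = auto()

    @classmethod
    def from_string(cls, name):
        # The input only selects an existing member. It is never compiled
        # or rendered as a template, so template syntax has no effect.
        if name is None:
            return cls.NONE
        try:
            return cls[name.upper()]
        except KeyError:
            raise ValueError(f"unknown speculative algorithm: {name!r}") from None

    def is_async(self):
        # Illustrative rule only: async variants share a name prefix.
        return self.name.startswith("ASYNC_")
```

If the real method looks like this, a value such as `"{{ 7*7 }}"` is rejected with `ValueError` rather than evaluated, and `[arnica] dismiss fp` would be the matching reply. If it instead hands the string to a Jinja2 environment, the finding stands.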

arnica-github-connector · 2026-03-18T18:50:09Z

CVE-2025-32434

Static Code Analysis Risk: Together python torch load

torch.load() detected (CVE-2025-32434, CVSS 9.8). In PyTorch <= 2.5.1, torch.load() enables arbitrary code execution even with weights_only=True. The weights_only flag does NOT provide the intended protection on affected versions. Use safetensors format for model weights, or ensure PyTorch >= 2.6.0 and validate model provenance before loading.

Severity: High 🚨
Status: Open 🔴

References:

https://nvd.nist.gov/vuln/detail/CVE-2025-32434

GHSA-53q9-r3pm-6pq6

https://cwe.mitre.org/data/definitions/502

Note: This comment applies to line 251 but could not be created inline due to GitHub limitations.

Suggested reviewers 🧐: @avnermay
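
The finding above asks for PyTorch >= 2.6.0 wherever `torch.load()` is used. A minimal sketch of such a version gate follows, using only the standard library. The helper names `parse_release` and `torch_load_is_patched` are hypothetical and not part of this PR, and the sketch assumes the version string starts with the usual `major.minor[.patch]` form.

```python
import re

# First release the finding treats as fixed for CVE-2025-32434.
MIN_SAFE = (2, 6, 0)


def parse_release(version):
    """Return (major, minor, patch) from strings such as '2.5.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def torch_load_is_patched(version):
    # Tuple comparison avoids the string-comparison trap where
    # '2.10.0' would sort before '2.6.0'.
    # Pre-release suffixes (e.g. '2.6.0a0') are ignored here, so treat
    # such builds with care.
    return parse_release(version) >= MIN_SAFE
```

In practice the argument would be `torch.__version__`. The gate complements, and does not replace, the other mitigations the finding lists: the safetensors format and validating model provenance.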

avnermay added 3 commits February 14, 2026 08:37

skeleton

af8a2c0

More skeleton code

a162d5a

Partial progress integrating SSD

57ad351

github-actions bot added the documentation label Feb 23, 2026

avnermay added 5 commits February 23, 2026 12:27

Checkpoint before refactor by Claude code with companion process

ea073ca

Lots of changes to get code running

cb8deb9

NCCL logging and timestamp logging

4162048

some refactor

841e998

Update KV allocation kernel to allocate enough slots for full draft tree

e89fb3b

github-actions bot added the dependencies label Mar 13, 2026

arnica-github-connector bot reviewed Mar 13, 2026

avnermay added 12 commits March 13, 2026 09:40

Compare NCCL outputs of SSD vs SGLang

7e82536

NCCL comparison

95891ba

Bug fixes

c4af296

+1 to num tokens for draft tree

fe3aff1

Pass through speculative_async_fan_out

1ac973e

Fix accept_length to match SSD's expectation (not including bonus token)

e9b5714

Download draft model locally during scheduler launch if it's a HF repo

88086c8

Put all logging statements inside if statements

2b7c1bf

Tensor-parallel target bug fix and profiling code

0caf857

Cross-node support

81c457c

Beginning of Eagle support

efa9d79

Merge branch 'avner/merge-upstream-2026-03-18' into avner/ssd-v1

eaa5029

avnermay requested a review from zhyncs as a code owner March 18, 2026 18:37

github-actions bot added model-gateway Multi-modal diffusion quant labels Mar 18, 2026

github-actions bot added mthreads sgl-kernel lora speculative-decoding blackwell hicache deterministic jit-kernel labels Mar 18, 2026

arnica-github-connector bot reviewed Mar 18, 2026

avnermay added 21 commits March 18, 2026 11:52

Merge branch 'main' into avner/ssd-v1

f24e877

Fixes to get AsyncSpecWorker functioning after the big upstream merge

f51aa90

Integrate latest SSD refactors of PrefillRequest, SpeculationRequest, SpeculationResponse

aabba8c

Bug fixes to get eagle working

689a741

Async Phoenix support

b78c5db

Support for Async Eagle-V1 and Async Phoenix-V2 (SSD side needs work though)

403a484

Fix Eagle activation bug

60c9cbd

Finish fixing Eagle activation bug

aee5b66

page size > 1 support for SSD

4d1e585

Depend on avner/sglang-fa4-phnx branch of SSD repo

f72a70e

Merge branch 'main' into avner/ssd-v1

ee81e6d

Bug fix

c3336ec

Add file for launching remote draft process

6257fcf

Fix launch command in scripts/launch_remote_draft.py

ea239ee

Add phoenix support to scripts/launch_remote_draft.py

8f7df4b

Add log statement before loading draft model in scripts/launch_remote_draft.py

92b2293

Draft vs target context length mismatch fix

c2736ca

page size argument and more logging

21bb5db

Fix is_async bug

0706d98

Switch from ssh to https git dependency in pyproject.toml

5cb687c

More visible draft <-> target waiting logs

1ee894c

avnermay wants to merge 41 commits into avner/main from avner/ssd-v1
