@Rohan138 Rohan138 commented Dec 6, 2025

Purpose

Revert/fix #725 to restore Triton MoE support for the default A16W4 gpt-oss weights. Also fix padding for the A16W4 CK backend.

Test Plan

Test Result


@Rohan138 Rohan138 marked this pull request as ready for review December 8, 2025 00:18
Rohan138 commented Dec 8, 2025

Repro script:

export VLLM_USE_AITER_UNIFIED_ATTENTION=1 VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_MHA=0
export VLLM_ROCM_USE_AITER_FUSED_MOE_A16W4=1

MODEL=openai/gpt-oss-120b
# MODEL=amd/gpt-oss120b-w-mxfp4-a-fp8

vllm serve $MODEL -tp 8 --async-scheduling --swap-space 16 --no-enable-prefix-caching --disable-log-requests --disable-uvicorn-access-log --trust-remote-code --block-size 64 --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE"}'

# lm-eval
lm_eval --model local-completions --model_args model=$MODEL,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=256,max_retries=10,max_gen_toks=2048 --batch_size auto --tasks gsm8k --num_fewshot 5 --limit 200  --output_path . --apply_chat_template 2>&1 | tee -a eval.log

MI300 Triton w4a16

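| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|---------|------------------|--------|-------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.960 | ± 0.0139 |
|       |         | strict-match     | 5      | exact_match | 0.665 | ± 0.0335 |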

MI350 Triton w4a8

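| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|---------|------------------|--------|-------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.945 | ± 0.0162 |
|       |         | strict-match     | 5      | exact_match | 0.665 | ± 0.0335 |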

MI350 Triton w4a16

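| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|---------|------------------|--------|-------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.96  | ± 0.0139 |
|       |         | strict-match     | 5      | exact_match | 0.65  | ± 0.0338 |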

MI350 CK w4a16

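| Tasks | Version | Filter           | n-shot | Metric      | Value | Stderr   |
|-------|---------|------------------|--------|-------------|-------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.960 | ± 0.0139 |
|       |         | strict-match     | 5      | exact_match | 0.705 | ± 0.0323 |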
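
For reading the tables above, it may help to see where the Stderr column can come from. The sketch below assumes the reported stderr is the standard error of the mean of 200 binary (0/1) scores, computed with the sample (n - 1) variance; that formula is an assumption on my part, though it reproduces every Stderr value listed above. The function name `binomial_stderr` and the `reported` dictionary are illustrative only and not part of vLLM or lm-eval.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance.

    Assumed formula, not taken from the PR: sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


N = 200  # matches --limit 200 in the repro script

# (configuration, filter) -> exact_match value copied from the tables above
reported = {
    ("MI300 Triton w4a16", "flexible-extract"): 0.960,
    ("MI300 Triton w4a16", "strict-match"): 0.665,
    ("MI350 Triton w4a8", "flexible-extract"): 0.945,
    ("MI350 Triton w4a8", "strict-match"): 0.665,
    ("MI350 Triton w4a16", "flexible-extract"): 0.96,
    ("MI350 Triton w4a16", "strict-match"): 0.65,
    ("MI350 CK w4a16", "flexible-extract"): 0.960,
    ("MI350 CK w4a16", "strict-match"): 0.705,
}

for (config, filt), acc in reported.items():
    # Print each score with the stderr recomputed under the assumed formula
    print(f"{config:20s} {filt:17s} {acc:.3f} ± {binomial_stderr(acc, N):.4f}")
```

Under that assumption, the flexible-extract scores across the four configurations (0.945 to 0.960) differ by roughly one standard error at n = 200, so the sample is probably too small to separate the backends on accuracy alone.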
