Merged
30 commits
ef16a22
feat(qwen3): add genai bundle generation and inference script
github-actions[bot] Jun 29, 2026
6d67e24
refactor(qwen3/genai): generic build_genai_config with ONNX introspec…
github-actions[bot] Jun 29, 2026
2f8f884
feat(session): add GenaiSession for onnxruntime-genai inference
github-actions[bot] Jun 29, 2026
0cf6d48
feat(session): add GenaiSession.apply_chatml_template static method
github-actions[bot] Jun 29, 2026
f3a64bc
feat(qwen3/genai): NPU+CPU hybrid EP support in genai_config
github-actions[bot] Jun 30, 2026
6e51029
fix(genai_session): let genai_config.json drive EP routing, add mixed EP
github-actions[bot] Jun 30, 2026
8cfa8dc
feat: add --compile flag to infer_genai.py for EPContext pre-compilation
github-actions[bot] Jun 30, 2026
c992681
fix: resolve prefill compilation hang by forcing htp_graph_finalizati…
github-actions[bot] Jun 30, 2026
5b0a0e2
fix: move _do_compile to module-level _qnn_compile_worker for Windows…
github-actions[bot] Jun 30, 2026
1c962e5
perf: use configured htp_graph_finalization_optimization_mode for gen…
github-actions[bot] Jun 30, 2026
837cd84
simplify: remove htp_graph_finalization_optimization_mode override in…
github-actions[bot] Jun 30, 2026
74ca8cf
refactor(genai): move generic bundle logic to utils/genai, qwen3/gena…
github-actions[bot] Jun 30, 2026
aa5e7d1
fix: address code review issues in genai bundle and session
github-actions[bot] Jul 1, 2026
18a8f03
fix: import rule violations, fallback-stage path bug, and div-by-zero…
github-actions[bot] Jul 1, 2026
f729f92
feat(genai): auto-build embeddings+lm_head in export script using new…
github-actions[bot] Jul 1, 2026
ebef5cf
fix(genai): patch embeddings+lm_head seq_len to dynamic; revert cpu-o…
github-actions[bot] Jul 1, 2026
4babc22
fix(genai): mirror non-QNN ONNX files into compiled bundle
github-actions[bot] Jul 1, 2026
7428fa9
refactor(scripts): unify Qwen3 export + inference into qwen3.py
github-actions[bot] Jul 1, 2026
b2d8270
refactor(genai): make bundle machinery EP-agnostic; move QNN into qwen3
github-actions[bot] Jul 3, 2026
0b589f3
fix(quant): use uint8 activations for transformer-only w8a8 (matches …
github-actions[bot] Jul 1, 2026
776f328
revert(quant): restore w8a16 (uint16 activations) for transformer-only
github-actions[bot] Jul 1, 2026
854710f
Strip exporter-injected default GQA attrs from transformer bundle ONNX
github-actions[bot] Jul 1, 2026
c7c84b9
Truncate cos/sin rope cache to max_cache_len at export time
github-actions[bot] Jul 1, 2026
0fc155a
fix: pin rope cache to static Python int to fix symbolic tracing bug
github-actions[bot] Jul 1, 2026
bbff1f2
fix(scripts): drop broken genai session import and dead _SUPPORTED_EPS
github-actions[bot] Jul 3, 2026
0581106
Merge origin/main into qwen3 genai bundle branch
github-actions[bot] Jul 3, 2026
22d38ba
fix(qwen3): forward transformer_onnx_passes through genai bundle wrapper
github-actions[bot] Jul 3, 2026
a8075a9
fix(genai): preserve pad_token_id==0 and clear CodeQL import/statemen…
github-actions[bot] Jul 3, 2026
0c22e2a
docs(qwen3): note deferred max_rope_len wiring at the transformer-onl…
github-actions[bot] Jul 3, 2026
272126e
fix(qwen3): route export CLI via set_defaults(func) dispatch
github-actions[bot] Jul 3, 2026
4 changes: 4 additions & 0 deletions .gitignore
@@ -241,6 +241,10 @@ nul
export_config.json
*_perf.json
shape_config.json
/out/

# ONNX external data files at repo root (e.g. EPContext .data blobs)
/*.data

# UV / pip
uv.lock
185 changes: 0 additions & 185 deletions scripts/export_qwen3_embeddings_lm_head.py

This file was deleted.

171 changes: 0 additions & 171 deletions scripts/export_qwen3_transformer_only.py

This file was deleted.
