fix inference bugs, add cache-aware e2e quality gates by DingmaomaoBJTU · Pull Request #381 · microsoft/winml-cli

DingmaomaoBJTU · 2026-04-23T02:34:59Z

Summary

Fix NER numpy.float32 serialization error in serve path — _sanitize_numpy() converts numpy scalars to Python types before pydantic/JSON serialization
Fix cache-key-prefixed build directory loading — _find_build_artifacts() supports both plain (model.onnx) and cache-prefixed ({key}_model.onnx) layouts, enabling engine.load() to load directly from ~/.cache/winml/artifacts/
Fix task-mismatch cache loading — _find_build_artifacts(task=) filters by manifest task, avoiding wrong ONNX model (e.g. feature-extraction head for text-classification); falls back to HF rebuild when no match found
Redirect stdout→stderr during engine.load() in CLI to prevent build-pipeline prints from contaminating --format json output
Merge test_run_e2e.py + test_run_hub_models_e2e.py into 3-tier quality gate (Feature 14 + Schema 35 + Inference 35 = 84 tests)
Add test_serve_e2e.py 4-tier quality gate (103 tests)
Cache-aware e2e: tests use ~/.cache/winml/artifacts/ when available, per-file timeout(3600) for e2e
14 new unit tests for _sanitize_numpy, _find_build_artifacts, NER output normalization

All 187 e2e tests passing (run 84 + serve 103).

🤖 Generated with Claude Code

…t tests Bug fixes: - Fix NER numpy.float32 serialization: add _sanitize_numpy() to convert numpy scalars to Python types before pydantic/JSON serialization - Fix cache-key-prefixed build directory loading: _find_build_artifacts() supports both plain (model.onnx) and prefixed ({key}_model.onnx) layouts - Fix task-mismatch cache loading: _find_build_artifacts(task=) filters by manifest task to avoid using wrong ONNX (e.g. feature-extraction model for text-classification request); fallback to HF rebuild - Redirect stdout to stderr during engine.load() in CLI to prevent build-pipeline prints from contaminating --format json output E2E quality gates (187 tests, all passing): - Merge test_run_e2e.py + test_run_hub_models_e2e.py into 3-tier quality gate: Feature (14) + Schema (35) + Inference (35) - Add test_serve_e2e.py: 4-tier quality gate with 103 tests - Cache-aware: tests use ~/.cache/winml/artifacts/ when available, falling back to HF model ID for uncached models - Fix TestClient fixture: use context manager to trigger FastAPI lifespan - Move shared test_image/runner fixtures to conftest.py - Per-file timeout(3600) for e2e tests, global 300s unchanged Unit tests: - TestSanitizeNumpy: 6 tests for numpy scalar conversion - TestFindBuildArtifacts: 7 tests including task filtering - TestNormalizeNEROutput: verify NER output is JSON-serializable

timenick

Inline comments on 7 issues flagged against CLAUDE.md / tests/CLAUDE.md. Core fixes (_sanitize_numpy, _find_build_artifacts task filter, HF rebuild fallback) look correct — findings are mostly test-layout duplication plus a couple of behavior nits.

🤖 Generated with Claude Code

- Move shared e2e helpers (_HUB_DATA, _unique_pairs, _PAIRS, _pytest_id, _find_cache_dir, _resolve_model_arg, _SAMPLE_TEXT, _TEXT_BY_FIELD) to conftest.py as single source of truth (comments 1+2) - Remove duplicate test_image fixture from test_serve_e2e.py (comment 1) - Replace pytest.skip with pytest.xfail for missing-input cases so coverage gaps are visible, not hidden (comment 3) - Fix redirect_stdout comment to match unconditional behavior (comment 4) - Document why _extract_json is needed despite redirect_stdout — some C-extension code writes directly to fd 1, bypassing sys.stdout (comment 5) - Raise FileNotFoundError when task=None and multiple different-task variants exist in cache dir, preventing silent wrong-model load (comment 6) - Broaden except to (FileNotFoundError, JSONDecodeError, KeyError) so corrupt manifests trigger HF rebuild fallback (comment 7)

timenick

Re-review of the feedback commit. All 7 previous comments were addressed substantively (helpers moved to conftest, duplicate fixture removed, xfail with actionable reasons, docstrings, _find_build_artifacts task-ambiguity guard, broadened except). Unit tests pass locally. One new blocker was introduced by moving the helper into conftest.py — see inline.

🤖 Generated with Claude Code

pytest_id in conftest.py triggers pluggy PluginValidationError because pytest treats any conftest function starting with pytest_ as a hook.

timenick

LGTM

## Summary - Fix NER `numpy.float32` serialization error in serve path — `_sanitize_numpy()` converts numpy scalars to Python types before pydantic/JSON serialization - Fix cache-key-prefixed build directory loading — `_find_build_artifacts()` supports both plain (`model.onnx`) and cache-prefixed (`{key}_model.onnx`) layouts, enabling `engine.load()` to load directly from `~/.cache/winml/artifacts/` - Fix task-mismatch cache loading — `_find_build_artifacts(task=)` filters by manifest task, avoiding wrong ONNX model (e.g. feature-extraction head for text-classification); falls back to HF rebuild when no match found - Redirect stdout→stderr during `engine.load()` in CLI to prevent build-pipeline prints from contaminating `--format json` output - Merge `test_run_e2e.py` + `test_run_hub_models_e2e.py` into 3-tier quality gate (Feature 14 + Schema 35 + Inference 35 = 84 tests) - Add `test_serve_e2e.py` 4-tier quality gate (103 tests) - Cache-aware e2e: tests use `~/.cache/winml/artifacts/` when available, per-file `timeout(3600)` for e2e - 14 new unit tests for `_sanitize_numpy`, `_find_build_artifacts`, NER output normalization All 187 e2e tests passing (run 84 + serve 103). 🤖 Generated with [Claude Code](https://claude.com/claude-code)

DingmaomaoBJTU requested a review from a team as a code owner April 23, 2026 02:35

timenick reviewed Apr 23, 2026

View reviewed changes

Comment thread tests/e2e/conftest.py Outdated

DingmaomaoBJTU added 2 commits April 23, 2026 15:29

rename pytest_id to hub_test_id to avoid pytest hook collision

f7ff1bc

pytest_id in conftest.py triggers pluggy PluginValidationError because pytest treats any conftest function starting with pytest_ as a hook.

Merge branch 'main' into qiowu/serve_run_e2e

0363c46

timenick approved these changes Apr 23, 2026

View reviewed changes

DingmaomaoBJTU merged commit c0da212 into main Apr 23, 2026
9 checks passed

DingmaomaoBJTU deleted the qiowu/serve_run_e2e branch April 23, 2026 10:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix inference bugs, add cache-aware e2e quality gates#381

fix inference bugs, add cache-aware e2e quality gates#381
DingmaomaoBJTU merged 4 commits into
mainfrom
qiowu/serve_run_e2e

DingmaomaoBJTU commented Apr 23, 2026

Uh oh!

timenick left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

timenick left a comment

Uh oh!

Uh oh!

timenick left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

DingmaomaoBJTU commented Apr 23, 2026

Summary

Uh oh!

timenick left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

timenick left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

timenick left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants