Skip to content

fix inference bugs, add cache-aware e2e quality gates#381

Merged
DingmaomaoBJTU merged 4 commits into
mainfrom
qiowu/serve_run_e2e
Apr 23, 2026
Merged

fix inference bugs, add cache-aware e2e quality gates#381
DingmaomaoBJTU merged 4 commits into
mainfrom
qiowu/serve_run_e2e

Conversation

@DingmaomaoBJTU

Copy link
Copy Markdown
Collaborator

Summary

  • Fix NER numpy.float32 serialization error in serve path — _sanitize_numpy() converts numpy scalars to Python types before pydantic/JSON serialization
  • Fix cache-key-prefixed build directory loading — _find_build_artifacts() supports both plain (model.onnx) and cache-prefixed ({key}_model.onnx) layouts, enabling engine.load() to load directly from ~/.cache/winml/artifacts/
  • Fix task-mismatch cache loading — _find_build_artifacts(task=) filters by manifest task, avoiding wrong ONNX model (e.g. feature-extraction head for text-classification); falls back to HF rebuild when no match found
  • Redirect stdout→stderr during engine.load() in CLI to prevent build-pipeline prints from contaminating --format json output
  • Merge test_run_e2e.py + test_run_hub_models_e2e.py into 3-tier quality gate (Feature 14 + Schema 35 + Inference 35 = 84 tests)
  • Add test_serve_e2e.py 4-tier quality gate (103 tests)
  • Cache-aware e2e: tests use ~/.cache/winml/artifacts/ when available, per-file timeout(3600) for e2e
  • 14 new unit tests for _sanitize_numpy, _find_build_artifacts, NER output normalization

All 187 e2e tests passing (run 84 + serve 103).

🤖 Generated with Claude Code

…t tests

Bug fixes:
- Fix NER numpy.float32 serialization: add _sanitize_numpy() to convert
  numpy scalars to Python types before pydantic/JSON serialization
- Fix cache-key-prefixed build directory loading: _find_build_artifacts()
  supports both plain (model.onnx) and prefixed ({key}_model.onnx) layouts
- Fix task-mismatch cache loading: _find_build_artifacts(task=) filters
  by manifest task to avoid using wrong ONNX (e.g. feature-extraction
  model for text-classification request); fallback to HF rebuild
- Redirect stdout to stderr during engine.load() in CLI to prevent
  build-pipeline prints from contaminating --format json output

E2E quality gates (187 tests, all passing):
- Merge test_run_e2e.py + test_run_hub_models_e2e.py into 3-tier
  quality gate: Feature (14) + Schema (35) + Inference (35)
- Add test_serve_e2e.py: 4-tier quality gate with 103 tests
- Cache-aware: tests use ~/.cache/winml/artifacts/ when available,
  falling back to HF model ID for uncached models
- Fix TestClient fixture: use context manager to trigger FastAPI lifespan
- Move shared test_image/runner fixtures to conftest.py
- Per-file timeout(3600) for e2e tests, global 300s unchanged

Unit tests:
- TestSanitizeNumpy: 6 tests for numpy scalar conversion
- TestFindBuildArtifacts: 7 tests including task filtering
- TestNormalizeNEROutput: verify NER output is JSON-serializable
@DingmaomaoBJTU DingmaomaoBJTU requested a review from a team as a code owner April 23, 2026 02:35

@timenick timenick left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline comments on 7 issues flagged against CLAUDE.md / tests/CLAUDE.md. Core fixes (_sanitize_numpy, _find_build_artifacts task filter, HF rebuild fallback) look correct — findings are mostly test-layout duplication plus a couple of behavior nits.

🤖 Generated with Claude Code

Comment thread tests/e2e/test_serve_e2e.py Outdated
Comment thread tests/e2e/test_serve_e2e.py Outdated
Comment thread tests/e2e/test_run_e2e.py Outdated
Comment thread src/winml/modelkit/commands/run.py
Comment thread tests/e2e/test_run_e2e.py
Comment thread src/winml/modelkit/inference/engine.py Outdated
Comment thread src/winml/modelkit/inference/engine.py Outdated
- Move shared e2e helpers (_HUB_DATA, _unique_pairs, _PAIRS, _pytest_id,
  _find_cache_dir, _resolve_model_arg, _SAMPLE_TEXT, _TEXT_BY_FIELD) to
  conftest.py as single source of truth (comments 1+2)
- Remove duplicate test_image fixture from test_serve_e2e.py (comment 1)
- Replace pytest.skip with pytest.xfail for missing-input cases so
  coverage gaps are visible, not hidden (comment 3)
- Fix redirect_stdout comment to match unconditional behavior (comment 4)
- Document why _extract_json is needed despite redirect_stdout — some
  C-extension code writes directly to fd 1, bypassing sys.stdout (comment 5)
- Raise FileNotFoundError when task=None and multiple different-task
  variants exist in cache dir, preventing silent wrong-model load (comment 6)
- Broaden except to (FileNotFoundError, JSONDecodeError, KeyError) so
  corrupt manifests trigger HF rebuild fallback (comment 7)

@timenick timenick left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review of the feedback commit. All 7 previous comments were addressed substantively (helpers moved to conftest, duplicate fixture removed, xfail with actionable reasons, docstrings, _find_build_artifacts task-ambiguity guard, broadened except). Unit tests pass locally. One new blocker was introduced by moving the helper into conftest.py — see inline.

🤖 Generated with Claude Code

Comment thread tests/e2e/conftest.py Outdated
pytest_id in conftest.py triggers pluggy PluginValidationError because
pytest treats any conftest function starting with pytest_ as a hook.

@timenick timenick left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DingmaomaoBJTU DingmaomaoBJTU merged commit c0da212 into main Apr 23, 2026
9 checks passed
@DingmaomaoBJTU DingmaomaoBJTU deleted the qiowu/serve_run_e2e branch April 23, 2026 10:14
ssss141414 pushed a commit that referenced this pull request Apr 27, 2026
## Summary

- Fix NER `numpy.float32` serialization error in serve path —
`_sanitize_numpy()` converts numpy scalars to Python types before
pydantic/JSON serialization
- Fix cache-key-prefixed build directory loading —
`_find_build_artifacts()` supports both plain (`model.onnx`) and
cache-prefixed (`{key}_model.onnx`) layouts, enabling `engine.load()` to
load directly from `~/.cache/winml/artifacts/`
- Fix task-mismatch cache loading — `_find_build_artifacts(task=)`
filters by manifest task, avoiding wrong ONNX model (e.g.
feature-extraction head for text-classification); falls back to HF
rebuild when no match found
- Redirect stdout→stderr during `engine.load()` in CLI to prevent
build-pipeline prints from contaminating `--format json` output
- Merge `test_run_e2e.py` + `test_run_hub_models_e2e.py` into 3-tier
quality gate (Feature 14 + Schema 35 + Inference 35 = 84 tests)
- Add `test_serve_e2e.py` 4-tier quality gate (103 tests)
- Cache-aware e2e: tests use `~/.cache/winml/artifacts/` when available,
per-file `timeout(3600)` for e2e
- 14 new unit tests for `_sanitize_numpy`, `_find_build_artifacts`, NER
output normalization

All 187 e2e tests passing (run 84 + serve 103).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants