fix(cli): restore winml --help startup speed (6.1s → 0.44s) by tezheng · Pull Request #398 · microsoft/winml-cli

tezheng · 2026-04-27T05:42:53Z

Summary

Closes #400.

winml --help startup regressed from 0.13s → 6.1s in MVP v2 (#335). src/winml/modelkit/_warnings.py was eagerly importing torch.jit at module load via a try:/except ImportError: guard that was unreachable (torch is a hard dependency in pyproject.toml), dragging ~1.7s of torch into every CLI invocation.

This PR removes the eager import. Build-time TracerWarning suppression is unaffected — commands/build.py:1060-1061 already wraps export_onnx() in a blanket warnings.catch_warnings() + filterwarnings("ignore") that's strictly broader than the deleted filter.

Why CI didn't catch this

tests/test_import_time.py was a comprehensive regression test added in MVP v2 — but it lived at tests/ root, outside every enumerated path in .github/workflows/modelkit-ci.yml's test matrix. CI never ran it. The 37 failing tests it contained were invisible.

This PR moves it to tests/cli/test_import_time.py (a new top-level CLI-surface test category), adds tests/cli to the CI matrix, and updates tests/CLAUDE.md to require CI matrix sync for any new top-level test directory.

Verification

$ uv run pytest tests/cli/ tests/unit/sysinfo/ tests/unit/commands/ --no-cov -q
526 passed in 159s

$ time uv run winml --help    # warm
real    0m0.365s

$ python -X importtime -m winml.modelkit --help → torch loaded: NO

Bug-restore experiment (proving the test is diagnostic):

With fix applied: 91/91 tests/cli/ pass
With _warnings.py reverted to buggy state: 34 fail with FAIL: unexpected heavy modules: ['torch', 'torchgen']
After reverting back: 91/91 pass again

Changes (10 files)

File	Change
`src/winml/modelkit/_warnings.py`	Delete `try: from torch.jit import TracerWarning` block (the root cause)
`src/winml/modelkit/onnx/__init__.py`	Standardize on `_LAZY_IMPORTS` dict pattern (matches 6 other subpackages)
`src/winml/modelkit/sysinfo/device.py`	Add `@lru_cache(maxsize=1)` to `_get_available_devices`; return `tuple[str, ...]` for cache safety
`.github/workflows/modelkit-ci.yml`	Add `tests/cli` to `remaining` matrix group
`tests/CLAUDE.md`	Document `tests/cli/` category + CI-matrix-sync rule; clarify "module dirs" rule
`tests/cli/__init__.py`	Standard pkg marker (matches sibling test category dirs)
`tests/test_import_time.py` → `tests/cli/test_import_time.py`	Move + fix path resolution + fix tuple unpacking in `test_lazy_imports_all_resolvable` (was 0% coverage); remove `TestCommandWithModel` (out-of-scope feature pipeline tests)
`tests/unit/commands/test_cli.py` → `tests/cli/test_main.py`	Pure rename (verbatim move)
`tests/unit/sysinfo/conftest.py`	Autouse fixture: clear `lru_cache` on device probes between tests
`tests/unit/sysinfo/test_device.py`	Update assertions for the new `tuple[str, ...]` return type

Known caveat

winml export (direct CLI path, not via winml build) and winml.modelkit.build.hf:216 emit TracerWarning noise during ONNX export. Functional behavior unchanged — UX-only regression. Tracked in #421.

Related follow-up issues

Bug: 'Using a slow image processor' suppression silently broken in _warnings.py #420 — _PipelineNoiseFilter not attached to image_processing_auto logger (raised by @DingmaomaoBJTU in code review; pre-existing bug, separate cleanup)
Bug: TracerWarning leaks to stderr on winml export, build/hf.py paths #421 — TracerWarning leaks on direct winml export / build/hf.py paths (the known caveat)

🤖 Generated with Claude Code

tezheng · 2026-04-27T05:43:15Z

Import-time breakdown (`python -X importtime -m winml.modelkit --help`)

Empirical evidence of the fix, captured live on this branch.

Headline numbers

	Before fix	After fix	Delta
Modules imported	1,099	171	-928 (-84%)
`winml.modelkit` cumulative	1,761,266 µs (1.76s)	94,519 µs (94.5 ms)	18.6× faster
`torch.jit` cumulative	1,675,382 µs (1.68s)	not loaded	gone
`winml --help` wall-clock (warm)	~6.10s	0.365s	~16×

Top 10 depth-0 imports after fix

Package	Cumulative (µs)	Why
`winml.modelkit`	94,519	Package itself + `importlib.metadata` for `__version__`
`winml.modelkit.cli`	38,082	Click framework + logging
`runpy`	17,152	Python's `-m` machinery
`site`	16,790	Python startup, `sys.path` setup
`encodings`	5,065	Default codec setup
`io`	1,579	stdlib I/O
`winml`	1,149	Empty namespace package
`_frozen_importlib_external`	1,075	Interpreter bootstrap
`glob`	932	Used by `LazyGroup.list_commands` for command discovery
`encodings.utf_8`	842	UTF-8 codec

What used to be there (with the bug)

import time:      1177 |    1761266 | winml.modelkit
import time:      3702 |    1679947 |   winml.modelkit._warnings
import time:        40 |    1675382 |     torch.jit
import time:    339028 |    1675343 |       torch
import time:     89919 |     312823 |         torch._meta_registrations

The indentation tells the story: _warnings triggered torch.jit which triggered all of torch (~1.68s cumulative). After the fix, none of those lines exist.

Sanity probe

heavy = sorted(set(m.split('.')[0] for m in sys.modules
                   if m.startswith(('torch','transformers','optimum','diffusers','sklearn'))))
# heavy → [] (empty)

Zero heavy ML deps loaded for winml --help. PASS.

🤖 Generated with Claude Code

tezheng · 2026-04-28T05:37:12Z

@qiowu — flagging for your awareness as the recent committer in this area. This PR also brings forward your TestDisabledCommands additions from #388 (now in tests/cli/test_main.py). No action required from you, just FYI.

tezheng · 2026-04-28T05:37:39Z

Correction: tagging @DingmaomaoBJTU (Qiong) — earlier tag used the wrong handle. This PR brings forward your TestDisabledCommands additions from #388. FYI; no review action required unless you want to look.

timenick

Missing tests/cli/__init__.py

The new tests/cli/ directory has no __init__.py, but every other top-level test directory does — tests/__init__.py, tests/integration/__init__.py, tests/e2e/__init__.py, tests/regression/__init__.py, tests/assets/__init__.py. Verified via gh api repos/microsoft/WinML-ModelKit/contents/tests/cli?ref=fix/cli-startup-regression-clean — only the two test files exist.

This is inconsistent with project convention and may cause pytest's rootdir/import-mode behavior to diverge from peer category dirs. Please add tests/cli/__init__.py.

Nit (PR description): the body cites commands/build.py:1051-1052 for the broader warnings.catch_warnings() wrapper, but the actual lines are 1060-1061. Cosmetic only.

🤖 Generated with Claude Code

DingmaomaoBJTU · 2026-04-29T04:29:37Z

Code review

Found 1 issue:

"Using a slow image processor" suppression is silently broken — the filter is attached only to transformers.pipelines.base, but this message is emitted by transformers.models.auto.image_processing_auto. Commit 324407d had explicitly added that second logger target to fix exactly this. Removing it means the noise will reappear for any caller of AutoImageProcessor.from_pretrained(use_fast=False) (e.g. image segmentation pipelines), even though it is still listed in _SUPPRESSED.

https://github.com/microsoft/WinML-ModelKit/blob/4fe240e5e1b3af92671160129a19e71cb52331be/src/winml/modelkit/_warnings.py#L55-L68

Fix: re-attach _PipelineNoiseFilter to transformers.models.auto.image_processing_auto as well, or move the attachment into a loop as 324407d originally did.

Also noted but below the threshold to block:

resolve_device returns the _get_available_devices() list by reference. With lru_cache now caching it, any external mutation corrupts future calls. Current callers are safe, but the docstring's "Callers must not mutate" is a soft contract. Returning a copy (list(available_devices)) or the already-used tuple pattern (same as _get_available_eps) would make it enforceable.
winml export / winml perf / build/hf.py have no catch_warnings wrapper, so TracerWarning will now surface in those paths — acknowledged in the PR as a known caveat.

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

_warnings.py was eagerly importing torch.jit at module load, dragging all of torch (~1.7s) into every winml CLI invocation. The try:/except ImportError: guard was unreachable since torch is a hard dependency in this project. Removed the filter; build.py already wraps export_onnx() in catch_warnings()+filterwarnings("ignore"), which is strictly broader than the deleted TracerWarning-only filter. Also: - onnx/__init__.py: standardize on _LAZY_IMPORTS dict pattern, matching the other 6 subpackages and fixing 3 TestLazyImportsDict failures. - sysinfo/device.py: add @lru_cache(maxsize=1) to _get_available_devices, mirroring the existing decorator on _get_available_eps. Fixes a CI flake where winml config -m <hf-model> --device <X> would re-run Windows WMI/PowerShell hardware probes on every resolve_device call, ballooning to 280s+ on cold runners. With the cache, the 2nd call is ~1M× faster (subprocess work happens once per process). - tests/cli/: new top-level category for cross-cutting CLI-surface tests; moved test_import_time.py and test_main.py here. - tests/cli/test_import_time.py: removed TestCommandWithModel — those tests invoke handler bodies (feature pipeline territory), not CLI surface. Per-command runtime import budgets belong in tests/unit/commands/ where mocks isolate dispatch from feature code. - modelkit-ci.yml: include tests/cli in the "remaining" matrix group. Previously test_import_time.py at tests/ root sat outside every enumerated CI path, so the regression-detecting tests never ran. - tests/CLAUDE.md: document the tests/cli/ category and require CI matrix updates when adding new top-level test directories. Constraint: torch is a hard dependency, so try:/except ImportError on torch.* is unreachable Constraint: Hardware doesn't change during a process lifetime; lru_cache pattern is already established by _get_available_eps Rejected: Relocate TracerWarning filter into a torch-loaded code path | build.py's catch_warnings is strictly broader; duplication not worthwhile Rejected: Change _get_available_devices to return frozenset/tuple for cache safety | larger refactor with public-API ripples; current callers only iterate Confidence: high Scope-risk: narrow Directive: Never use try:/except ImportError on a required dependency at startup — use a function-scoped lazy import if you don't want to pay the cost. Never add a top-level tests/ category without also adding it to .github/workflows/modelkit-ci.yml's path matrix. When two probe helpers have parallel structure and identical "doesn't change at runtime" justification, they should both have @lru_cache. Not-tested: winml export (direct CLI path) now emits TracerWarning noise (UX-only). Eager-probe before device check in resolve_device is now near-zero cost on cached repeat calls — could be cleaned up for clarity, not perf.

tezheng · 2026-04-29T06:52:43Z

@timenick Addressed all 5 of your points in df7f5f26:

Finding	Action
Missing `tests/cli/__init__.py`	Added, with license header matching siblings
`winml.modelkit.optracing` dropped from `_LAZY_MODULES`	Restored (was in MVP v2's original list; package still exists on `main`)
`tests/CLAUDE.md` "Never module dirs" rule contradicts new `tests/cli/`	Reworded to "place tests that mirror a `src/winml/modelkit/<module>/` directly under `tests/`" — clarifies the intent (forbids `src`-mirror dirs at root, allows category dirs)
`TestCommandWithModel` deletion comment promised tests in `tests/unit/commands/` (none added)	Softened to "If per-command runtime budgets become a concern, they belong in a feature-test file" — no false promise
`_get_available_devices` returning mutable `list[str]` despite `@lru_cache`	Changed return type to `tuple[str, ...]` (immutable, cache-safe). `resolve_device` now does `list(_get_available_devices())` to preserve its public `list[str]` contract; tests in `test_device.py` updated to assert against tuples

Also fixed PR body: commands/build.py:1051-1052 → 1060-1061 (cosmetic, but you were right).

Local verification: uv run pytest tests/cli/ tests/unit/sysinfo/ tests/unit/commands/ --no-cov -q → 526 passed, 0 failed.

🤖 Generated with Claude Code

tezheng · 2026-04-29T06:52:45Z

@DingmaomaoBJTU Thanks for the approve!

On the type-signature nit (mutable list[str] from cached fn): addressed in df7f5f26 — _get_available_devices now returns tuple[str, ...]. Same direction Zhipeng suggested.

On the broken `Using a slow image processor` filter: good catch, but out of scope for this PR — that's a regression from a different cleanup (commit 324407d's second .addFilter() was lost somewhere) and predates the work here. I'll file it as a separate issue so the fix stays focused on what's already in this PR. Tracking in: (will add link once filed). Confirmed the bug independently — _PipelineNoiseFilter is only attached to transformers.pipelines.base whereas the message originates in transformers.models.auto.image_processing_auto.

On the winml export / winml perf / build/hf.py TracerWarning noise: acknowledged as a known caveat in the PR body — same out-of-scope follow-up class. Same fix for both: wrap once inside export_pytorch() so all callers get coverage.

🤖 Generated with Claude Code

tezheng · 2026-04-29T06:53:59Z

Filed the two out-of-scope follow-ups so they don't get lost:

Bug: 'Using a slow image processor' suppression silently broken in _warnings.py #420 — `Using a slow image processor` suppression broken (Qiong's main finding)
Bug: TracerWarning leaks to stderr on winml export, build/hf.py paths #421 — `TracerWarning` leaks on direct `winml export` / `build/hf.py` paths (the known caveat)

Both reference this PR for context.

tezheng requested a review from a team as a code owner April 27, 2026 05:42

tezheng force-pushed the fix/cli-startup-regression-clean branch 5 times, most recently from 8ae18ac to 4c2c6ff Compare April 27, 2026 09:28

tezheng requested a review from DingmaomaoBJTU April 28, 2026 05:37

tezheng force-pushed the fix/cli-startup-regression-clean branch from 4c2c6ff to ef01bb8 Compare April 28, 2026 05:52

tezheng requested a review from timenick April 28, 2026 06:28

tezheng force-pushed the fix/cli-startup-regression-clean branch from ef01bb8 to 4fe240e Compare April 28, 2026 06:58

timenick reviewed Apr 29, 2026

View reviewed changes

Comment thread tests/cli/test_import_time.py

Comment thread tests/cli/test_import_time.py Outdated

Comment thread tests/CLAUDE.md

Comment thread src/winml/modelkit/sysinfo/device.py Outdated

DingmaomaoBJTU approved these changes Apr 29, 2026

View reviewed changes

tezheng force-pushed the fix/cli-startup-regression-clean branch from 4fe240e to df7f5f2 Compare April 29, 2026 06:51

This was referenced Apr 29, 2026

Bug: 'Using a slow image processor' suppression silently broken in _warnings.py #420

Closed

Bug: TracerWarning leaks to stderr on winml export, build/hf.py paths #421

Closed

DingmaomaoBJTU approved these changes Apr 29, 2026

View reviewed changes

Merge branch 'main' into fix/cli-startup-regression-clean

bc9a8a8

timenick approved these changes Apr 29, 2026

View reviewed changes

DingmaomaoBJTU merged commit 5ebdb90 into main Apr 29, 2026
9 checks passed

DingmaomaoBJTU deleted the fix/cli-startup-regression-clean branch April 29, 2026 09:16

DingmaomaoBJTU mentioned this pull request May 6, 2026

bug: winml.modelkit import pulls torch unconditionally, breaking 47 import-budget tests #400

Closed

Copilot AI mentioned this pull request May 7, 2026

Centralize ONNX export warning suppression in export_pytorch() #460

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cli): restore winml --help startup speed (6.1s → 0.44s)#398

fix(cli): restore winml --help startup speed (6.1s → 0.44s)#398
DingmaomaoBJTU merged 2 commits into
mainfrom
fix/cli-startup-regression-clean

tezheng commented Apr 27, 2026 •

edited

Loading

Uh oh!

tezheng commented Apr 27, 2026

Uh oh!

tezheng commented Apr 28, 2026

Uh oh!

tezheng commented Apr 28, 2026

Uh oh!

timenick left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DingmaomaoBJTU commented Apr 29, 2026

Uh oh!

tezheng commented Apr 29, 2026

Uh oh!

tezheng commented Apr 29, 2026

Uh oh!

tezheng commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

tezheng commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why CI didn't catch this

Verification

Changes (10 files)

Known caveat

Related follow-up issues

Uh oh!

tezheng commented Apr 27, 2026

Import-time breakdown (python -X importtime -m winml.modelkit --help)

Headline numbers

Top 10 depth-0 imports after fix

What used to be there (with the bug)

Sanity probe

Uh oh!

tezheng commented Apr 28, 2026

Uh oh!

tezheng commented Apr 28, 2026

Uh oh!

timenick left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DingmaomaoBJTU commented Apr 29, 2026

Code review

Uh oh!

tezheng commented Apr 29, 2026

Uh oh!

tezheng commented Apr 29, 2026

Uh oh!

tezheng commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tezheng commented Apr 27, 2026 •

edited

Loading

Import-time breakdown (`python -X importtime -m winml.modelkit --help`)