Skip to content

fix(cli): restore winml --help startup speed (6.1s → 0.44s)#398

Merged
DingmaomaoBJTU merged 2 commits into
mainfrom
fix/cli-startup-regression-clean
Apr 29, 2026
Merged

fix(cli): restore winml --help startup speed (6.1s → 0.44s)#398
DingmaomaoBJTU merged 2 commits into
mainfrom
fix/cli-startup-regression-clean

Conversation

@tezheng

@tezheng tezheng commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator

Summary

Closes #400.

winml --help startup regressed from 0.13s → 6.1s in MVP v2 (#335). src/winml/modelkit/_warnings.py was eagerly importing torch.jit at module load via a try:/except ImportError: guard that was unreachable (torch is a hard dependency in pyproject.toml), dragging ~1.7s of torch into every CLI invocation.

This PR removes the eager import. Build-time TracerWarning suppression is unaffected — commands/build.py:1060-1061 already wraps export_onnx() in a blanket warnings.catch_warnings() + filterwarnings("ignore") that's strictly broader than the deleted filter.

Why CI didn't catch this

tests/test_import_time.py was a comprehensive regression test added in MVP v2 — but it lived at tests/ root, outside every enumerated path in .github/workflows/modelkit-ci.yml's test matrix. CI never ran it. The 37 failing tests it contained were invisible.

This PR moves it to tests/cli/test_import_time.py (a new top-level CLI-surface test category), adds tests/cli to the CI matrix, and updates tests/CLAUDE.md to require CI matrix sync for any new top-level test directory.

Verification

$ uv run pytest tests/cli/ tests/unit/sysinfo/ tests/unit/commands/ --no-cov -q
526 passed in 159s

$ time uv run winml --help    # warm
real    0m0.365s

$ python -X importtime -m winml.modelkit --help → torch loaded: NO

Bug-restore experiment (proving the test is diagnostic):

  • With fix applied: 91/91 tests/cli/ pass
  • With _warnings.py reverted to buggy state: 34 fail with FAIL: unexpected heavy modules: ['torch', 'torchgen']
  • After reverting back: 91/91 pass again

Changes (10 files)

File Change
src/winml/modelkit/_warnings.py Delete try: from torch.jit import TracerWarning block (the root cause)
src/winml/modelkit/onnx/__init__.py Standardize on _LAZY_IMPORTS dict pattern (matches 6 other subpackages)
src/winml/modelkit/sysinfo/device.py Add @lru_cache(maxsize=1) to _get_available_devices; return tuple[str, ...] for cache safety
.github/workflows/modelkit-ci.yml Add tests/cli to remaining matrix group
tests/CLAUDE.md Document tests/cli/ category + CI-matrix-sync rule; clarify "module dirs" rule
tests/cli/__init__.py Standard pkg marker (matches sibling test category dirs)
tests/test_import_time.pytests/cli/test_import_time.py Move + fix path resolution + fix tuple unpacking in test_lazy_imports_all_resolvable (was 0% coverage); remove TestCommandWithModel (out-of-scope feature pipeline tests)
tests/unit/commands/test_cli.pytests/cli/test_main.py Pure rename (verbatim move)
tests/unit/sysinfo/conftest.py Autouse fixture: clear lru_cache on device probes between tests
tests/unit/sysinfo/test_device.py Update assertions for the new tuple[str, ...] return type

Known caveat

winml export (direct CLI path, not via winml build) and winml.modelkit.build.hf:216 emit TracerWarning noise during ONNX export. Functional behavior unchanged — UX-only regression. Tracked in #421.

Related follow-up issues

🤖 Generated with Claude Code

@tezheng tezheng requested a review from a team as a code owner April 27, 2026 05:42
@tezheng

tezheng commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator Author

Import-time breakdown (python -X importtime -m winml.modelkit --help)

Empirical evidence of the fix, captured live on this branch.

Headline numbers

Before fix After fix Delta
Modules imported 1,099 171 -928 (-84%)
winml.modelkit cumulative 1,761,266 µs (1.76s) 94,519 µs (94.5 ms) 18.6× faster
torch.jit cumulative 1,675,382 µs (1.68s) not loaded gone
winml --help wall-clock (warm) ~6.10s 0.365s ~16×

Top 10 depth-0 imports after fix

Package Cumulative (µs) Why
winml.modelkit 94,519 Package itself + importlib.metadata for __version__
winml.modelkit.cli 38,082 Click framework + logging
runpy 17,152 Python's -m machinery
site 16,790 Python startup, sys.path setup
encodings 5,065 Default codec setup
io 1,579 stdlib I/O
winml 1,149 Empty namespace package
_frozen_importlib_external 1,075 Interpreter bootstrap
glob 932 Used by LazyGroup.list_commands for command discovery
encodings.utf_8 842 UTF-8 codec

What used to be there (with the bug)

import time:      1177 |    1761266 | winml.modelkit
import time:      3702 |    1679947 |   winml.modelkit._warnings
import time:        40 |    1675382 |     torch.jit
import time:    339028 |    1675343 |       torch
import time:     89919 |     312823 |         torch._meta_registrations

The indentation tells the story: _warnings triggered torch.jit which triggered all of torch (~1.68s cumulative). After the fix, none of those lines exist.

Sanity probe

heavy = sorted(set(m.split('.')[0] for m in sys.modules
                   if m.startswith(('torch','transformers','optimum','diffusers','sklearn'))))
# heavy → [] (empty)

Zero heavy ML deps loaded for winml --help. PASS.

🤖 Generated with Claude Code

@tezheng tezheng force-pushed the fix/cli-startup-regression-clean branch 5 times, most recently from 8ae18ac to 4c2c6ff Compare April 27, 2026 09:28
@tezheng

tezheng commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator Author

@qiowu — flagging for your awareness as the recent committer in this area. This PR also brings forward your TestDisabledCommands additions from #388 (now in tests/cli/test_main.py). No action required from you, just FYI.

@tezheng tezheng requested a review from DingmaomaoBJTU April 28, 2026 05:37
@tezheng

tezheng commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator Author

Correction: tagging @DingmaomaoBJTU (Qiong) — earlier tag used the wrong handle. This PR brings forward your TestDisabledCommands additions from #388. FYI; no review action required unless you want to look.

@tezheng tezheng force-pushed the fix/cli-startup-regression-clean branch from 4c2c6ff to ef01bb8 Compare April 28, 2026 05:52
@tezheng tezheng requested a review from timenick April 28, 2026 06:28
@tezheng tezheng force-pushed the fix/cli-startup-regression-clean branch from ef01bb8 to 4fe240e Compare April 28, 2026 06:58

@timenick timenick left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing tests/cli/__init__.py

The new tests/cli/ directory has no __init__.py, but every other top-level test directory does — tests/__init__.py, tests/integration/__init__.py, tests/e2e/__init__.py, tests/regression/__init__.py, tests/assets/__init__.py. Verified via gh api repos/microsoft/WinML-ModelKit/contents/tests/cli?ref=fix/cli-startup-regression-clean — only the two test files exist.

This is inconsistent with project convention and may cause pytest's rootdir/import-mode behavior to diverge from peer category dirs. Please add tests/cli/__init__.py.


Nit (PR description): the body cites commands/build.py:1051-1052 for the broader warnings.catch_warnings() wrapper, but the actual lines are 1060-1061. Cosmetic only.

🤖 Generated with Claude Code

Comment thread tests/cli/test_import_time.py
Comment thread tests/cli/test_import_time.py Outdated
Comment thread tests/CLAUDE.md
Comment thread src/winml/modelkit/sysinfo/device.py Outdated
@DingmaomaoBJTU

Copy link
Copy Markdown
Collaborator

Code review

Found 1 issue:

  1. "Using a slow image processor" suppression is silently broken — the filter is attached only to transformers.pipelines.base, but this message is emitted by transformers.models.auto.image_processing_auto. Commit 324407d had explicitly added that second logger target to fix exactly this. Removing it means the noise will reappear for any caller of AutoImageProcessor.from_pretrained(use_fast=False) (e.g. image segmentation pipelines), even though it is still listed in _SUPPRESSED.

https://github.com/microsoft/WinML-ModelKit/blob/4fe240e5e1b3af92671160129a19e71cb52331be/src/winml/modelkit/_warnings.py#L55-L68

Fix: re-attach _PipelineNoiseFilter to transformers.models.auto.image_processing_auto as well, or move the attachment into a loop as 324407d originally did.


Also noted but below the threshold to block:

  • resolve_device returns the _get_available_devices() list by reference. With lru_cache now caching it, any external mutation corrupts future calls. Current callers are safe, but the docstring's "Callers must not mutate" is a soft contract. Returning a copy (list(available_devices)) or the already-used tuple pattern (same as _get_available_eps) would make it enforceable.
  • winml export / winml perf / build/hf.py have no catch_warnings wrapper, so TracerWarning will now surface in those paths — acknowledged in the PR as a known caveat.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

_warnings.py was eagerly importing torch.jit at module load, dragging
all of torch (~1.7s) into every winml CLI invocation. The
try:/except ImportError: guard was unreachable since torch is a hard
dependency in this project. Removed the filter; build.py already wraps
export_onnx() in catch_warnings()+filterwarnings("ignore"), which is
strictly broader than the deleted TracerWarning-only filter.

Also:
- onnx/__init__.py: standardize on _LAZY_IMPORTS dict pattern, matching
  the other 6 subpackages and fixing 3 TestLazyImportsDict failures.
- sysinfo/device.py: add @lru_cache(maxsize=1) to _get_available_devices,
  mirroring the existing decorator on _get_available_eps. Fixes a CI
  flake where winml config -m <hf-model> --device <X> would re-run
  Windows WMI/PowerShell hardware probes on every resolve_device call,
  ballooning to 280s+ on cold runners. With the cache, the 2nd call is
  ~1M× faster (subprocess work happens once per process).
- tests/cli/: new top-level category for cross-cutting CLI-surface
  tests; moved test_import_time.py and test_main.py here.
- tests/cli/test_import_time.py: removed TestCommandWithModel — those
  tests invoke handler bodies (feature pipeline territory), not CLI
  surface. Per-command runtime import budgets belong in
  tests/unit/commands/ where mocks isolate dispatch from feature code.
- modelkit-ci.yml: include tests/cli in the "remaining" matrix group.
  Previously test_import_time.py at tests/ root sat outside every
  enumerated CI path, so the regression-detecting tests never ran.
- tests/CLAUDE.md: document the tests/cli/ category and require CI
  matrix updates when adding new top-level test directories.

Constraint: torch is a hard dependency, so try:/except ImportError on torch.* is unreachable
Constraint: Hardware doesn't change during a process lifetime; lru_cache pattern is already established by _get_available_eps
Rejected: Relocate TracerWarning filter into a torch-loaded code path | build.py's catch_warnings is strictly broader; duplication not worthwhile
Rejected: Change _get_available_devices to return frozenset/tuple for cache safety | larger refactor with public-API ripples; current callers only iterate
Confidence: high
Scope-risk: narrow
Directive: Never use try:/except ImportError on a required dependency at startup — use a function-scoped lazy import if you don't want to pay the cost. Never add a top-level tests/ category without also adding it to .github/workflows/modelkit-ci.yml's path matrix. When two probe helpers have parallel structure and identical "doesn't change at runtime" justification, they should both have @lru_cache.
Not-tested: winml export (direct CLI path) now emits TracerWarning noise (UX-only). Eager-probe before device check in resolve_device is now near-zero cost on cached repeat calls — could be cleaned up for clarity, not perf.
@tezheng tezheng force-pushed the fix/cli-startup-regression-clean branch from 4fe240e to df7f5f2 Compare April 29, 2026 06:51
@tezheng

tezheng commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

@timenick Addressed all 5 of your points in df7f5f26:

Finding Action
Missing tests/cli/__init__.py Added, with license header matching siblings
winml.modelkit.optracing dropped from _LAZY_MODULES Restored (was in MVP v2's original list; package still exists on main)
tests/CLAUDE.md "Never module dirs" rule contradicts new tests/cli/ Reworded to "place tests that mirror a src/winml/modelkit/<module>/ directly under tests/" — clarifies the intent (forbids src-mirror dirs at root, allows category dirs)
TestCommandWithModel deletion comment promised tests in tests/unit/commands/ (none added) Softened to "If per-command runtime budgets become a concern, they belong in a feature-test file" — no false promise
_get_available_devices returning mutable list[str] despite @lru_cache Changed return type to tuple[str, ...] (immutable, cache-safe). resolve_device now does list(_get_available_devices()) to preserve its public list[str] contract; tests in test_device.py updated to assert against tuples

Also fixed PR body: commands/build.py:1051-10521060-1061 (cosmetic, but you were right).

Local verification: uv run pytest tests/cli/ tests/unit/sysinfo/ tests/unit/commands/ --no-cov -q526 passed, 0 failed.

🤖 Generated with Claude Code

@tezheng

tezheng commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

@DingmaomaoBJTU Thanks for the approve!

On the type-signature nit (mutable list[str] from cached fn): addressed in df7f5f26_get_available_devices now returns tuple[str, ...]. Same direction Zhipeng suggested.

On the broken `Using a slow image processor` filter: good catch, but out of scope for this PR — that's a regression from a different cleanup (commit 324407d's second .addFilter() was lost somewhere) and predates the work here. I'll file it as a separate issue so the fix stays focused on what's already in this PR. Tracking in: (will add link once filed). Confirmed the bug independently — _PipelineNoiseFilter is only attached to transformers.pipelines.base whereas the message originates in transformers.models.auto.image_processing_auto.

On the winml export / winml perf / build/hf.py TracerWarning noise: acknowledged as a known caveat in the PR body — same out-of-scope follow-up class. Same fix for both: wrap once inside export_pytorch() so all callers get coverage.

🤖 Generated with Claude Code

@tezheng

tezheng commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

Filed the two out-of-scope follow-ups so they don't get lost:

Both reference this PR for context.

@DingmaomaoBJTU DingmaomaoBJTU merged commit 5ebdb90 into main Apr 29, 2026
9 checks passed
@DingmaomaoBJTU DingmaomaoBJTU deleted the fix/cli-startup-regression-clean branch April 29, 2026 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: winml.modelkit import pulls torch unconditionally, breaking 47 import-budget tests

3 participants