fix(cli): restore winml --help startup speed (6.1s → 0.44s) #398
Import-time breakdown (
| | Before fix | After fix | Delta |
|---|---|---|---|
| Modules imported | 1,099 | 171 | -928 (-84%) |
| winml.modelkit cumulative | 1,761,266 µs (1.76s) | 94,519 µs (94.5 ms) | 18.6× faster |
| torch.jit cumulative | 1,675,382 µs (1.68s) | not loaded | gone |
| winml --help wall-clock (warm) | ~6.10s | 0.365s | ~16× |
Top 10 depth-0 imports after fix
| Package | Cumulative (µs) | Why |
|---|---|---|
| winml.modelkit | 94,519 | Package itself + importlib.metadata for `__version__` |
| winml.modelkit.cli | 38,082 | Click framework + logging |
| runpy | 17,152 | Python's -m machinery |
| site | 16,790 | Python startup, sys.path setup |
| encodings | 5,065 | Default codec setup |
| io | 1,579 | stdlib I/O |
| winml | 1,149 | Empty namespace package |
| _frozen_importlib_external | 1,075 | Interpreter bootstrap |
| glob | 932 | Used by LazyGroup.list_commands for command discovery |
| encodings.utf_8 | 842 | UTF-8 codec |
What used to be there (with the bug)
import time:      1177 |    1761266 | winml.modelkit
import time:      3702 |    1679947 |   winml.modelkit._warnings
import time:        40 |    1675382 |     torch.jit
import time:    339028 |    1675343 |       torch
import time:     89919 |     312823 |         torch._meta_registrations
The indentation tells the story: _warnings triggered torch.jit which triggered all of torch (~1.68s cumulative). After the fix, none of those lines exist.
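For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of a parser for `python -X importtime` output. It assumes the standard line layout (`import time: self | cumulative | name`, two spaces of indentation per nesting level); the `ImportRow` type, the function names, and the indentation in the sample text are illustrative assumptions, not code from this PR.

```python
import re
from typing import NamedTuple


class ImportRow(NamedTuple):
    depth: int          # nesting level, 0 = imported from the top level
    self_us: int        # time spent in the module itself
    cumulative_us: int  # time including everything it imported
    name: str


# One space follows the second "|", then two spaces per nesting level.
_LINE = re.compile(r"^import time:\s*(\d+)\s*\|\s*(\d+)\s*\| ( *)(\S+)\s*$")


def parse_importtime(text: str) -> list[ImportRow]:
    """Parse -X importtime stderr text; non-matching lines (e.g. the header) are skipped."""
    rows = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue
        self_us, cumulative_us, indent, name = match.groups()
        rows.append(ImportRow(len(indent) // 2, int(self_us), int(cumulative_us), name))
    return rows


def slowest_depth0(rows: list[ImportRow], limit: int = 10) -> list[ImportRow]:
    """Return the depth-0 imports with the largest cumulative time."""
    top = [row for row in rows if row.depth == 0]
    return sorted(top, key=lambda row: row.cumulative_us, reverse=True)[:limit]


# Sample built from the numbers quoted above; the indentation is assumed.
SAMPLE = """\
import time: self [us] | cumulative | imported package
import time:      1177 |    1761266 | winml.modelkit
import time:      3702 |    1679947 |   winml.modelkit._warnings
import time:        40 |    1675382 |     torch.jit
"""

rows = parse_importtime(SAMPLE)
```

The real output goes to stderr, so it would typically be captured with something like `python -X importtime -m <module> --help 2> import.log` before parsing.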
Sanity probe
import sys
heavy = sorted(set(m.split('.')[0] for m in sys.modules
                   if m.startswith(('torch','transformers','optimum','diffusers','sklearn'))))
# heavy → [] (empty)
Zero heavy ML deps loaded for winml --help. PASS.
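The probe only means something when it runs in a fresh interpreter, since anything already imported in the current process would pollute sys.modules. A minimal sketch of wrapping it that way follows; the helper name `heavy_modules_loaded` is hypothetical and this is not the test code from the PR.

```python
import subprocess
import sys


def heavy_modules_loaded(import_stmt: str, heavy_prefixes: tuple[str, ...]) -> list[str]:
    """Run import_stmt in a fresh interpreter and list the heavy top-level packages it pulled in."""
    probe = (
        f"{import_stmt}\n"
        "import sys\n"
        f"prefixes = {tuple(heavy_prefixes)!r}\n"
        "found = sorted({m.split('.')[0] for m in sys.modules if m.startswith(prefixes)})\n"
        "print(','.join(found))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # raise if the import itself fails
    )
    output = result.stdout.strip()
    return output.split(",") if output else []
```

A regression test could then assert that `heavy_modules_loaded("import winml.modelkit.cli", ("torch", "transformers"))` returns an empty list, assuming that module path is the CLI entry point.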
🤖 Generated with Claude Code
8ae18ac to 4c2c6ff
Correction: tagging @DingmaomaoBJTU (Qiong); the earlier tag used the wrong handle. This PR brings forward your TestDisabledCommands additions from #388. FYI; no review action required unless you want to look.
4c2c6ff to ef01bb8
ef01bb8 to 4fe240e
timenick left a comment
Missing tests/cli/__init__.py
The new tests/cli/ directory has no __init__.py, but every other top-level test directory does — tests/__init__.py, tests/integration/__init__.py, tests/e2e/__init__.py, tests/regression/__init__.py, tests/assets/__init__.py. Verified via gh api repos/microsoft/WinML-ModelKit/contents/tests/cli?ref=fix/cli-startup-regression-clean — only the two test files exist.
This is inconsistent with project convention and may cause pytest's rootdir/import-mode behavior to diverge from peer category dirs. Please add tests/cli/__init__.py.
Nit (PR description): the body cites commands/build.py:1051-1052 for the broader warnings.catch_warnings() wrapper, but the actual lines are 1060-1061. Cosmetic only.
🤖 Generated with Claude Code
Code review
Found 1 issue:
Fix: re-attach
Also noted but below the threshold to block:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
_warnings.py was eagerly importing torch.jit at module load, dragging
all of torch (~1.7s) into every winml CLI invocation. The
try:/except ImportError: guard was unreachable since torch is a hard
dependency in this project. Removed the filter; build.py already wraps
export_onnx() in catch_warnings()+filterwarnings("ignore"), which is
strictly broader than the deleted TracerWarning-only filter.
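To illustrate why a blanket catch_warnings() + filterwarnings("ignore") wrapper covers everything a category-only filter would, here is a minimal sketch. `TracerWarningStandIn` and `noisy_export` are hypothetical stand-ins so the example runs without torch; they are not the project's code.

```python
import warnings


class TracerWarningStandIn(Warning):
    """Hypothetical stand-in for torch.jit.TracerWarning, so the sketch runs without torch."""


def noisy_export() -> str:
    """Hypothetical stand-in for export_onnx(): emits two kinds of warnings."""
    warnings.warn("traced value treated as constant", TracerWarningStandIn)
    warnings.warn("unrelated export noise", UserWarning)
    return "model.onnx"


def export_quietly() -> str:
    # Blanket suppression scoped to this call; the previous filters are
    # restored when the with-block exits.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore")
        return noisy_export()


def export_without_tracer_noise() -> str:
    # Narrower, category-only variant. In real code the category import would
    # sit inside this function, so the heavy module loads only when it is called.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=TracerWarningStandIn)
        return noisy_export()
```

The blanket form silences both warnings, while the category-only form still lets the UserWarning through. Neither needs anything imported at module load.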
Also:
- onnx/__init__.py: standardize on _LAZY_IMPORTS dict pattern, matching
the other 6 subpackages and fixing 3 TestLazyImportsDict failures.
- sysinfo/device.py: add @lru_cache(maxsize=1) to _get_available_devices,
mirroring the existing decorator on _get_available_eps. Fixes a CI
flake where winml config -m <hf-model> --device <X> would re-run
Windows WMI/PowerShell hardware probes on every resolve_device call,
ballooning to 280s+ on cold runners. With the cache, the 2nd call is
~1M× faster (subprocess work happens once per process).
- tests/cli/: new top-level category for cross-cutting CLI-surface
tests; moved test_import_time.py and test_main.py here.
- tests/cli/test_import_time.py: removed TestCommandWithModel — those
tests invoke handler bodies (feature pipeline territory), not CLI
surface. Per-command runtime import budgets belong in
tests/unit/commands/ where mocks isolate dispatch from feature code.
- modelkit-ci.yml: include tests/cli in the "remaining" matrix group.
Previously test_import_time.py at tests/ root sat outside every
enumerated CI path, so the regression-detecting tests never ran.
- tests/CLAUDE.md: document the tests/cli/ category and require CI
matrix updates when adding new top-level test directories.
Constraint: torch is a hard dependency, so try:/except ImportError on torch.* is unreachable
Constraint: Hardware doesn't change during a process lifetime; lru_cache pattern is already established by _get_available_eps
Rejected: Relocate TracerWarning filter into a torch-loaded code path | build.py's catch_warnings is strictly broader; duplication not worthwhile
Rejected: Change _get_available_devices to return frozenset/tuple for cache safety | larger refactor with public-API ripples; current callers only iterate
Confidence: high
Scope-risk: narrow
Directive: Never use try:/except ImportError on a required dependency at startup — use a function-scoped lazy import if you don't want to pay the cost. Never add a top-level tests/ category without also adding it to .github/workflows/modelkit-ci.yml's path matrix. When two probe helpers have parallel structure and identical "doesn't change at runtime" justification, they should both have @lru_cache.
Not-tested: winml export (direct CLI path) now emits TracerWarning noise (UX-only). Eager-probe before device check in resolve_device is now near-zero cost on cached repeat calls — could be cleaned up for clarity, not perf.
4fe240e to df7f5f2
@timenick Addressed all 5 of your points in
Also fixed PR body:
Local verification:
🤖 Generated with Claude Code
@DingmaomaoBJTU Thanks for the approve!
On the type-signature nit (mutable
On the broken `Using a slow image processor` filter: good catch, but out of scope for this PR; that's a regression from a different cleanup (commit 324407d's second
On the
🤖 Generated with Claude Code
Filed the two out-of-scope follow-ups so they don't get lost:
Both reference this PR for context.
Summary
Closes #400.
winml --help startup regressed from 0.13s → 6.1s in MVP v2 (#335). src/winml/modelkit/_warnings.py was eagerly importing torch.jit at module load via a try:/except ImportError: guard that was unreachable (torch is a hard dependency in pyproject.toml), dragging ~1.7s of torch into every CLI invocation.
This PR removes the eager import. Build-time TracerWarning suppression is unaffected: commands/build.py:1060-1061 already wraps export_onnx() in a blanket warnings.catch_warnings() + filterwarnings("ignore") that's strictly broader than the deleted filter.
Why CI didn't catch this
tests/test_import_time.py was a comprehensive regression test added in MVP v2, but it lived at tests/ root, outside every enumerated path in .github/workflows/modelkit-ci.yml's test matrix. CI never ran it. The 37 failing tests it contained were invisible.
This PR moves it to tests/cli/test_import_time.py (a new top-level CLI-surface test category), adds tests/cli to the CI matrix, and updates tests/CLAUDE.md to require CI matrix sync for any new top-level test directory.
Verification
Bug-restore experiment (proving the test is diagnostic):
tests/cli/ pass
_warnings.py reverted to buggy state: 34 fail with FAIL: unexpected heavy modules: ['torch', 'torchgen']
Changes (10 files)
| File | Change |
|---|---|
| `src/winml/modelkit/_warnings.py` | `try: from torch.jit import TracerWarning` block (the root cause) |
| `src/winml/modelkit/onnx/__init__.py` | `_LAZY_IMPORTS` dict pattern (matches 6 other subpackages) |
| `src/winml/modelkit/sysinfo/device.py` | `@lru_cache(maxsize=1)` to `_get_available_devices`; return `tuple[str, ...]` for cache safety |
| `.github/workflows/modelkit-ci.yml` | `tests/cli` to `remaining` matrix group |
| `tests/CLAUDE.md` | `tests/cli/` category + CI-matrix-sync rule; clarify "module dirs" rule |
| `tests/cli/__init__.py` | |
| `tests/test_import_time.py` → `tests/cli/test_import_time.py` | `test_lazy_imports_all_resolvable` (was 0% coverage); remove `TestCommandWithModel` (out-of-scope feature pipeline tests) |
| `tests/unit/commands/test_cli.py` → `tests/cli/test_main.py` | |
| `tests/unit/sysinfo/conftest.py` | `lru_cache` on device probes between tests |
| `tests/unit/sysinfo/test_device.py` | `tuple[str, ...]` return type |
Known caveat
winml export (direct CLI path, not via winml build) and winml.modelkit.build.hf:216 emit TracerWarning noise during ONNX export. Functional behavior unchanged; UX-only regression. Tracked in #421.
Related follow-up issues
- _PipelineNoiseFilter not attached to image_processing_auto logger (raised by @DingmaomaoBJTU in code review; pre-existing bug, separate cleanup)
- TracerWarning leaks on direct winml export / build/hf.py paths (the known caveat)
🤖 Generated with Claude Code