Skip to content

Commit ef01bb8

Browse files
author
Zheng Te
committed
fix(cli): restore winml --help startup speed (6.1s → 0.44s)
_warnings.py was eagerly importing torch.jit at module load, dragging all of torch (~1.7s) into every winml CLI invocation. The try:/except ImportError: guard was unreachable since torch is a hard dependency in this project. Removed the filter; build.py already wraps export_onnx() in catch_warnings()+filterwarnings("ignore"), which is strictly broader than the deleted TracerWarning-only filter. Also: - onnx/__init__.py: standardize on _LAZY_IMPORTS dict pattern, matching the other 6 subpackages and fixing 3 TestLazyImportsDict failures. - sysinfo/device.py: add @lru_cache(maxsize=1) to _get_available_devices, mirroring the existing decorator on _get_available_eps. Fixes a CI flake where winml config -m <hf-model> --device <X> would re-run Windows WMI/PowerShell hardware probes on every resolve_device call, ballooning to 280s+ on cold runners. With the cache, the 2nd call is ~1M× faster (subprocess work happens once per process). - tests/cli/: new top-level category for cross-cutting CLI-surface tests; moved test_import_time.py and test_main.py here. - tests/cli/test_import_time.py: removed TestCommandWithModel — those tests invoke handler bodies (feature pipeline territory), not CLI surface. Per-command runtime import budgets belong in tests/unit/commands/ where mocks isolate dispatch from feature code. - modelkit-ci.yml: include tests/cli in the "remaining" matrix group. Previously test_import_time.py at tests/ root sat outside every enumerated CI path, so the regression-detecting tests never ran. - tests/CLAUDE.md: document the tests/cli/ category and require CI matrix updates when adding new top-level test directories. Constraint: torch is a hard dependency, so try:/except ImportError on torch.* is unreachable Constraint: Hardware doesn't change during a process lifetime; lru_cache pattern is already established by _get_available_eps Rejected: Relocate TracerWarning filter into a torch-loaded code path | build.py's catch_warnings is strictly broader; duplication not worthwhile Rejected: Change _get_available_devices to return frozenset/tuple for cache safety | larger refactor with public-API ripples; current callers only iterate Confidence: high Scope-risk: narrow Directive: Never use try:/except ImportError on a required dependency at startup — use a function-scoped lazy import if you don't want to pay the cost. Never add a top-level tests/ category without also adding it to .github/workflows/modelkit-ci.yml's path matrix. When two probe helpers have parallel structure and identical "doesn't change at runtime" justification, they should both have @lru_cache. Not-tested: winml export (direct CLI path) now emits TracerWarning noise (UX-only). Eager-probe before device check in resolve_device is now near-zero cost on cached repeat calls — could be cleaned up for clarity, not perf.
1 parent 8cca1fd commit ef01bb8

7 files changed

Lines changed: 63 additions & 89 deletions

File tree

.github/workflows/modelkit-ci.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ jobs:
3939
paths: >-
4040
tests/unit/core tests/unit/onnx tests/unit/cache
4141
tests/unit/utils tests/unit/sysinfo tests/unit/inspect
42-
tests/unit/optracing tests/regression
42+
tests/unit/optracing tests/regression tests/cli
4343
4444
name: test (${{ matrix.group }})
4545

src/winml/modelkit/_warnings.py

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -78,14 +78,13 @@ def filter(self, record: logging.LogRecord) -> bool:
7878
for _cat in (FutureWarning, DeprecationWarning, UserWarning):
7979
warnings.filterwarnings("ignore", category=_cat, module=r"torch\..*")
8080

81-
# TracerWarning (from torch.jit, inherits Warning not UserWarning)
82-
# fires during ONNX export tracing — safe to suppress in both torch and transformers
83-
try:
84-
from torch.jit import TracerWarning
85-
86-
warnings.filterwarnings("ignore", category=TracerWarning)
87-
except ImportError:
88-
pass # torch not installed
81+
# NOTE: TracerWarning (from torch.jit) is intentionally NOT filtered here.
82+
# Importing torch.jit at startup would pull all of torch (~1.7s) into
83+
# `winml --help` and violate the CLI import budget (tests/cli/test_import_time.py).
84+
# During ONNX export, build.py already wraps the export call in
85+
# `warnings.catch_warnings()` + `filterwarnings("ignore")`, which is strictly
86+
# broader than a TracerWarning-only filter. Direct callers of export_pytorch()
87+
# that want the same suppression can apply it locally at the call site.
8988

9089
# Diffusers
9190
warnings.filterwarnings(

src/winml/modelkit/onnx/__init__.py

Lines changed: 14 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -47,16 +47,24 @@
4747
]
4848

4949

50+
_LAZY_IMPORTS: dict[str, tuple[str, str]] = {
51+
"is_compiled_onnx": (".detection", "is_compiled_onnx"),
52+
"is_quantized_onnx": (".detection", "is_quantized_onnx"),
53+
}
54+
55+
5056
def __getattr__(name: str):
5157
"""Lazy-load detection module to avoid circular import with compiler."""
52-
if name in ("is_compiled_onnx", "is_quantized_onnx"):
53-
from .detection import is_compiled_onnx, is_quantized_onnx
58+
if name in _LAZY_IMPORTS:
59+
module_path, attr_name = _LAZY_IMPORTS[name]
60+
import importlib
5461

55-
globals()["is_compiled_onnx"] = is_compiled_onnx
56-
globals()["is_quantized_onnx"] = is_quantized_onnx
57-
return globals()[name]
62+
mod = importlib.import_module(module_path, __name__)
63+
val = getattr(mod, attr_name)
64+
globals()[name] = val
65+
return val
5866
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
5967

6068

6169
def __dir__() -> list[str]:
62-
return __all__
70+
return list(set(list(globals()) + __all__))

src/winml/modelkit/sysinfo/device.py

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,13 +74,24 @@ def get_ep_device_map() -> dict[str, str]:
7474
return dict(_EP_DEVICE_MAP)
7575

7676

77+
@functools.lru_cache(maxsize=1)
7778
def _get_available_devices() -> list[str]:
78-
"""Return prioritized list of available devices.
79+
"""Return prioritized list of available devices (cached).
7980
8081
Priority: NPU > GPU > CPU.
8182
Always includes "cpu" as fallback.
8283
Uses SysInfo hardware classes for detection.
8384
85+
Hardware does not change during a process lifetime, so this result is
86+
cached via lru_cache (mirrors ``_get_available_eps``). Without this
87+
cache, ``resolve_device`` calls within a single CLI invocation each
88+
re-run Windows WMI/PowerShell subprocesses (~1.2s/call locally,
89+
5-10x slower on cold CI), which on Windows CI runners has caused
90+
user-facing commands like ``winml config -m <model> --device npu``
91+
to balloon past 280s.
92+
93+
Callers must not mutate the returned list (it is shared across calls).
94+
8495
This is an internal helper for :func:`resolve_device` and should not
8596
be called directly by external code.
8697

tests/CLAUDE.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,10 +8,16 @@ Reference: [`/docs/pytest-best-practices.md`](/docs/pytest-best-practices.md)
88

99
- Place unit tests under `tests/unit/<module>/` mirroring `src/winml/modelkit/<module>/`
1010
- Place integration tests under `tests/integration/`, e2e under `tests/e2e/`
11+
- Place cross-cutting CLI-surface tests (startup, import budget, arg parsing,
12+
command discovery, version/help output) under `tests/cli/` — these don't
13+
mirror any single `src/` module, so they don't fit under `tests/unit/<module>/`
1114
- Put shared fixtures in the narrowest `conftest.py` that covers all consumers
1215

1316
## Never
1417

1518
- Create module directories directly under `tests/` — use `tests/unit/<module>/` instead
1619
- Put `test_*.py` files in `assets/`, `fixtures/`, or `mock_data/` — those are helpers only
1720
- Duplicate fixtures across multiple `conftest.py` files
21+
- Add a new top-level category under `tests/` without also adding it to the
22+
`.github/workflows/modelkit-ci.yml` path matrix — CI enumerates paths
23+
explicitly, so a new directory is invisible to CI until it's listed
Lines changed: 23 additions & 73 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,9 @@
3434
def _discover_command_names() -> list[str]:
3535
from pathlib import Path
3636

37-
root = Path(__file__).resolve().parent.parent
37+
# Walk up until we find the repo root (marked by pyproject.toml).
38+
# Resilient to this file's depth within tests/.
39+
root = next(p for p in Path(__file__).resolve().parents if (p / "pyproject.toml").exists())
3840
commands_dir = root / "src" / "winml" / "modelkit" / "commands"
3941
return sorted(f.stem for f in commands_dir.glob("*.py") if not f.name.startswith("_"))
4042

@@ -151,7 +153,6 @@ class TestModuleIsolation:
151153
"winml.modelkit.loader",
152154
"winml.modelkit.onnx",
153155
"winml.modelkit.optim",
154-
"winml.modelkit.optracing",
155156
"winml.modelkit.quant",
156157
"winml.modelkit.session",
157158
"winml.modelkit.analyze",
@@ -346,20 +347,25 @@ def test_lazy_imports_all_consistent(self, module: str) -> None:
346347

347348
@pytest.mark.parametrize("module", _LAZY_MODULES)
348349
def test_lazy_imports_all_resolvable(self, module: str) -> None:
349-
"""Every _LAZY_IMPORTS entry must resolve to a real attribute."""
350+
"""Every _LAZY_IMPORTS entry must resolve to a real attribute.
351+
352+
Convention: ``_LAZY_IMPORTS`` maps a lazy attribute name to a
353+
``(submodule_path, real_attr_name)`` tuple, where ``submodule_path``
354+
is relative (e.g. ``".config"``) resolved against the host package.
355+
"""
350356
script = textwrap.dedent(f"""\
351357
import importlib
352358
import {module} as mod
353359
errors = []
354-
for attr_name, submodule_path in mod._LAZY_IMPORTS.items():
360+
for lazy_name, (submodule_path, real_attr) in mod._LAZY_IMPORTS.items():
355361
try:
356-
sub = importlib.import_module(submodule_path)
357-
if not hasattr(sub, attr_name):
362+
sub = importlib.import_module(submodule_path, package={module!r})
363+
if not hasattr(sub, real_attr):
358364
errors.append(
359-
f'{{attr_name}}: {{submodule_path}} has no attribute {{attr_name}}'
365+
f'{{lazy_name}}: {{submodule_path}}.{{real_attr}} not found'
360366
)
361367
except ImportError as exc:
362-
errors.append(f'{{attr_name}}: cannot import {{submodule_path}} ({{exc}})')
368+
errors.append(f'{{lazy_name}}: cannot import {{submodule_path}} ({{exc}})')
363369
if errors:
364370
raise AssertionError(
365371
f'Unresolvable _LAZY_IMPORTS in {module}:\\n' + '\\n'.join(errors)
@@ -393,68 +399,12 @@ def test_command_help_no_heavy_deps(self, cmd: str) -> None:
393399
assert_cli_no_heavy_imports([cmd, "--help"])
394400

395401

396-
# ===========================================================================
397-
# (B) Per-Command Tests — with --model (actual command execution)
398-
# ===========================================================================
399-
400-
_FAKE_ONNX = "nonexistent_test_model.onnx"
401-
_HF_MODEL = "microsoft/resnet-50"
402-
403-
404-
class TestCommandWithModel:
405-
"""Verify import budgets when commands are invoked with --model.
406-
407-
Commands that operate on ONNX files should NOT need torch/transformers.
408-
Commands that operate on HF models legitimately need them.
409-
410-
We use a fake model path so commands fail at file I/O, but the import
411-
chain is already established by that point.
412-
"""
413-
414-
@pytest.mark.parametrize(
415-
("cmd_args", "allowed"),
416-
[
417-
# ONNX-path commands — should NOT need torch/transformers
418-
(
419-
["compile", "--model", _FAKE_ONNX, "-o", "o.onnx", "--ep", "qnn"],
420-
(),
421-
),
422-
(
423-
["quantize", "--model", _FAKE_ONNX, "-o", "o.onnx", "--ep", "qnn"],
424-
(),
425-
),
426-
(
427-
["optimize", "--model", _FAKE_ONNX, "-o", "o.onnx"],
428-
("torch", "torchgen"), # ORT tools.__init__ pulls torch
429-
),
430-
(
431-
["perf", "--model", _FAKE_ONNX],
432-
(),
433-
),
434-
(
435-
["static-analyzer", "check", "--model", _FAKE_ONNX, "--ep", "qnn"],
436-
("torch", "torchgen"), # ORT tools.__init__ pulls torch
437-
),
438-
# HF model commands — legitimately need heavy deps
439-
(
440-
["inspect", "-m", _HF_MODEL],
441-
(*HEAVY_PREFIXES, "torchgen", "torchvision"),
442-
),
443-
(
444-
["config", "-m", _HF_MODEL, "--device", "npu", "--precision", "int8"],
445-
(*HEAVY_PREFIXES, "torchgen", "torchvision"),
446-
),
447-
],
448-
ids=[
449-
"compile-onnx",
450-
"quantize-onnx",
451-
"optimize-onnx",
452-
"perf-onnx",
453-
"static-analyzer-onnx",
454-
"inspect-hf",
455-
"config-hf",
456-
],
457-
)
458-
def test_command_import_budget(self, cmd_args: list[str], allowed: tuple[str, ...]) -> None:
459-
"""Verify each command's import budget with --model."""
460-
assert_cli_no_heavy_imports(cmd_args, allowed=allowed)
402+
# Note: tests that invoke commands with --model belong elsewhere (they
403+
# exercise handler bodies — feature-pipeline territory, not CLI surface).
404+
# The init-time guarantees here cover what's loaded by:
405+
# - importing winml.modelkit.* subpackages (TestModuleIsolation)
406+
# - winml --help and winml <cmd> --help (TestCommandHelp)
407+
# - lazy-import dict structure (TestLazyImportsDict)
408+
# Per-command runtime import budgets (e.g., "winml compile --model X.onnx
409+
# should not pull torch") should be verified in tests/unit/commands/ where
410+
# mocks can isolate the dispatch from the feature pipeline.

0 commit comments

Comments
 (0)