perf: optimize winml sys startup (55s → 4s on Qualcomm) by timenick · Pull Request #266 · microsoft/winml-cli

timenick · 2026-04-08T03:25:56Z

Summary

Resolves #261 — winml sys took ~55s on Snapdragon X Plus with no progress indicator, appearing hung.

Root causes identified and fixed:

- Eager module loading: winml.modelkit.__init__.py and cli.py eagerly imported torch/transformers/optimum (~6s), even for lightweight commands like sys
- Per-device PowerShell processes: PnpDevice.__init__ spawned a separate PowerShell process for each NPU's Get-PnpDeviceProperty call, which is extremely slow on Qualcomm ACPI devices
- Serial PowerShell invocations: CPU, GPU, NPU, and OS queries each launched an independent PowerShell process (~2s cold start each)
- No parallelism: hardware probing and Python-side work (torch import, library scanning) ran sequentially
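The per-device and serial PowerShell costs listed above are mostly process start-up time, so the usual remedy is to send every query through one invocation. The following is a minimal sketch of that idea, not the PR's code: `QUERIES`, `build_script`, `parse_output`, and `query_once` are hypothetical names, and the particular CIM classes queried are an assumption.

```python
import json
import subprocess

# Hypothetical query set. The cmdlets and CIM classes exist on Windows,
# but which ones winml actually runs is an assumption.
QUERIES = {
    "cpu": "Get-CimInstance -ClassName Win32_Processor | Select-Object Name",
    "gpu": "Get-CimInstance -ClassName Win32_VideoController | Select-Object Name",
    "os": "Get-CimInstance -ClassName Win32_OperatingSystem | Select-Object Caption, BuildNumber",
}


def build_script(queries):
    """Combine several queries into one script that prints a single JSON object."""
    # @( ... ) forces each result to be an array, even for zero or one item.
    entries = "; ".join(f"'{key}' = @({cmd})" for key, cmd in queries.items())
    return f"[ordered]@{{ {entries} }} | ConvertTo-Json -Depth 4 -Compress"


def parse_output(stdout):
    """Decode the JSON document and make every value a list."""
    data = json.loads(stdout)
    normalized = {}
    for key, value in data.items():
        if value is None:
            normalized[key] = []
        elif isinstance(value, list):
            normalized[key] = value
        else:
            normalized[key] = [value]
    return normalized


def query_once(queries=QUERIES):
    """Run all queries in one PowerShell process: one cold start instead of one per query."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-NonInteractive", "-Command", build_script(queries)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_output(result.stdout)
```

Only `query_once` needs Windows; `build_script` and `parse_output` are pure functions, which keeps the script construction and JSON handling unit-testable on any platform.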

Optimizations applied:

| Optimization | Impact |
| --- | --- |
| Lazy `__init__.py` for `winml.modelkit` and `session` packages | Skip torch/transformers/optimum import for `sys` command |
| Lazy CLI command discovery (`_LazyGroup`) | Only import the invoked subcommand module |
| `platform.version()` for Windows 11 detection | Eliminate `OS.get()` PowerShell call entirely |
| `query_all_hardware()`: single PowerShell process | Merge CIM + PnP + PnpDeviceProperty into one invocation |
| `ThreadPoolExecutor` parallelism | Overlap PowerShell probe with `import torch` / library scanning |
| Batched `Get-PnpDeviceProperty -KeyName` | Query only needed properties instead of all |
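To illustrate the `platform.version()` row: Windows 11 still reports major version 10, and is distinguished by a build number of 22000 or higher. A minimal sketch under that assumption follows; `is_windows_11` and its parameters are hypothetical and may differ from the PR's `_is_windows_11()`.

```python
import platform

WIN11_MIN_BUILD = 22000  # build number of the first Windows 11 release


def is_windows_11(version=None, system=None):
    """Return True when the version string looks like Windows 11 (10.0.<build>, build >= 22000).

    Both arguments default to the running machine; passing them explicitly
    makes the function testable on any OS.
    """
    system = platform.system() if system is None else system
    version = platform.version() if version is None else version
    if system != "Windows":
        return False
    parts = version.split(".")
    try:
        major, build = int(parts[0]), int(parts[2])
    except (IndexError, ValueError):
        return False  # unexpected format: treat as "not Windows 11"
    return major >= 10 and build >= WIN11_MIN_BUILD


print(is_windows_11("10.0.22631", "Windows"))  # a Windows 11 build
print(is_windows_11("10.0.19045", "Windows"))  # a Windows 10 build
```

Because this reads a string already available in-process, it avoids a PowerShell cold start for the OS check.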

Benchmarks

| Command | Device | main | This PR | Speedup |
| --- | --- | --- | --- | --- |
| `winml sys` | Qualcomm ARM64 | 55s | 4.2s | 13.1x |
| `winml sys --list-device` | Qualcomm ARM64 | 54s | 4.3s | 12.6x |
| `winml sys` | Intel x64 | 11.0s | 2.7s | 4.1x |
| `winml sys --list-device` | Intel x64 | 10.7s | 2.5s | 4.3x |
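The speedup column is the time on main divided by the time with this PR, rounded to one decimal place. A short check using the figures from the table (the `ROWS` and `speedup` names are only for illustration):

```python
# (command, device, seconds on main, seconds with this PR), copied from the table
ROWS = [
    ("winml sys", "Qualcomm ARM64", 55.0, 4.2),
    ("winml sys --list-device", "Qualcomm ARM64", 54.0, 4.3),
    ("winml sys", "Intel x64", 11.0, 2.7),
    ("winml sys --list-device", "Intel x64", 10.7, 2.5),
]


def speedup(before, after):
    """Ratio of old to new wall-clock time, rounded to one decimal."""
    return round(before / after, 1)


for command, device, before, after in ROWS:
    print(f"{command:<25} {device:<15} {speedup(before, after)}x")
```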

Changed files

__init__.py (winml.modelkit) — __getattr__ lazy loading with cached resolution
cli.py — _LazyGroup for on-demand command module import
commands/sys.py — _is_windows_11(), query_all_hardware(), parallel execution
session/__init__.py — __getattr__ lazy loading to break circular import chain
sysinfo/helper.py — CimInstance.get_many_by_class_name(), PnpDevice batched properties, query_all_hardware()
sysinfo/hardware.py — NPU._EXTRA_PROPERTY_KEYS for targeted property fetch
onnx/detection.py — Move QDQ_OP_TYPES import to function level (break circular import)
tests/unit/sysinfo/test_sysinfo.py — Updated for _is_windows_11(), added edge case coverage

- Lazy-load winml.modelkit and session __init__.py to skip torch/transformers/optimum import for lightweight commands
- Lazy CLI command discovery: only import the invoked subcommand
- Replace OS.get() WMI call with platform.version() for Win11 detection
- Merge all hardware queries (CIM + PnP + properties) into a single PowerShell process via query_all_hardware()
- Parallelize PowerShell hardware probe with Python-side work (torch import, library version scanning) using ThreadPoolExecutor
- Move QDQ_OP_TYPES import to function level to break onnx ↔ compiler circular import exposed by lazy loading
- Cache __getattr__ results via globals() to avoid repeated resolution

Benchmarks:
Intel x64: 11s → 2.7s
Qualcomm ARM64: 64s → 8.2s
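The parallelization item in the commit message can be pictured as follows: the hardware probe is submitted to a worker thread, the Python-side work runs on the main thread in the meantime, and the results are joined at the end. This is a sketch with `time.sleep` stand-ins; `probe_hardware`, `python_side_work`, and `collect_sysinfo` are hypothetical names, not functions from the PR.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def probe_hardware():
    """Stand-in for the PowerShell probe; it waits on a subprocess, so a thread is enough."""
    time.sleep(0.2)
    return {"npu": ["hypothetical NPU"]}


def python_side_work():
    """Stand-in for heavy imports and library version scanning."""
    time.sleep(0.2)
    return {"torch": "hypothetical version"}


def collect_sysinfo():
    with ThreadPoolExecutor(max_workers=1) as pool:
        hardware_future = pool.submit(probe_hardware)  # starts in the background
        libraries = python_side_work()                 # main thread works meanwhile
        hardware = hardware_future.result()            # waits only for what is left
    return hardware, libraries


start = time.perf_counter()
hardware, libraries = collect_sysinfo()
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.2f}s (the two steps run back to back would take about 0.4s)")
```

A thread works here because the probe spends its time waiting on an external process rather than holding the interpreter, so the total time approaches the longer of the two steps instead of their sum.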


timenick added 2 commits April 8, 2026 11:23

Merge remote-tracking branch 'origin/main' into zhiwang/optimize-winml-sys-perf

a488598

github-advanced-security AI found potential problems Apr 8, 2026


Comment thread src/winml/modelkit/commands/sys.py Fixed

Comment thread src/winml/modelkit/__init__.py Fixed

timenick added 2 commits April 8, 2026 11:32

fix: rename import sys to _stdlib_sys to resolve CodeQL py/import-own-module

894f824

fix: resolve CodeQL alerts — uninitialized locals and unused global

7f1f07d

github-advanced-security AI found potential problems Apr 8, 2026


Comment thread src/winml/modelkit/commands/sys.py Fixed

timenick added 2 commits April 8, 2026 11:41

fix: move import sys to local scope to avoid CodeQL module-imports-itself

a7e2653

Merge branch 'main' into zhiwang/optimize-winml-sys-perf

1dc1072

github-advanced-security AI found potential problems Apr 8, 2026


Comment thread src/winml/modelkit/commands/sys.py Fixed

timenick added 3 commits April 8, 2026 11:50

fix: use __import__ to avoid CodeQL module-imports-itself on sys.py

891114b

fix: use importlib.import_module for stdlib sys per CodeQL suggestion

35448b8

Merge branch 'main' into zhiwang/optimize-winml-sys-perf

1b0fb45

timenick marked this pull request as ready for review April 8, 2026 08:39

timenick requested a review from a team as a code owner April 8, 2026 08:39

timenick requested a review from tezheng April 8, 2026 08:59

Merge branch 'main' into zhiwang/optimize-winml-sys-perf

9593d1b

DingmaomaoBJTU closed this Apr 9, 2026

timenick deleted the zhiwang/optimize-winml-sys-perf branch April 13, 2026 06:18

timenick wants to merge 10 commits into main from zhiwang/optimize-winml-sys-perf


3 participants
