feat(telemetry): Phase 1 — OneCollector library + device ID foundation by timenick · Pull Request #367 · microsoft/winml-cli

timenick · 2026-04-17T06:12:08Z

Summary

PR 1 of 4 in the ModelKit telemetry rollout. Introduces the transport foundation:

telemetry/library/ — a custom opentelemetry-sdk log exporter that speaks Common Schema 4.0 to the OneCollector endpoint.
telemetry/deviceid/ — a SHA256-hashed UUID persisted per machine, used as the telemetry device_id.

Emission-free by design: no Telemetry singleton, no consent prompt, no CLI wiring yet. User-facing behavior is unchanged.

What's next

PR 2 (blocked on Privacy sign-off of prompt text + event schemas): constants.py + utils.py (exception / PII scrubbing) + consent.py (first-run prompt) + telemetry.py (singleton).
PR 3: ActionGroup auto-instrumentation + cli.py integration + --disable-telemetry flag.
PR 4: docs/Privacy.md + README telemetry section.

…de test

…eader, comments)

…e (store)

…atus

…ey validation, fixture dedup)

…odeQL)

DingmaomaoBJTU

Code review

Found 4 issues:

Test import in test_deviceid.py reaches into internal submodule for publicly exported symbol (CLAUDE.md violation)
OneCollectorLogExporter missing from __all__ in library/__init__.py (CLAUDE.md violation)
Test import in test_exporter.py reaches into internal submodule for publicly exported symbol (CLAUDE.md violation)
_MIN_HTTP_TIMEOUT is dead code — max(0.1, 10.0) always evaluates to 10.0

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

- Export OneCollectorLogExporter in library.__all__ - Switch test_deviceid and test_exporter to package-level imports - Drop dead _MIN_HTTP_TIMEOUT clamp; use _HTTP_TIMEOUT directly

DingmaomaoBJTU

Previous 4 issues all addressed. Found 1 new issue on second pass.

Adds a make_provider fixture that tracks created LoggerProviders and calls shutdown() during teardown. Without this the BatchLogRecordProcessor leaks a background daemon thread per test invocation.

…hiwang/telemetry-phase1

… helper IdStatus subclasses str so callers can plug status straight into an OpenTelemetry Resource (str-typed attributes) and the CS 4.0 ext.device.authId slot without a .value conversion. Dropped get_telemetry_base_dir(): it returned %LOCALAPPDATA%\..\ but the design places cache/consent under %USERPROFILE%\.modelkit(owned by Phase 2). The helper had no production callers; Phase 2 will expose the correct path from utils.py/consent.py.

…leton) (#371) ## Summary Phase 2 of the [telemetry rollout](../blob/main/docs/superpowers/specs/2026-04-17-modelkit-telemetry-design.md). Builds on Phase 1 (`library/` + `deviceid/`, merged in #367) with the ModelKit-specific telemetry layer. - `constants.py` — empty `INSTRUMENTATION_KEY` placeholder; official builds inject the real iKey before packaging. No runtime / env-var override. - `utils.py` — exception scrubbing pipeline (path trim → 200-char cap → PII regex for email / GUID / IPv4 / IPv6 / long-token), structured traceback extraction (only `{file, line, function}`), base64 cache encoding, `msvcrt` exclusive file lock. - `consent.py` — first-run prompt (accept-by-default `[Y/n]`), JSON config at `%USERPROFILE%\.modelkit\config.json` with atomic writes that preserve unrelated keys, CI auto-detection via `CI` / `TF_BUILD` / `GITHUB_ACTIONS` / etc. (fail-closed in non-interactive environments). - `telemetry.py` — lazy singleton that wires `LoggerProvider` + `Resource`, exposes `log_heartbeat` / `log_action` / `log_error` with per-event allowlist enforcement, `self._logger is None` guards, defensive `try/except` on every public method, idempotent `shutdown`. Still **not wired to the CLI** — `ActionGroup` auto-instrumentation lands in Phase 3. ## Test results - `uv run pytest tests/unit/telemetry/` → **105 PASSED** (39 from Phase 1 + 66 new). - `uv run pytest tests/` → no regressions. - `uv run python -c "import winml.modelkit.telemetry"` → no side effects (iKey empty → `Telemetry.disabled` stays `True`). - `uvx ruff check src/winml/modelkit/telemetry/ tests/unit/telemetry/` → clean. ## Demo — captured wire format A local-only preview harness (`winml.modelkit.telemetry._demo`, not shipped) monkey-patches the iKey, points consent at a temp config, and intercepts `requests.Session.post` so the exporter serializes CS 4.0 envelopes to stdout instead of the network. Output below (device ID + session ID redacted): <details> <summary>First-run consent prompt text</summary> ``` ModelKit can collect anonymous usage data to help improve the product. What is collected: - Command name, duration, success/failure - Target device/EP (when the command specifies them) - OS, architecture, ModelKit version - Unhandled exception types, code locations, and scrubbed error messages (paths trimmed, length capped, PII patterns scrubbed) What is never collected: - File paths, model contents, command arguments, credentials Enable telemetry? [Y/n]: ``` </details> <details> <summary>ModelKitHeartbeat envelope</summary> ```json { "ver": "4.0", "name": "ModelKitHeartbeat", "time": "2026-04-21T07:12:41.603Z", "iKey": "o:fake-demo-tenant-token", "data": {}, "ext": { "device": { "localId": "<redacted-sha256>", "authId": "EXISTING", "deviceClass": "AMD64" }, "os": { "name": "Windows", "ver": "10.0.26310", "release": "10" }, "app": { "ver": "0.0.1", "sesId": "<redacted-uuid>", "initTs": 1776755561.4647715 } } } ``` </details> <details> <summary>ModelKitAction envelope — happy path (build on NPU)</summary> Extra kwargs outside the allowlist (e.g. `leaked_field`) are dropped before emission. ```json { "ver": "4.0", "name": "ModelKitAction", "time": "2026-04-21T07:12:41.603Z", "iKey": "o:fake-demo-tenant-token", "data": { "invoked_from": "Interactive", "action_name": "build", "device": "NPU", "ep": "QNNExecutionProvider", "duration_ms": 4321, "success": true }, "ext": { "...": "same as heartbeat" } } ``` </details> <details> <summary>ModelKitAction envelope — failure path</summary> ```json { "ver": "4.0", "name": "ModelKitAction", "time": "2026-04-21T07:12:41.603Z", "iKey": "o:fake-demo-tenant-token", "data": { "invoked_from": "Interactive", "action_name": "quantize", "device": null, "ep": null, "duration_ms": 87, "success": false }, "ext": { "...": "same as heartbeat" } } ``` </details> <details> <summary>ModelKitError envelope — scrubbed message + structured stack</summary> Original exception was `FileNotFoundError(r"model file not found: C:\Users\Alice\models\mobilenet.onnx")`. Path is outside the package root, so `_trim_path` falls back to basename. Real ModelKit errors typically only hit the path rule; the defensive regex rules are shown separately below. ```json { "ver": "4.0", "name": "ModelKitError", "time": "2026-04-21T07:12:41.604Z", "iKey": "o:fake-demo-tenant-token", "data": { "exception_type": "FileNotFoundError", "exception_message": "model file not found: mobilenet.onnx", "exception_stack": [ { "file": "winml/modelkit/telemetry/_demo.py", "line": 128, "function": "_run_demo" } ] }, "ext": { "...": "same as heartbeat" } } ``` </details> <details> <summary>PII scrubbing pipeline — rule-by-rule examples</summary> Pipeline order: per-token path trim → 200-char length cap → PII regex sweep. Each rule in isolation: | Rule | Input | Output | |---|---|---| | Path (inside package) | `C:\Users\Alice\src\winml\modelkit\commands\build.py` | `winml/modelkit/commands/build.py` | | Path (outside package) | `C:\Users\Alice\models\mobilenet.onnx` | `mobilenet.onnx` | | Email | `failed to reach alice@example.com` | `failed to reach <scrubbed>` | | GUID | `job 12345678-1234-5678-1234-567812345678 failed` | `job <scrubbed> failed` | | IPv4 | `connection to 10.0.0.1 refused` | `connection to <scrubbed> refused` | | IPv4 (invalid octet — preserved) | `expected version 1.18.999.1` | `expected version 1.18.999.1` | | IPv6 | `bind 2001:db8::1 failed` | `bind <scrubbed> failed` | | Long token (has digit) | `Bearer example_token_1234567890abcdefghij` | `Bearer <scrubbed>` | | Long token (no digit — preserved) | `WinMLImageFeatureExtractionEvaluator` | `WinMLImageFeatureExtractionEvaluator` | The path rule is what typical ModelKit errors hit. Email / IPv4 / IPv6 / GUID / long-token rules are defensive — they catch PII that might leak through from lower-level libraries (stdlib URL errors, Windows API GUIDs, upstream HTTP clients), not patterns ModelKit itself produces. </details>

## Summary PR 4 of 4 of the ModelKit telemetry rollout. User-facing documentation only: - **`docs/Privacy.md`** (new): canonical privacy statement — what the three event types (`ModelKitHeartbeat` / `ModelKitAction` / `ModelKitError`) carry, common context attributes, the explicit NOT-collected list, the consent model (first-run prompt + `%USERPROFILE%\.winml\config.json`), CI auto-disable env vars, the failure cache file location, and the dev-install no-op invariant. - **`README.md`**: short Data / Telemetry section summarizing the consent model and linking to `docs/Privacy.md`. Paths in both docs reflect Phase 3 reality (`.winml`, not `.modelkit`) and the persistent failure cache wired in PR #399. The dev-only `MODELKIT_TELEMETRY_CACHE_DIR` override is intentionally omitted from the end-user docs. ## Plan reference - [Phase 4 plan](docs/superpowers/plans/2026-04-17-modelkit-telemetry-phase4.md) - Phase 1 #367 / Phase 2 #371 / Phase 3 #399 — all merged

…closed The fixture used ``return`` instead of ``yield`` + teardown, leaking the underlying ``requests.Session`` (and its connection pool) across tests. Same pattern flagged in #367 and fixed there for the factory fixture; applying it here too.

timenick added 12 commits April 17, 2026 13:12

chore: add opentelemetry-sdk dependency for telemetry phase 1

817d36f

feat(telemetry): add Common Schema 4.0 envelope serializer

5a802bc

refactor(telemetry): mark envelope helpers as private + restore unico…

350b29f

…de test

feat(telemetry): add OneCollectorLogExporter with CS 4.0 serialization

bb5bd62

refactor(telemetry): address exporter review findings (tests, dedup h…

209ec3d

…eader, comments)

feat(telemetry): expose create_logger_provider factory from library/

8e13888

feat(telemetry): add cross-platform _store with registry + HOME fallback

89e5510

refactor(telemetry): narrow registry error swallow + atomic file writ…

71402b0

…e (store)

feat(telemetry): add device ID generation with EXISTING/NEW/FAILED st…

7c05727

…atus

feat(telemetry): expose get_or_create_device_id from deviceid/

4175c4a

Merge remote-tracking branch 'origin/main' into zhiwang/telemetry-phase1

87acfb9

refactor(telemetry): address final review findings (public import, ik…

a5b83a6

…ey validation, fixture dedup)

timenick requested a review from a team as a code owner April 17, 2026 06:12

github-advanced-security AI found potential problems Apr 17, 2026

View reviewed changes

Comment thread src/winml/modelkit/telemetry/deviceid/_store.py Fixed

Comment thread tests/unit/telemetry/deviceid/conftest.py Fixed

timenick added 2 commits April 17, 2026 14:20

chore(telemetry): add explanatory comments to empty except clauses (C…

ee37386

…odeQL)

refactor(telemetry): drop non-Windows parity code from deviceid store

c88999c

DingmaomaoBJTU reviewed Apr 17, 2026

View reviewed changes

Comment thread tests/unit/telemetry/deviceid/test_deviceid.py Outdated

Comment thread src/winml/modelkit/telemetry/library/__init__.py Outdated

Comment thread tests/unit/telemetry/library/test_exporter.py Outdated

Comment thread src/winml/modelkit/telemetry/library/exporter.py Outdated

timenick added 3 commits April 17, 2026 17:13

refactor(telemetry): address PR 367 review feedback

b8c884e

- Export OneCollectorLogExporter in library.__all__ - Switch test_deviceid and test_exporter to package-level imports - Drop dead _MIN_HTTP_TIMEOUT clamp; use _HTTP_TIMEOUT directly

Merge branch 'main' into zhiwang/telemetry-phase1

db86cc7

Merge branch 'main' into zhiwang/telemetry-phase1

24dbadc

DingmaomaoBJTU approved these changes Apr 21, 2026

View reviewed changes

DingmaomaoBJTU reviewed Apr 21, 2026

View reviewed changes

Comment thread tests/unit/telemetry/library/test_factory.py Outdated

timenick added 3 commits April 21, 2026 10:17

test(telemetry): shut down LoggerProvider in factory tests

cba3b9f

Adds a make_provider fixture that tracks created LoggerProviders and calls shutdown() during teardown. Without this the BatchLogRecordProcessor leaks a background daemon thread per test invocation.

Merge remote-tracking branch 'origin/main' into zhiwang/telemetry-phase1

f954e00

Merge remote-tracking branch 'origin/zhiwang/telemetry-phase1' into z…

a4d7de2

…hiwang/telemetry-phase1

timenick requested a review from xieofxie April 21, 2026 02:21

xieofxie reviewed Apr 21, 2026

View reviewed changes

Comment thread src/winml/modelkit/telemetry/deviceid/deviceid.py Outdated

xieofxie reviewed Apr 21, 2026

View reviewed changes

Comment thread src/winml/modelkit/telemetry/deviceid/_store.py

timenick added 2 commits April 21, 2026 10:52

Merge remote-tracking branch 'origin/main' into zhiwang/telemetry-phase1

8b426ad

xieofxie approved these changes Apr 21, 2026

View reviewed changes

timenick merged commit 3098e4c into main Apr 21, 2026
9 checks passed

timenick deleted the zhiwang/telemetry-phase1 branch April 21, 2026 03:16

timenick mentioned this pull request Apr 21, 2026

feat(telemetry): Phase 2 — core telemetry logic (consent, scrub, singleton) #371

Merged

timenick mentioned this pull request Apr 29, 2026

docs(telemetry): Phase 4 — Privacy statement + README section #417

Merged

DingmaomaoBJTU mentioned this pull request May 7, 2026

fix(telemetry): OneCollector iKey wire format and 4xx handling #456

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(telemetry): Phase 1 — OneCollector library + device ID foundation#367

feat(telemetry): Phase 1 — OneCollector library + device ID foundation#367
timenick merged 22 commits into
mainfrom
zhiwang/telemetry-phase1

timenick commented Apr 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

DingmaomaoBJTU left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DingmaomaoBJTU left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

timenick commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's next

Uh oh!

Uh oh!

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Code review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

timenick commented Apr 17, 2026 •

edited

Loading