feat(telemetry): Phase 2 — core telemetry logic (consent, scrub, singleton)#371
Conversation
…leton) Adds the ModelKit-specific telemetry layer on top of Phase 1's library/ and deviceid/: - constants.py: empty INSTRUMENTATION_KEY placeholder (official builds inject) - utils.py: exception scrubbing (path trim / 200-char cap / PII regex), structured traceback extraction, base64 cache encoding, msvcrt file lock - consent.py: accept-by-default first-run prompt, JSON config at %USERPROFILE%\.modelkit\config.json, CI auto-detection (fail-closed) - telemetry.py: lazy singleton wiring LoggerProvider + Resource, log_heartbeat / log_action / log_error with per-event allowlist, defensive try/except on every public method, idempotent shutdown Still not wired to the CLI — ActionGroup lands in Phase 3.
- consent.py: add explanatory comment to empty except OSError in _write_stored_consent temp-file cleanup path - test_utils_stack.py: add unreachable explicit return None in _raise_chain to avoid mixed implicit/explicit return warning
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Thorough Phase 2 implementation. Clean separation of concerns (consent, scrubbing, singleton), defensive error handling throughout, and solid test coverage (66 tests). A few items worth discussing below.
- utils.py: require a digit in the long-token regex so Pythonic class
names (e.g. WinMLImageFeatureExtractionEvaluator) are not scrubbed
- utils.py: validate IPv4 octets to 0-255; document the remaining
4-part-version false positive as an accepted trade-off
- consent.py: fall back to expanduser("~") if USERPROFILE is unset so
the config never silently resolves to a CWD-relative path
- telemetry.py: comment that invoked_from="Script" is forward-compat
under the current consent rules (non-TTY -> disabled)
- telemetry.py: note in Telemetry docstring that callers own shutdown
lifecycle; Phase 3 ActionGroup will own it in its teardown path
- tests: regression cases for class-name preservation, digit-gated
long token, and out-of-range IPv4
…vars
expanduser("~") is stdlib cross-platform — it checks POSIX HOME on
non-Windows. ModelKit is Windows-only, so use USERPROFILE with a fallback
to HOMEDRIVE+HOMEPATH (both native Windows vars) instead.
|
Thanks for the review — all 5 points addressed across commits ff41e35 + 7288707: #3 (long-token regex): Added #4 (IPv4 vs version strings): Added octet range validation (0-255) — obvious nonsense like #5 (USERPROFILE missing): Fall back to #6 ( #7 (no |
Introduces `consent_version` in `%USERPROFILE%\.modelkit\config.json` so a materially changed notice text can trigger a re-prompt on the next invocation. A stored version strictly older than `_CONSENT_VERSION` is treated as unrecorded on read; records without the field are grandfathered as the current version so existing users are not re-prompted by the introduction of the field itself. Addresses xieofxie's review on PR #371.
The two constants are a pair — version bumps when prompt text's scope changes — so keep them adjacent and document the coupling.
- Remove Section 7 'Current Violations': all 9 classes already renamed. - Acronym table: fix HTP example (real classes), drop HF row (not a class prefix). - Add canonical EP table covering CPU, CUDA, DML, MIGraphX, NvTensorRTRTX, OpenVINO, QNN, VitisAI with ORT full names and casing notes. - Move HF_ constant prefix to a separate 'Other Canonical Identifiers' section.
3b12398 to
08ba3af
Compare
## Summary PR 4 of 4 of the ModelKit telemetry rollout. User-facing documentation only: - **`docs/Privacy.md`** (new): canonical privacy statement — what the three event types (`ModelKitHeartbeat` / `ModelKitAction` / `ModelKitError`) carry, common context attributes, the explicit NOT-collected list, the consent model (first-run prompt + `%USERPROFILE%\.winml\config.json`), CI auto-disable env vars, the failure cache file location, and the dev-install no-op invariant. - **`README.md`**: short Data / Telemetry section summarizing the consent model and linking to `docs/Privacy.md`. Paths in both docs reflect Phase 3 reality (`.winml`, not `.modelkit`) and the persistent failure cache wired in PR #399. The dev-only `MODELKIT_TELEMETRY_CACHE_DIR` override is intentionally omitted from the end-user docs. ## Plan reference - [Phase 4 plan](docs/superpowers/plans/2026-04-17-modelkit-telemetry-phase4.md) - Phase 1 #367 / Phase 2 #371 / Phase 3 #399 — all merged
Summary
Phase 2 of the telemetry rollout. Builds on Phase 1 (
library/+deviceid/, merged in #367) with the ModelKit-specific telemetry layer.constants.py— emptyINSTRUMENTATION_KEYplaceholder; official builds inject the real iKey before packaging. No runtime / env-var override.utils.py— exception scrubbing pipeline (path trim → 200-char cap → PII regex for email / GUID / IPv4 / IPv6 / long-token), structured traceback extraction (only{file, line, function}), base64 cache encoding,msvcrtexclusive file lock.consent.py— first-run prompt (accept-by-default[Y/n]), JSON config at%USERPROFILE%\.modelkit\config.jsonwith atomic writes that preserve unrelated keys, CI auto-detection viaCI/TF_BUILD/GITHUB_ACTIONS/ etc. (fail-closed in non-interactive environments).telemetry.py— lazy singleton that wiresLoggerProvider+Resource, exposeslog_heartbeat/log_action/log_errorwith per-event allowlist enforcement,self._logger is Noneguards, defensivetry/excepton every public method, idempotentshutdown.Still not wired to the CLI —
ActionGroupauto-instrumentation lands in Phase 3.Test results
uv run pytest tests/unit/telemetry/→ 105 PASSED (39 from Phase 1 + 66 new).uv run pytest tests/→ no regressions.uv run python -c "import winml.modelkit.telemetry"→ no side effects (iKey empty →Telemetry.disabledstaysTrue).uvx ruff check src/winml/modelkit/telemetry/ tests/unit/telemetry/→ clean.Demo — captured wire format
A local-only preview harness (
winml.modelkit.telemetry._demo, not shipped) monkey-patches the iKey, points consent at a temp config, and interceptsrequests.Session.postso the exporter serializes CS 4.0 envelopes to stdout instead of the network. Output below (device ID + session ID redacted):First-run consent prompt text
ModelKitHeartbeat envelope
{ "ver": "4.0", "name": "ModelKitHeartbeat", "time": "2026-04-21T07:12:41.603Z", "iKey": "o:fake-demo-tenant-token", "data": {}, "ext": { "device": { "localId": "<redacted-sha256>", "authId": "EXISTING", "deviceClass": "AMD64" }, "os": { "name": "Windows", "ver": "10.0.26310", "release": "10" }, "app": { "ver": "0.0.1", "sesId": "<redacted-uuid>", "initTs": 1776755561.4647715 } } }ModelKitAction envelope — happy path (build on NPU)
Extra kwargs outside the allowlist (e.g.
leaked_field) are dropped before emission.{ "ver": "4.0", "name": "ModelKitAction", "time": "2026-04-21T07:12:41.603Z", "iKey": "o:fake-demo-tenant-token", "data": { "invoked_from": "Interactive", "action_name": "build", "device": "NPU", "ep": "QNNExecutionProvider", "duration_ms": 4321, "success": true }, "ext": { "...": "same as heartbeat" } }ModelKitAction envelope — failure path
{ "ver": "4.0", "name": "ModelKitAction", "time": "2026-04-21T07:12:41.603Z", "iKey": "o:fake-demo-tenant-token", "data": { "invoked_from": "Interactive", "action_name": "quantize", "device": null, "ep": null, "duration_ms": 87, "success": false }, "ext": { "...": "same as heartbeat" } }ModelKitError envelope — scrubbed message + structured stack
Original exception was
FileNotFoundError(r"model file not found: C:\Users\Alice\models\mobilenet.onnx"). Path is outside the package root, so_trim_pathfalls back to basename. Real ModelKit errors typically only hit the path rule; the defensive regex rules are shown separately below.{ "ver": "4.0", "name": "ModelKitError", "time": "2026-04-21T07:12:41.604Z", "iKey": "o:fake-demo-tenant-token", "data": { "exception_type": "FileNotFoundError", "exception_message": "model file not found: mobilenet.onnx", "exception_stack": [ { "file": "winml/modelkit/telemetry/_demo.py", "line": 128, "function": "_run_demo" } ] }, "ext": { "...": "same as heartbeat" } }PII scrubbing pipeline — rule-by-rule examples
Pipeline order: per-token path trim → 200-char length cap → PII regex sweep. Each rule in isolation:
C:\Users\Alice\src\winml\modelkit\commands\build.pywinml/modelkit/commands/build.pyC:\Users\Alice\models\mobilenet.onnxmobilenet.onnxfailed to reach alice@example.comfailed to reach <scrubbed>job 12345678-1234-5678-1234-567812345678 failedjob <scrubbed> failedconnection to 10.0.0.1 refusedconnection to <scrubbed> refusedexpected version 1.18.999.1expected version 1.18.999.1bind 2001:db8::1 failedbind <scrubbed> failedBearer example_token_1234567890abcdefghijBearer <scrubbed>WinMLImageFeatureExtractionEvaluatorWinMLImageFeatureExtractionEvaluatorThe path rule is what typical ModelKit errors hit. Email / IPv4 / IPv6 / GUID / long-token rules are defensive — they catch PII that might leak through from lower-level libraries (stdlib URL errors, Windows API GUIDs, upstream HTTP clients), not patterns ModelKit itself produces.