Refactor analyze runtime rules: migrate to Parquet and streamline checker tooling #411

Merged
fangyangci merged 5 commits into main from fangyangci/newRules
May 11, 2026
@fangyangci fangyangci commented Apr 28, 2026

Summary

This PR migrates the analyze runtime-rule pipeline from ZIP materialization to a Parquet-native flow and improves runtime checker tooling, result processing, and observability.

What changed

  • Migrated runtime rule querying/loading to Parquet-backed paths and caches.
  • Updated rule download/loading scripts and CI pipeline wiring for Parquet artifacts.
  • Improved runtime result processing with per-operator output, deduplication, and conflict CSV generation.
  • Added conflict-file based case_index filtering for targeted reruns.
  • Centralized MODELKIT_TIMING_LOG parsing and structured timing logging into a shared utility.
  • Removed deprecated ZIP expansion tooling and obsolete related tests.
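
The result-processing items in the list above (deduplication, conflict CSV generation, and conflict-file based case_index filtering) can be sketched roughly as follows. This is a minimal illustration, not the code in result_processor.py: the record fields `operator`, `case_index`, and `supported`, the CSV columns, and all three function names are assumptions, since the PR description does not show the real schema.

```python
import csv
import io
from collections import defaultdict


def split_results(rows):
    """Deduplicate rows and separate out conflicting cases.

    A case is treated as a conflict when the same (operator, case_index)
    key was recorded with more than one distinct outcome.
    """
    outcomes = defaultdict(set)
    for row in rows:
        outcomes[(row["operator"], row["case_index"])].add(row["supported"])

    clean, conflicts = [], []
    for (operator, case_index), seen in sorted(outcomes.items()):
        if len(seen) == 1:
            clean.append(
                {"operator": operator, "case_index": case_index, "supported": next(iter(seen))}
            )
        else:
            conflicts.append({"operator": operator, "case_index": case_index})
    return clean, conflicts


def write_conflict_csv(conflicts, handle):
    """Write conflicting cases to a CSV file (hypothetical column layout)."""
    writer = csv.DictWriter(handle, fieldnames=["operator", "case_index"])
    writer.writeheader()
    writer.writerows(conflicts)


def load_conflict_indices(handle):
    """Read a conflict CSV back as {operator: {case_index, ...}} for a targeted rerun."""
    wanted = defaultdict(set)
    for row in csv.DictReader(handle):
        wanted[row["operator"]].add(int(row["case_index"]))
    return dict(wanted)


rows = [
    {"operator": "Conv", "case_index": 0, "supported": True},
    {"operator": "Conv", "case_index": 0, "supported": True},   # exact duplicate
    {"operator": "Conv", "case_index": 1, "supported": True},
    {"operator": "Conv", "case_index": 1, "supported": False},  # conflicting outcome
    {"operator": "Gemm", "case_index": 0, "supported": False},
]
clean, conflicts = split_results(rows)

buffer = io.StringIO()  # stands in for a conflict file on disk
write_conflict_csv(conflicts, buffer)
buffer.seek(0)
rerun = load_conflict_indices(buffer)
```

In this sketch, `rerun` holds only the cases whose outcomes disagreed, so a follow-up run could skip every case_index not listed for its operator.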

Why

  • Simplifies the runtime-rule toolchain by removing ZIP expansion steps.
  • Improves maintainability and consistency across analyze components.
  • Makes diagnostics and targeted reruns easier in CI and local triage.
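
On the diagnostics side, a shared timing helper gated by MODELKIT_TIMING_LOG might look something like the sketch below. Only the environment variable name comes from this PR. The accepted truthy values, the logger name, the key=value log format, and the helper names `timing_enabled` and `timed` are assumptions for illustration and may not match the actual utility.

```python
import logging
import os
import time
from contextlib import contextmanager

logger = logging.getLogger("modelkit.timing")  # hypothetical logger name


def timing_enabled(environ=None):
    """Parse MODELKIT_TIMING_LOG in one place (the truthy values are assumed)."""
    environ = os.environ if environ is None else environ
    value = environ.get("MODELKIT_TIMING_LOG", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


@contextmanager
def timed(stage, environ=None, **fields):
    """Log the elapsed time of a stage as key=value pairs when timing is enabled."""
    if not timing_enabled(environ):
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        parts = [f"stage={stage}", f"elapsed_ms={elapsed_ms:.1f}"]
        parts += [f"{key}={value}" for key, value in sorted(fields.items())]
        logger.info("%s", " ".join(parts))


# Usage: wrap any stage; nothing is logged unless the variable is set.
with timed("load_rules", operator="Conv"):
    pass
```

Keeping the environment parsing in a single function means every analyze component interprets the variable the same way, which is the consistency benefit the PR describes.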

Compatibility notes

  • Legacy ZIP expansion utilities were removed.
  • Internal workflows should use the Parquet-based rule path.

Measured impact

  • Rule data footprint reduced from ~20 GB to ~2.2 GB (about 89% smaller).
  • Rule ZIP artifact reduced from ~250 MB to ~25 MB (about 90% smaller).
  • For most model SA runs, end-to-end analyze time improved by around 50%.

Examples:

  • winml analyze -m bert-base-multilingual-cased.onnx: 18s -> 8s (about 56% less time, roughly 2.25x faster).
  • winml analyze -m resnet-50.onnx: 35s -> 18s (about 49% less time, roughly 1.9x faster).

These gains come from the Parquet-native rule pipeline and related runtime-checker streamlining in this PR.
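
The percentages in this section read as relative reductions, (before - after) / before. A short check using only the figures quoted above (the function name and labels are illustrative):

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


figures = {
    "rule data footprint (GB)": (20, 2.2),
    "rule ZIP artifact (MB)": (250, 25),
    "bert-base-multilingual-cased analyze (s)": (18, 8),
    "resnet-50 analyze (s)": (35, 18),
}

for label, (before, after) in figures.items():
    print(f"{label}: {percent_reduction(before, after):.1f}% reduction")
```

Rounded to whole percentages, these come out to 89, 90, 56, and 49, matching the numbers reported above.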

Related PR: https://github.com/gim-home/ModelKitArtifacts/pull/125

Comment thread src/winml/modelkit/analyze/runtime_checker/result_processor.py Fixed
Comment thread src/winml/modelkit/analyze/runtime_checker/result_processor.py Fixed
@fangyangci fangyangci changed the title from "Fangyangci/new rules" to "Refactor analyze runtime rules: migrate to Parquet and streamline checker tooling" May 6, 2026
@fangyangci fangyangci marked this pull request as ready for review May 6, 2026 04:00
@fangyangci fangyangci requested a review from a team as a code owner May 6, 2026 04:00
@DingmaomaoBJTU commented (this comment was marked as resolved).

@fangyangci commented (this comment was marked as resolved).

Comment thread src/winml/modelkit/analyze/core/node_checkers/base.py
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py
Comment thread scripts/download_rules.py
Comment thread scripts/download_rules.py
Comment thread src/winml/modelkit/analyze/utils/rule_loader.py
Comment thread src/winml/modelkit/analyze/rules/runtime_check_rules/README.md
Comment thread src/winml/modelkit/analyze/rules/runtime_check_rules/README.md
Comment thread src/winml/modelkit/analyze/rules/runtime_check_rules/README.md
Comment thread .pipelines/modelkit-release-github.yml Outdated
Comment thread src/winml/modelkit/analyze/rules/runtime_check_rules/README.md Outdated
Comment thread src/winml/modelkit/analyze/rules/runtime_check_rules/README.md Outdated

@timenick timenick left a comment


Automated review against project standards. 7 items, mostly minor; #1 and #2 are the most worth addressing before merge.

🤖 Generated with Claude Code

Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py Outdated
Comment thread src/winml/modelkit/analyze/core/runtime_checker_query.py Outdated
Comment thread src/winml/modelkit/analyze/utils/model_utils.py Outdated
Comment thread src/winml/modelkit/analyze/utils/__init__.py
Comment thread tests/unit/analyze/runtime_checker/test_ep_checker.py
Comment thread src/winml/modelkit/analyze/utils/ep_utils.py
Comment thread src/winml/modelkit/analyze/core/runtime_checker.py Outdated
@fangyangci fangyangci force-pushed the fangyangci/newRules branch from e2c54ec to a868d43 Compare May 9, 2026 03:25
@fangyangci fangyangci force-pushed the fangyangci/newRules branch from a868d43 to 750cd19 Compare May 9, 2026 03:25
@fangyangci fangyangci merged commit 1325a7b into main May 11, 2026
9 checks passed
@fangyangci fangyangci deleted the fangyangci/newRules branch May 11, 2026 03:40
ssss141414 pushed a commit that referenced this pull request May 15, 2026
…cker tooling (#411)
