Refactor analyze runtime rules: migrate to Parquet and streamline checker tooling #411
Merged
xieofxie reviewed May 7, 2026
timenick reviewed May 7, 2026
timenick reviewed May 8, 2026
e2c54ec to a868d43
a868d43 to 750cd19
timenick approved these changes May 11, 2026
ssss141414 pushed a commit that referenced this pull request May 15, 2026
…cker tooling (#411)

## Summary

This PR migrates the analyze runtime-rule pipeline from ZIP materialization to a Parquet-native flow and improves runtime checker tooling, result processing, and observability.

## What changed

- Migrated runtime rule querying/loading to Parquet-backed paths and caches.
- Updated rule download/loading scripts and CI pipeline wiring for Parquet artifacts.
- Improved runtime result processing with per-operator output, deduplication, and conflict CSV generation.
- Added conflict-file based case_index filtering for targeted reruns.
- Centralized MODELKIT_TIMING_LOG parsing and structured timing logging into a shared utility.
- Removed deprecated ZIP expansion tooling and obsolete related tests.

## Why

- Simplifies the runtime-rule toolchain by removing ZIP expansion steps.
- Improves maintainability and consistency across analyze components.
- Makes diagnostics and targeted reruns easier in CI and local triage.

## Compatibility notes

- Legacy ZIP expansion utilities were removed.
- Internal workflows should use the Parquet-based rule path.

## Measured impact

- Rule data footprint reduced from ~20 GB to ~2.2 GB (about 89% smaller).
- Rule ZIP artifact reduced from ~250 MB to ~25 MB (about 90% smaller).
- For most model SA runs, end-to-end analyze time improved by around 50%.

Examples:

- winml analyze -m bert-base-multilingual-cased.onnx: 18s -> 8s (about 56% faster).
- winml analyze -m resnet-50.onnx: 35s -> 18s (about 49% faster).

These gains come from the Parquet-native rule pipeline and related runtime-checker streamlining in this PR.

related PR: https://github.com/gim-home/ModelKitArtifacts/pull/125
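
The result-processing items under "What changed" (per-operator output, deduplication, conflict CSV generation, and conflict-file based case_index filtering) are described only in prose. The following is a minimal sketch of how such a flow could fit together, not the code from this PR. The column names `operator`, `case_index`, and `result`, and the helpers `split_results`, `write_conflict_csv`, and `read_rerun_indices`, are all hypothetical and chosen for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real result schema is not shown in this PR.
FIELDS = ["operator", "case_index", "result"]


def split_results(rows):
    """Deduplicate result rows and separate conflicting ones.

    Returns (per_operator, conflicts):
      per_operator maps operator -> unique, non-conflicting rows;
      conflicts holds one row per distinct result for every
      (operator, case_index) pair that reported more than one result.
    """
    results_by_case = defaultdict(set)
    for row in rows:
        key = (row["operator"], int(row["case_index"]))
        results_by_case[key].add(row["result"])  # the set drops exact duplicates

    per_operator = defaultdict(list)
    conflicts = []
    for (operator, case_index), results in sorted(results_by_case.items()):
        records = [
            {"operator": operator, "case_index": case_index, "result": r}
            for r in sorted(results)
        ]
        if len(records) == 1:
            per_operator[operator].extend(records)
        else:
            conflicts.extend(records)
    return dict(per_operator), conflicts


def write_conflict_csv(conflicts, stream):
    """Write the conflicting rows as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(conflicts)


def read_rerun_indices(stream):
    """Return the sorted, unique case_index values listed in a conflict CSV."""
    return sorted({int(row["case_index"]) for row in csv.DictReader(stream)})


# Made-up sample rows: one exact duplicate and one pass/fail conflict.
rows = [
    {"operator": "Add", "case_index": "0", "result": "pass"},
    {"operator": "Add", "case_index": "0", "result": "pass"},
    {"operator": "Add", "case_index": "1", "result": "pass"},
    {"operator": "Add", "case_index": "1", "result": "fail"},
    {"operator": "Mul", "case_index": "0", "result": "pass"},
]

per_operator, conflicts = split_results(rows)
buffer = io.StringIO()
write_conflict_csv(conflicts, buffer)
buffer.seek(0)
rerun = read_rerun_indices(buffer)
print(per_operator)
print(rerun)
```

In this sketch a targeted rerun would execute only the cases whose `case_index` appears in `rerun`, which is the idea behind filtering from a conflict file.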
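
The percentages quoted under "Measured impact" in the commit message follow directly from the before/after figures. A quick arithmetic check, using a hypothetical helper `reduction_pct` (not part of the PR) and the numbers as reported there:

```python
def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


# Before/after values copied from the "Measured impact" section.
measurements = [
    ("rule data footprint (GB)", 20.0, 2.2),
    ("rule ZIP artifact (MB)", 250.0, 25.0),
    ("bert-base-multilingual-cased.onnx (s)", 18.0, 8.0),
    ("resnet-50.onnx (s)", 35.0, 18.0),
]

summary = {
    label: round(reduction_pct(before, after), 1)
    for label, before, after in measurements
}

for label, pct in summary.items():
    print(f"{label}: {pct}% lower")
```

The two timing figures are reductions in wall-clock time. Expressed as speedups, 18s -> 8s is 2.25x and 35s -> 18s is about 1.94x.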