Skip loading external weight data during static analysis. #379
DingmaomaoBJTU left a comment:
Overall, a good fix for a real and impactful problem: static analysis never needs multi-GB weight tensors, and the three-site fix is comprehensive. A few items to address before merging.
…to chao/largemodel
Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
…ith 'import' and 'import from'' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…to chao/largemodel
DingmaomaoBJTU left a comment:
Good work addressing the previous round of feedback: the is_constant=False fix, the test coverage additions, and the raw_data comment all look solid. The three-site fix is correct and comprehensive. A few new items for this round:
Non-inline note on _collect_node_tags / ALL_INPUTS_CONSTANT:
External-data initializers are still in self.initializers, so a node whose inputs are all unloaded external weights would be tagged ALL_INPUTS_CONSTANT even though the weight data is unavailable. The tag is informational only and doesn't affect runtime check results, but it could confuse future debugging. Consider excluding external-data initializers whose data was not loaded from that check.
…to chao/largemodel
Problem
Running winml analyze on large models with external data (e.g., Qwen3-8B with a 30.5 GB .data sidecar) causes the process to consume all available memory and disk, hanging indefinitely. The analyzer called onnx.load(path, load_external_data=True), loading the entire weight file into RAM despite never inspecting weight values.
Root Cause
The static analyzer only needs graph structure (operator types, shapes, connectivity, and small embedded constants) to perform op-support checks. Three call sites were loading or attempting to access the full weight tensors unnecessarily:
- ONNXStaticAnalyzer.analyze(): explicitly passed load_external_data=True
- ONNXLoader.load(): called bare onnx.load(), which defaults to load_external_data=True
- RuntimeCheckerQuery: called numpy_helper.to_array() on every initializer and embedded full TensorProtos into single-node models
Changes
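- ONNXStaticAnalyzer.analyze(): load_external_data=True → False
- ONNXLoader.load(): pass load_external_data=False explicitly
- RuntimeCheckerQuery: read shapes from dims instead of calling to_array(), and emit graph inputs instead of embedding empty tensors in single-node models
Impact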
data_location != EXTERNAL path is unchanged)
Performance (improved by 39.65%)