Add multi-hardware kernel-aware NAS artifact pack#123
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughAdds deterministic multi-hardware NAS reporting (A100/T4/H100) with deterministic candidate generation, quality-proxy constraints, optional calibration-driven profile overrides, extended build_report constrained rankings, markdown/JSON artifact packing, docs, and tests. ChangesKernel-aware NAS multi-hardware reporting with quality constraints
Sequence Diagram(s) sequenceDiagram
participant CLI as nas_experiment CLI
participant Cal as build_calibration_metadata
participant Pack as build_multi_hardware_report
participant Model as ArchitectureCostModel
participant Report as build_report
participant Writer as artifact writer
CLI->>Cal: load calibration DB (optional path)
CLI->>Pack: request multi-hardware pack (calibration)
Pack->>Model: construct per-hardware model (profile_overrides)
Pack->>Report: call build_report(hardware_model, candidate_catalog, calibration)
Report->>Pack: return per-hardware report (comparisons, quality_constrained_latency)
Pack->>Writer: write JSON artifact
CLI->>Writer: render and write markdown via render_multi_hardware_markdown(pack)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/nas_experiment.py`:
- Around line 355-360: The code in build_report unconditionally accesses
fastest_by_name["deep_narrow"]["rank"], which can KeyError if candidate_configs
omit "deep_narrow"; change the lookup to guard against missing key (e.g., use
fastest_by_name.get("deep_narrow") and extract "rank" only if present, otherwise
set deep_narrow_unconstrained_rank to None or a sensible default). Update the
return dict entry "deep_narrow_unconstrained_rank" to use this guarded value and
ensure any downstream assumptions handle the None/default case.
In `@src/research_engine/arch_cost_model.py`:
- Line 125: The assignment merging overrides into self.hw currently accepts any
keys/values; before performing self.hw = {**HARDWARE_PROFILES[hardware],
**(profile_overrides or {})} validate profile_overrides: ensure every key exists
in HARDWARE_PROFILES[hardware] (reject/raise on unknown keys to catch typos) and
ensure every override value is a finite positive number (reject/raise on zero,
negative, NaN, or infinity); perform these checks inside the ArchCostModel
constructor (or the method where self.hw is set) and either filter invalid
entries or raise a ValueError with a clear message referencing profile_overrides
and the invalid key(s).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1c888459-970e-412b-bf80-49a40e369fb5
📒 Files selected for processing (7)
docs/research/kernel-aware-nas.mddocs/results/README.mddocs/results/kernel-aware-nas-multihardware.jsondocs/results/kernel-aware-nas-multihardware.mdscripts/nas_experiment.pysrc/research_engine/arch_cost_model.pytests/test_arch_cost_model.py
| if hardware not in HARDWARE_PROFILES: | ||
| raise ValueError(f"Unknown hardware {hardware!r}, choose from {list(HARDWARE_PROFILES)}") | ||
| self.hw = HARDWARE_PROFILES[hardware] | ||
| self.hw = {**HARDWARE_PROFILES[hardware], **(profile_overrides or {})} |
There was a problem hiding this comment.
Validate override keys and values before merging calibration data.
Line 125 accepts arbitrary override keys and numeric values without checks. A typoed key is silently ignored, and zero/negative/non-finite values can later cause invalid predictions or division-by-zero in latency calculations.
Proposed fix
@@
def __init__(
self,
hardware: str = "a100",
*,
profile_overrides: dict[str, float] | None = None,
):
if hardware not in HARDWARE_PROFILES:
raise ValueError(f"Unknown hardware {hardware!r}, choose from {list(HARDWARE_PROFILES)}")
- self.hw = {**HARDWARE_PROFILES[hardware], **(profile_overrides or {})}
+ base_profile = HARDWARE_PROFILES[hardware]
+ overrides = profile_overrides or {}
+
+ unknown = set(overrides) - set(base_profile)
+ if unknown:
+ raise ValueError(
+ f"Unknown profile override keys for {hardware!r}: {sorted(unknown)}"
+ )
+
+ for key, value in overrides.items():
+ if not isinstance(value, (int, float)) or not math.isfinite(float(value)) or float(value) <= 0.0:
+ raise ValueError(
+ f"Override {key!r} must be a finite positive number, got {value!r}"
+ )
+
+ self.hw = {**base_profile, **{k: float(v) for k, v in overrides.items()}}
self.hardware = hardware🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/research_engine/arch_cost_model.py` at line 125, The assignment merging
overrides into self.hw currently accepts any keys/values; before performing
self.hw = {**HARDWARE_PROFILES[hardware], **(profile_overrides or {})} validate
profile_overrides: ensure every key exists in HARDWARE_PROFILES[hardware]
(reject/raise on unknown keys to catch typos) and ensure every override value is
a finite positive number (reject/raise on zero, negative, NaN, or infinity);
perform these checks inside the ArchCostModel constructor (or the method where
self.hw is set) and either filter invalid entries or raise a ValueError with a
clear message referencing profile_overrides and the invalid key(s).
Resolve conflicts from PR #122 quick fixes merging into main: - docs/research/kernel-aware-nas.md: combine PR's artifact pack docs with main's generated search description - scripts/nas_experiment.py: keep both PR's quality constraints/calibration and main's generated_candidate_configs() - tests/test_arch_cost_model.py: keep both constraint assertions and generated search assertions none
Closes #111
Summary
python scripts/nas_experiment.py --all-hardware, which emits deterministic A100/T4/H100 JSON and Markdown artifacts.noeris/cost-model-training.jsoncalibration metadata and A100 profile overrides when availableVerification
python3 -m unittest tests.test_arch_cost_model tests.test_public_claim_artifactspython3 -m compileall src/research_engine/arch_cost_model.py scripts/nas_experiment.py tests/test_arch_cost_model.pypython3 scripts/check_public_claim_artifacts.pygit diff --checkSummary by CodeRabbit
New Features
Documentation
Tests