diff --git a/docs/research/cross-vendor-transfer-eval-protocol.md b/docs/research/cross-vendor-transfer-eval-protocol.md
index cb33d02..b0cd689 100644
--- a/docs/research/cross-vendor-transfer-eval-protocol.md
+++ b/docs/research/cross-vendor-transfer-eval-protocol.md
@@ -1,15 +1,21 @@
 # Cross-Vendor Transfer Evaluation Protocol
 
-This protocol defines how to validate issue #82 once AMD measurements are available.
+This protocol defines how to validate issue #82 once AMD measurements are
+available. Current repository status: AMD MI300X/MI250 hardware is unavailable
+in this lane, so the cross-vendor AMD result remains scaffold-only.
 
 ## Inputs
 
 - Prediction artifact from scaffold lane:
   - `docs/results/cross-vendor-zero-shot-scaffold-mi300x.json`
-- Measured AMD artifact (to be produced from MI300X/MI250 runs), format:
+- Optional measurement template, not evidence:
+  - `docs/results/cross-vendor-measured-mi300x-template.json`
+- Real measured AMD artifact, to be produced only from MI300X/MI250 runs:
+  - `docs/results/cross-vendor-measured-mi300x.json`
 
 ```json
 {
+  "is_measured_evidence": true,
   "measured": {
     "attention": {
       "bucket_name": [
@@ -20,8 +26,24 @@ This protocol defines how to validate issue #82 once AMD measurements are availa
 }
 ```
 
+The template is marked `is_measured_evidence=false` and uses placeholder
+metric/latency values. `scripts/cross_vendor_transfer_eval.py` rejects template,
+scaffold, deferred, and sample artifacts, and placeholder or non-positive rows.
+
 ## Command
 
+Template generation:
+
+```bash
+PYTHONPATH=src python3 scripts/cross_vendor_measured_pack_from_prediction.py \
+  --prediction-json docs/results/cross-vendor-zero-shot-scaffold-mi300x.json \
+  --output-json docs/results/cross-vendor-measured-mi300x-template.json \
+  --top-k 5
+```
+
+Paper-facing evaluation, valid only after replacing the template with real AMD
+measurements in `docs/results/cross-vendor-measured-mi300x.json`:
+
 ```bash
 PYTHONPATH=src python3 scripts/cross_vendor_transfer_eval.py \
   --prediction-json docs/results/cross-vendor-zero-shot-scaffold-mi300x.json \
@@ -40,4 +62,6 @@ PYTHONPATH=src python3 scripts/cross_vendor_transfer_eval.py \
 - `docs/results/cross-vendor-transfer-eval.json`
 - `docs/results/cross-vendor-transfer-eval.md`
 
-These outputs are the paper-facing evidence for cross-vendor ranking transfer quality.
+These outputs are paper-facing evidence for cross-vendor ranking transfer
+quality only when produced from a validator-accepted real AMD measured artifact.
+Until then, public wording must describe the MI300X lane as scaffold-only.
diff --git a/docs/research/cross-vendor-zero-shot-scaffold.md b/docs/research/cross-vendor-zero-shot-scaffold.md
index 64b50d1..9e5faf4 100644
--- a/docs/research/cross-vendor-zero-shot-scaffold.md
+++ b/docs/research/cross-vendor-zero-shot-scaffold.md
@@ -6,6 +6,9 @@ This scaffold predicts candidate config rankings for an unseen hardware label
 (`AMD MI300X`) using only existing NVIDIA benchmark data in the local
 ConfigDatabase.
 
+Current status: scaffold-only. No AMD MI300X/MI250 benchmark measurements are
+available in this repository lane.
+
 ## Run
 
 ```bash
@@ -16,15 +19,20 @@ PYTHONPATH=src uv run --python 3.11 --no-project python3 scripts/cross_vendor_ze
 
 - `docs/results/cross-vendor-zero-shot-scaffold-mi300x.json`
 - `docs/results/cross-vendor-zero-shot-scaffold-mi300x.md`
+- `docs/results/cross-vendor-measured-mi300x-template.json` (placeholder
+  capture sheet only, not measured evidence)
 
 ## What this is (and is not)
 
 - **Is:** a reproducible zero-shot prediction scaffold producing per-operator,
   per-bucket top-k candidate configs for an unseen vendor label.
 - **Is not:** a validated cross-vendor transfer result yet (no AMD measurements
-  are used in this artifact).
+  are used in this artifact). The measured-template JSON is not evidence and is
+  rejected by the transfer evaluator.
 
 ## Next step to convert scaffold into result
 
-Run the predicted candidates on real MI300X/MI250 hardware and compute ranking
-transfer metrics (Spearman rho, top-k overlap, best-config hit rate).
+Run the predicted candidates on real MI300X/MI250 hardware, write positive
+measured `metric` and `latency_ms` rows to
+`docs/results/cross-vendor-measured-mi300x.json`, then compute ranking transfer
+metrics (Spearman rho, top-k overlap, latency regret).
diff --git a/docs/results/README.md b/docs/results/README.md
index 253a61f..a1e87ef 100644
--- a/docs/results/README.md
+++ b/docs/results/README.md
@@ -81,10 +81,21 @@ KV cache quantize-on-write runtime integration:
 - `docs/results/kv-quant-write-runtime-integration.json`
 - `docs/results/kv-quant-write-runtime-integration.md`
 
-Cross-vendor zero-shot scaffold (MI300X label, no target measurements):
+Cross-vendor zero-shot scaffold (MI300X label, no target measurements; not
+paper-facing measured AMD evidence):
 
 - `docs/results/cross-vendor-zero-shot-scaffold-mi300x.json`
 - `docs/results/cross-vendor-zero-shot-scaffold-mi300x.md`
+- `docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json`
+- `docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.md`
+- `docs/results/cross-vendor-measured-mi300x-template.json`
+
+The MI300X measured lane is deferred until real MI300X/MI250 hardware results
+are available. The historical dry-run artifacts below show report shape only
+and must not be cited as cross-vendor AMD transfer evidence:
+
+- `docs/results/cross-vendor-transfer-eval-sample.json`
+- `docs/results/cross-vendor-transfer-eval-sample.md`
 
 Kernel-aware NAS multi-hardware latency proxy:
 
diff --git a/docs/results/cross-vendor-measured-mi300x-template.json b/docs/results/cross-vendor-measured-mi300x-template.json
index 0063e92..b30c8d6 100644
--- a/docs/results/cross-vendor-measured-mi300x-template.json
+++ b/docs/results/cross-vendor-measured-mi300x-template.json
@@ -1,31 +1,240 @@
 {
+  "artifact_type": "cross_vendor_measured_template",
+  "measurement_status": "placeholder_not_measured",
+  "is_measured_evidence": false,
+  "hardware_access": "deferred_no_amd_hardware_in_repo_context",
+  "generated_at_utc": "2026-05-26T17:16:06.271728+00:00",
+  "generated_from_prediction": "docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json",
+  "instructions": "Run the listed candidates on real MI300X/MI250 hardware, replace placeholder metric/latency rows with positive measured values, and save that as docs/results/cross-vendor-measured-mi300x.json before running cross_vendor_transfer_eval.py for paper-facing claims.",
   "measured": {
+    "matmul": {},
+    "rmsnorm": {},
+    "softmax": {},
+    "layernorm": {},
+    "cross_entropy": {},
     "attention": {
       "short_64": [
         {
           "config_id": "m64_n64_w2_s3",
-          "metric": 120.0,
-          "latency_ms": 1.2
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
         },
         {
           "config_id": "m64_n64_w4_s3",
-          "metric": 117.5,
-          "latency_ms": 1.22
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
         },
         {
           "config_id": "m64_n128_w4_s3",
-          "metric": 115.0,
-          "latency_ms": 1.25
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
         },
         {
           "config_id": "m64_n32_w2_s4",
-          "metric": 112.0,
-          "latency_ms": 1.28
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
         },
         {
           "config_id": "m64_n64_w4_s2",
-          "metric": 110.0,
-          "latency_ms": 1.3
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        }
+      ],
+      "short_128": [
+        {
+          "config_id": "m128_n128_w8_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n64_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m64_n64_w4_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        }
+      ],
+      "med_128": [
+        {
+          "config_id": "m128_n128_w8_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n64_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n128_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        }
+      ],
+      "long_64": [
+        {
+          "config_id": "m128_n64_w4_s4",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m64_n64_w2_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m64_n128_w4_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m64_n64_w4_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m64_n64_w4_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        }
+      ],
+      "long_128": [
+        {
+          "config_id": "m128_n128_w8_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n64_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        }
+      ],
+      "llama7b": [
+        {
+          "config_id": "m128_n128_w8_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n64_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        }
+      ],
+      "mistral": [
+        {
+          "config_id": "m128_n128_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n64_w8_s4",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n128_w8_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n64_w8_s3",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
+        },
+        {
+          "config_id": "m128_n32_w4_s2",
+          "metric": 0.0,
+          "latency_ms": 0.0,
+          "notes": "placeholder: replace with measured AMD metric and latency"
         }
       ]
     }
diff --git a/docs/results/cross-vendor-transfer-eval-sample.json b/docs/results/cross-vendor-transfer-eval-sample.json
index bec3f12..f70b3f1 100644
--- a/docs/results/cross-vendor-transfer-eval-sample.json
+++ b/docs/results/cross-vendor-transfer-eval-sample.json
@@ -1,4 +1,8 @@
 {
+  "artifact_type": "cross_vendor_transfer_eval_sample",
+  "measurement_status": "sample_only_not_measured_amd_evidence",
+  "is_paper_evidence": false,
+  "note": "Historical dry-run against a measurement template; retained only to show report shape. Do not cite as AMD transfer evidence.",
   "generated_at_utc": "2026-05-07T17:44:11.750856+00:00",
   "prediction_artifact": "docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json",
   "measured_artifact": "docs/results/cross-vendor-measured-mi300x-template.json",
diff --git a/docs/results/cross-vendor-transfer-eval-sample.md b/docs/results/cross-vendor-transfer-eval-sample.md
index 24d5d1c..7c72e5b 100644
--- a/docs/results/cross-vendor-transfer-eval-sample.md
+++ b/docs/results/cross-vendor-transfer-eval-sample.md
@@ -2,9 +2,15 @@
 
 Generated: 2026-05-07T17:44:11.750856+00:00
 
+Status: `sample_only_not_measured_amd_evidence`
+Paper-facing evidence: `false`
+
 Prediction artifact: `docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json`
 Measured artifact: `docs/results/cross-vendor-measured-mi300x-template.json`
 
+This is a historical dry-run against a measurement template, retained only to
+show the report shape. Do not cite these metrics as AMD transfer evidence.
+
 | Operator | Buckets | mean spearman | mean top-k hit | mean latency regret |
 |---|---:|---:|---:|---:|
 | attention | 1 | 0.0000 | 1.0000 | 0.0000 |
diff --git a/docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json b/docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json
index b356dbb..31baa2d 100644
--- a/docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json
+++ b/docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json
@@ -1,4 +1,7 @@
 {
+  "artifact_type": "cross_vendor_zero_shot_scaffold",
+  "measurement_status": "prediction_only_no_amd_measurements",
+  "is_measured_evidence": false,
   "generated_at_utc": "2026-05-07T17:43:45.017336+00:00",
   "db_path": ".noeris/triton-configs.json",
   "training": {
diff --git a/docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.md b/docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.md
index 1c542fd..2df273f 100644
--- a/docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.md
+++ b/docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.md
@@ -4,6 +4,8 @@ Generated: 2026-05-07T17:43:45.017336+00:00
 
 Source hardware filter: `A100`
 Target hardware label: `AMD MI300X`
+Measurement status: `prediction_only_no_amd_measurements`
+Measured AMD evidence: `false`
 
 | Operator | Buckets with candidates | total candidates |
 |---|---:|---:|
diff --git a/docs/results/cross-vendor-zero-shot-scaffold-mi300x.json b/docs/results/cross-vendor-zero-shot-scaffold-mi300x.json
index 43eb161..1aa247d 100644
--- a/docs/results/cross-vendor-zero-shot-scaffold-mi300x.json
+++ b/docs/results/cross-vendor-zero-shot-scaffold-mi300x.json
@@ -1,4 +1,7 @@
 {
+  "artifact_type": "cross_vendor_zero_shot_scaffold",
+  "measurement_status": "prediction_only_no_amd_measurements",
+  "is_measured_evidence": false,
   "generated_at_utc": "2026-04-21T07:22:17.189863+00:00",
   "db_path": ".noeris/triton-configs.json",
   "training": {
diff --git a/docs/results/cross-vendor-zero-shot-scaffold-mi300x.md b/docs/results/cross-vendor-zero-shot-scaffold-mi300x.md
index 8047982..a676dcd 100644
--- a/docs/results/cross-vendor-zero-shot-scaffold-mi300x.md
+++ b/docs/results/cross-vendor-zero-shot-scaffold-mi300x.md
@@ -4,6 +4,8 @@ Generated: 2026-04-21T07:22:17.189863+00:00
 
 Source hardware filter: `A100`
 Target hardware label: `AMD MI300X`
+Measurement status: `prediction_only_no_amd_measurements`
+Measured AMD evidence: `false`
 
 | Operator | Buckets with candidates | total candidates |
 |---|---:|---:|
diff --git a/scripts/cross_vendor_measured_pack_from_prediction.py b/scripts/cross_vendor_measured_pack_from_prediction.py
index 86e363d..1c5db03 100644
--- a/scripts/cross_vendor_measured_pack_from_prediction.py
+++ b/scripts/cross_vendor_measured_pack_from_prediction.py
@@ -5,10 +5,11 @@
 
 import argparse
 import json
+from datetime import datetime, timezone
 from pathlib import Path
 
 
-def main() -> int:
+def main(argv: list[str] | None = None) -> int:
     parser = argparse.ArgumentParser(description=__doc__)
     parser.add_argument(
         "--prediction-json",
@@ -16,10 +17,10 @@ def main() -> int:
     )
     parser.add_argument(
         "--output-json",
-        default="docs/results/cross-vendor-measured-mi300x.json",
+        default="docs/results/cross-vendor-measured-mi300x-template.json",
     )
     parser.add_argument("--top-k", type=int, default=5)
-    args = parser.parse_args()
+    args = parser.parse_args(argv)
 
     prediction_path = Path(args.prediction_json)
     payload = json.loads(prediction_path.read_text(encoding="utf-8"))
@@ -39,7 +40,7 @@ def main() -> int:
                     "config_id": cid,
                     "metric": 0.0,
                     "latency_ms": 0.0,
-                    "notes": "fill with measured AMD result",
+                    "notes": "placeholder: replace with measured AMD metric and latency",
                 }
             )
         if rows:
@@ -47,7 +48,18 @@ def main() -> int:
         measured[operator] = op_rows
 
     out = {
+        "artifact_type": "cross_vendor_measured_template",
+        "measurement_status": "placeholder_not_measured",
+        "is_measured_evidence": False,
+        "hardware_access": "deferred_no_amd_hardware_in_repo_context",
+        "generated_at_utc": datetime.now(timezone.utc).isoformat(),
         "generated_from_prediction": str(prediction_path),
+        "instructions": (
+            "Run the listed candidates on real MI300X/MI250 hardware, replace "
+            "placeholder metric/latency rows with positive measured values, "
+            "and save that as docs/results/cross-vendor-measured-mi300x.json "
+            "before running cross_vendor_transfer_eval.py for paper-facing claims."
+        ),
         "measured": measured,
     }
     out_path = Path(args.output_json)
diff --git a/scripts/cross_vendor_transfer_eval.py b/scripts/cross_vendor_transfer_eval.py
index 5b80844..0697b26 100644
--- a/scripts/cross_vendor_transfer_eval.py
+++ b/scripts/cross_vendor_transfer_eval.py
@@ -5,6 +5,7 @@
 
 import argparse
 import json
+import math
 from datetime import datetime, timezone
 from pathlib import Path
 
@@ -20,6 +21,74 @@ def _load_json(path: Path) -> dict:
     return json.loads(path.read_text(encoding="utf-8"))
 
 
+def _contains_marker(value: object, markers: tuple[str, ...]) -> str | None:
+    text = str(value or "").lower()
+    return next((marker for marker in markers if marker in text), None)
+
+
+def _positive_finite(row: dict, field: str, label: str) -> None:
+    try:
+        value = float(row[field])
+    except (KeyError, TypeError, ValueError) as exc:
+        raise ValueError(f"{label} is missing numeric {field}") from exc
+    if not math.isfinite(value) or value <= 0.0:
+        raise ValueError(f"{label} has non-positive or non-finite {field}: {value}")
+
+
+def validate_measured_artifact(measured: dict, *, measured_path: Path | None = None) -> dict:
+    """Reject scaffold/template artifacts before computing paper-facing metrics."""
+
+    artifact_markers = ("placeholder", "template", "scaffold", "deferred", "sample")
+    row_markers = ("placeholder", "fill", "replace", "not measured", "scaffold", "template")
+    reasons: list[str] = []
+
+    if measured.get("is_measured_evidence") is False:
+        reasons.append("is_measured_evidence=false")
+
+    for field in ("artifact_type", "measurement_status", "status"):
+        marker = _contains_marker(measured.get(field), artifact_markers)
+        if marker:
+            reasons.append(f"{field} contains {marker!r}")
+
+    if measured_path is not None:
+        marker = _contains_marker(measured_path.name, artifact_markers)
+        if marker:
+            reasons.append(f"measured path contains {marker!r}")
+
+    if reasons:
+        raise ValueError(
+            "measured AMD artifact is not eligible for paper-facing evaluation: "
+            + "; ".join(reasons)
+        )
+
+    measured_rows = measured.get("measured")
+    if not isinstance(measured_rows, dict):
+        raise ValueError("measured AMD artifact must contain a measured object")
+
+    row_count = 0
+    for operator, buckets in measured_rows.items():
+        if not isinstance(buckets, dict):
+            raise ValueError(f"measured.{operator} must be an object of buckets")
+        for bucket, rows in buckets.items():
+            if not isinstance(rows, list):
+                raise ValueError(f"measured.{operator}.{bucket} must be a list")
+            for idx, row in enumerate(rows):
+                label = f"measured.{operator}.{bucket}[{idx}]"
+                if not isinstance(row, dict):
+                    raise ValueError(f"{label} must be an object")
+                marker = _contains_marker(row.get("notes"), row_markers)
+                if marker:
+                    raise ValueError(f"{label} notes contain placeholder marker {marker!r}")
+                _positive_finite(row, "metric", label)
+                _positive_finite(row, "latency_ms", label)
+                row_count += 1
+
+    if row_count == 0:
+        raise ValueError("measured AMD artifact contains no measurement rows")
+
+    return {"status": "passed_real_measurement_checks", "row_count": row_count}
+
+
 def _to_md(report: dict) -> str:
     lines = [
         "# Cross-Vendor Transfer Evaluation",
@@ -28,6 +97,14 @@ def _to_md(report: dict) -> str:
         "",
         f"Prediction artifact: `{report['prediction_artifact']}`",
         f"Measured artifact: `{report['measured_artifact']}`",
+    ]
+    validation = report.get("measurement_validation", {})
+    if validation:
+        lines.append(
+            f"Measurement validation: `{validation.get('status', 'passed')}` "
+            f"({validation.get('row_count', 0)} rows)"
+        )
+    lines += [
         "",
         "| Operator | Buckets | mean spearman | mean top-k hit | mean latency regret |",
         "|---|---:|---:|---:|---:|",
@@ -126,6 +203,10 @@ def main() -> int:
     measured_path = Path(args.measured_json)
     prediction = _load_json(prediction_path)
     measured = _load_json(measured_path)
+    try:
+        measurement_validation = validate_measured_artifact(measured, measured_path=measured_path)
+    except ValueError as exc:
+        parser.error(str(exc))
     eval_out = evaluate_transfer(prediction=prediction, measured=measured, top_k=args.top_k)
 
     cmd = f"python scripts/cross_vendor_transfer_eval.py --prediction-json {args.prediction_json} --measured-json {args.measured_json}"
@@ -134,6 +215,7 @@ def main() -> int:
         "environment": collect_environment(command=cmd),
         "prediction_artifact": str(prediction_path),
         "measured_artifact": str(measured_path),
+        "measurement_validation": measurement_validation,
         "top_k": args.top_k,
         **eval_out,
     }
diff --git a/scripts/cross_vendor_zero_shot_scaffold.py b/scripts/cross_vendor_zero_shot_scaffold.py
index be4793c..6d7eaa5 100644
--- a/scripts/cross_vendor_zero_shot_scaffold.py
+++ b/scripts/cross_vendor_zero_shot_scaffold.py
@@ -36,6 +36,8 @@ def _to_md(report: dict) -> str:
         "",
         f"Source hardware filter: `{report['source_hardware_filter']}`",
         f"Target hardware label: `{report['target_hardware_label']}`",
+        f"Measurement status: `{report['measurement_status']}`",
+        f"Measured AMD evidence: `{str(report['is_measured_evidence']).lower()}`",
         "",
         "| Operator | Buckets with candidates | total candidates |",
         "|---|---:|---:|",
@@ -125,6 +127,9 @@ def main() -> int:
     }
 
     report = {
+        "artifact_type": "cross_vendor_zero_shot_scaffold",
+        "measurement_status": "prediction_only_no_amd_measurements",
+        "is_measured_evidence": False,
         "generated_at_utc": datetime.now(timezone.utc).isoformat(),
         "db_path": str(db_path),
         "training": train_result,
diff --git a/tests/test_cross_vendor_measured_pack_from_prediction.py b/tests/test_cross_vendor_measured_pack_from_prediction.py
index ff98a16..a8e7840 100644
--- a/tests/test_cross_vendor_measured_pack_from_prediction.py
+++ b/tests/test_cross_vendor_measured_pack_from_prediction.py
@@ -1,12 +1,30 @@
 from __future__ import annotations
 
+import importlib.util
+import io
 import json
-import subprocess
 import tempfile
 import unittest
+from contextlib import redirect_stdout
 from pathlib import Path
 
 
+REPO = Path(__file__).resolve().parent.parent
+
+
+def _load_script_module():
+    path = REPO / "scripts" / "cross_vendor_measured_pack_from_prediction.py"
+    spec = importlib.util.spec_from_file_location(
+        "cross_vendor_measured_pack_from_prediction",
+        path,
+    )
+    assert spec is not None
+    module = importlib.util.module_from_spec(spec)
+    assert spec.loader is not None
+    spec.loader.exec_module(module)
+    return module
+
+
 class CrossVendorMeasuredPackFromPredictionTests(unittest.TestCase):
     def test_generates_measured_template(self) -> None:
         with tempfile.TemporaryDirectory() as td:
@@ -30,9 +48,8 @@ def test_generates_measured_template(self) -> None:
                 ),
                 encoding="utf-8",
             )
-            cmd = [
-                "python3",
-                "scripts/cross_vendor_measured_pack_from_prediction.py",
+            module = _load_script_module()
+            argv = [
                 "--prediction-json",
                 str(pred),
                 "--output-json",
@@ -40,11 +57,19 @@ def test_generates_measured_template(self) -> None:
                 "--top-k",
                 "2",
             ]
-            subprocess.run(cmd, check=True)
+            with redirect_stdout(io.StringIO()):
+                status = module.main(argv)
             data = json.loads(out.read_text(encoding="utf-8"))
+            self.assertEqual(status, 0)
+            self.assertEqual(data["artifact_type"], "cross_vendor_measured_template")
+            self.assertEqual(data["measurement_status"], "placeholder_not_measured")
+            self.assertFalse(data["is_measured_evidence"])
             rows = data["measured"]["attention"]["bucket_a"]
             self.assertEqual(len(rows), 2)
             self.assertEqual(rows[0]["config_id"], "c1")
+            self.assertEqual(rows[0]["metric"], 0.0)
+            self.assertEqual(rows[0]["latency_ms"], 0.0)
+            self.assertIn("placeholder", rows[0]["notes"])
 
 
 if __name__ == "__main__":
diff --git a/tests/test_cross_vendor_transfer_eval.py b/tests/test_cross_vendor_transfer_eval.py
index 19fd01c..55b8db0 100644
--- a/tests/test_cross_vendor_transfer_eval.py
+++ b/tests/test_cross_vendor_transfer_eval.py
@@ -9,7 +9,11 @@
 REPO = Path(__file__).resolve().parent.parent
 sys.path.insert(0, str(REPO / "scripts"))
 
-from cross_vendor_transfer_eval import _to_md, evaluate_transfer  # noqa: E402
+from cross_vendor_transfer_eval import (  # noqa: E402
+    _to_md,
+    evaluate_transfer,
+    validate_measured_artifact,
+)
 
 
 class CrossVendorTransferEvalTests(unittest.TestCase):
@@ -49,6 +53,10 @@ def test_markdown_contains_summary_columns(self) -> None:
             "generated_at_utc": "2026-05-07T00:00:00+00:00",
             "prediction_artifact": "docs/results/pred.json",
             "measured_artifact": "docs/results/meas.json",
+            "measurement_validation": {
+                "status": "passed_real_measurement_checks",
+                "row_count": 3,
+            },
             "operator_summary": {
                 "attention": {
                     "bucket_count": 1,
@@ -60,8 +68,60 @@ def test_markdown_contains_summary_columns(self) -> None:
         }
         md = _to_md(report)
         self.assertIn("mean spearman", md)
+        self.assertIn("Measurement validation", md)
         self.assertIn("0.7500", md)
 
+    def test_validate_measured_artifact_rejects_template_markers(self) -> None:
+        measured = {
+            "artifact_type": "cross_vendor_measured_template",
+            "measurement_status": "placeholder_not_measured",
+            "is_measured_evidence": False,
+            "measured": {
+                "attention": {
+                    "bucket_a": [
+                        {
+                            "config_id": "c1",
+                            "metric": 0.0,
+                            "latency_ms": 0.0,
+                            "notes": "placeholder: replace with measured AMD result",
+                        }
+                    ]
+                }
+            },
+        }
+        with self.assertRaisesRegex(ValueError, "not eligible"):
+            validate_measured_artifact(measured, measured_path=Path("measured-template.json"))
+
+    def test_validate_measured_artifact_accepts_positive_rows(self) -> None:
+        measured = {
+            "measured": {
+                "attention": {
+                    "bucket_a": [
+                        {"config_id": "c1", "metric": 95.0, "latency_ms": 1.05},
+                        {"config_id": "c2", "metric": 90.0, "latency_ms": 1.10},
+                    ]
+                }
+            }
+        }
+        validation = validate_measured_artifact(measured, measured_path=Path("measured-real.json"))
+        self.assertEqual(validation["row_count"], 2)
+
+    def test_validate_measured_artifact_rejects_template_path_even_if_positive(self) -> None:
+        measured = {
+            "measured": {
+                "attention": {
+                    "bucket_a": [
+                        {"config_id": "c1", "metric": 95.0, "latency_ms": 1.05},
+                    ]
+                }
+            }
+        }
+        with self.assertRaisesRegex(ValueError, "measured path contains"):
+            validate_measured_artifact(
+                measured,
+                measured_path=Path("docs/results/cross-vendor-measured-mi300x-template.json"),
+            )
+
 
 if __name__ == "__main__":
     unittest.main()
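Producing `docs/results/cross-vendor-measured-mi300x.json` from the template is left as a manual step in this change. A minimal sketch of that conversion follows, assuming the AMD results are collected as a mapping from `(operator, bucket, config_id)` to `(metric, latency_ms)`. `fill_measured_template`, the `hardware` field, and the `artifact_type`/`measurement_status` strings it writes are hypothetical; they are chosen so that none of the markers checked by `validate_measured_artifact` appear, and the per-row check mirrors that validator's positive, finite rule. The evaluator's validator, not this helper, decides eligibility.

```python
from __future__ import annotations

import math

# Hypothetical input shape: (operator, bucket, config_id) -> (metric, latency_ms).
Measurements = dict[tuple[str, str, str], tuple[float, float]]


def fill_measured_template(template: dict, measurements: Measurements, *, hardware: str) -> dict:
    """Hypothetical helper: build a measured artifact from the placeholder template.

    Every template row must have a measurement, and placeholder notes are dropped.
    Operators or buckets with no template rows are omitted from the output.
    """
    measured: dict[str, dict[str, list[dict]]] = {}
    for operator, buckets in template["measured"].items():
        for bucket, rows in buckets.items():
            filled = []
            for row in rows:
                key = (operator, bucket, row["config_id"])
                if key not in measurements:
                    raise KeyError(f"no AMD measurement for {key}")
                metric, latency_ms = map(float, measurements[key])
                # Same rule as the evaluator's row check: positive and finite.
                for name, value in (("metric", metric), ("latency_ms", latency_ms)):
                    if not math.isfinite(value) or value <= 0.0:
                        raise ValueError(f"{key} has non-positive or non-finite {name}: {value}")
                filled.append(
                    {"config_id": row["config_id"], "metric": metric, "latency_ms": latency_ms}
                )
            if filled:
                measured.setdefault(operator, {})[bucket] = filled
    return {
        # Values chosen to avoid the evaluator's template/scaffold/sample markers.
        "artifact_type": "cross_vendor_measured",
        "measurement_status": "measured_on_amd_hardware",
        "is_measured_evidence": True,
        "hardware": hardware,
        "generated_from_prediction": template.get("generated_from_prediction"),
        "measured": measured,
    }
```

Writing the returned dict with `json.dumps(artifact, indent=2)` to the real artifact path and then running `scripts/cross_vendor_transfer_eval.py` would exercise the validator end to end. Failing fast on a missing row is a deliberate choice in this sketch: silently dropping unmeasured candidates would shrink the buckets being compared.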