Guard MI300X cross-vendor scaffold artifacts #124

Open
Darkroom4364 wants to merge 2 commits into main from issue-116-cross-vendor-scaffold-only

Darkroom4364 (Collaborator) commented May 26, 2026

Summary

  • Mark the MI300X cross-vendor lane as scaffold-only when AMD hardware is unavailable.
  • Regenerate the MI300X measured template with non-evidence metadata, zero placeholder rows, and replacement instructions.
  • Add evaluator validation so template, scaffold, sample, deferred, or non-positive measured artifacts cannot produce paper-facing transfer metrics.
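
For reviewers, the validation rule in the last bullet can be illustrated with a minimal sketch. This is not the code in `scripts/cross_vendor_transfer_eval.py`: the function name `check_measured_artifact`, the marker strings, the `rows` key, and the `measured_latency_ms` field are all assumptions made for illustration. Only `artifact_type`, `measurement_status`, and `is_measured_evidence` are field names taken from this PR.

```python
import math

# Hypothetical marker strings; the real evaluator may use different values.
NON_EVIDENCE_MARKERS = ("template", "scaffold", "sample", "deferred")


def check_measured_artifact(artifact, path=""):
    """Return the reasons an artifact cannot be used as measured evidence.

    An empty list means the artifact passed every check in this sketch.
    """
    problems = []

    # The artifact must explicitly declare itself as measured evidence.
    if artifact.get("is_measured_evidence") is not True:
        problems.append("is_measured_evidence is not true")

    # Metadata fields must not carry a non-evidence marker.
    for field in ("artifact_type", "measurement_status"):
        value = str(artifact.get(field, "")).lower()
        if any(marker in value for marker in NON_EVIDENCE_MARKERS):
            problems.append(f"{field}={value!r} marks a non-evidence artifact")

    # Path-based detection: a file named like a template is rejected too.
    name = str(path).lower()
    if any(marker in name for marker in NON_EVIDENCE_MARKERS):
        problems.append(f"path {path!r} looks like a non-evidence artifact")

    # Every row needs a finite, strictly positive measurement.
    rows = artifact.get("rows", [])  # "rows" is an assumed key
    if not rows:
        problems.append("artifact has no measured rows")
    for index, row in enumerate(rows):
        value = row.get("measured_latency_ms")  # assumed field name
        if (
            isinstance(value, bool)
            or not isinstance(value, (int, float))
            or not math.isfinite(value)
            or value <= 0
        ):
            problems.append(f"row {index}: non-finite or non-positive value {value!r}")

    return problems
```

Under these assumptions, a caller would refuse to write any transfer metrics whenever the returned list is non-empty, which matches the expected failure listed under Verification.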

Verification

  • `python3 -m unittest tests.test_cross_vendor_transfer_eval tests.test_cross_vendor_measured_pack_from_prediction tests.test_public_claim_artifacts`
  • `python3 -m compileall scripts/cross_vendor_transfer_eval.py scripts/cross_vendor_measured_pack_from_prediction.py scripts/cross_vendor_zero_shot_scaffold.py`
  • `python3 scripts/check_public_claim_artifacts.py`
  • `git diff --check`
  • Expected failure: `PYTHONPATH=src python3 scripts/cross_vendor_transfer_eval.py --prediction-json docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json --measured-json docs/results/cross-vendor-measured-mi300x-template.json --output-json /tmp/cross-vendor-should-not-write.json --output-md /tmp/cross-vendor-should-not-write.md`

Note: ruff is not installed in this environment, so ruff check and python3 -m ruff check ... could not run.

Closes #116

Summary by CodeRabbit

Release Notes

  • Documentation

    • Clarified that MI300X cross-vendor results are currently scaffold-only with placeholder measurements pending real hardware data.
    • Updated guidance to distinguish between predicted scaffolds and measured evidence artifacts.
  • New Features

    • Added validation for measured artifact templates to ensure data integrity before evaluation.
  • Chores

    • Enhanced metadata fields across result artifacts to explicitly track measurement status and evidence type.
    • Improved script workflows for generating and processing measured artifact templates.

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c35cda6a-7334-457d-8c64-731b44d08420

📥 Commits

Reviewing files that changed from the base of the PR and between 22b0a3c and d632ac1.

📒 Files selected for processing (2)
  • scripts/cross_vendor_measured_pack_from_prediction.py
  • tests/test_cross_vendor_measured_pack_from_prediction.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/cross_vendor_measured_pack_from_prediction.py

Walkthrough

Adds metadata to mark MI300X artifacts as scaffold-only, creates a measured-template generator with timestamps/instructions, enforces measured-artifact validation in the evaluator (rejecting templates/non-positive rows), updates scaffold/docs/outputs, and expands tests to cover generation and validation behaviors.

Changes

Cross-vendor artifact metadata and validation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Protocol & docs marking MI300X scaffold-only | `docs/research/cross-vendor-transfer-eval-protocol.md`, `docs/research/cross-vendor-zero-shot-scaffold.md`, `docs/results/README.md` | Protocol and scaffold docs now state MI300X is scaffold-only and clarify when evaluation outputs count as paper-facing evidence. |
| Measured-template artifact files | `docs/results/cross-vendor-measured-mi300x-template.json`, `docs/results/cross-vendor-transfer-eval-sample.json`, `docs/results/cross-vendor-transfer-eval-sample.md` | Measured-template and sample artifacts are marked as placeholders with metadata (`artifact_type`, `measurement_status`, `is_measured_evidence`), zeroed metric/latency rows, per-row notes, and generation instructions. |
| Zero-shot scaffold artifacts & markdown | `docs/results/cross-vendor-zero-shot-scaffold-mi300x.json`, `docs/results/cross-vendor-zero-shot-scaffold-mi300x.md`, `docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json`, `docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.md`, `scripts/cross_vendor_zero_shot_scaffold.py` | Zero-shot scaffolds now include measurement metadata and the markdown renderer surfaces `measurement_status` and `is_measured_evidence` fields. |
| Template generator CLI and output shaping | `scripts/cross_vendor_measured_pack_from_prediction.py` | Generator adds UTC timestamp, accepts optional `argv`, defaults output to the template path, and emits a template JSON with metadata, `generated_at_utc`, instructions, and per-row notes for measured AMD values. |
| Measured-artifact validation in evaluator | `scripts/cross_vendor_transfer_eval.py` | Added `math` import and `validate_measured_artifact()` that rejects template/placeholder artifacts or rows with non-finite/non-positive metrics; `main()` aborts on validation failure and includes `measurement_validation` in JSON/MD outputs. |
| Tests: generation and validation coverage | `tests/test_cross_vendor_measured_pack_from_prediction.py`, `tests/test_cross_vendor_transfer_eval.py` | Tests now dynamically import the generator module and call `main(argv)`; they assert template metadata and placeholder rows, add tests for `validate_measured_artifact` (template rejection, valid acceptance, path-based template detection), and check markdown includes `measurement_validation`. |
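
As a rough illustration of the template-generator row above, the sketch below builds a placeholder measured template from a prediction payload. It is not the implementation in `scripts/cross_vendor_measured_pack_from_prediction.py`: the function name `build_measured_template`, the `rows` and `config_id` keys, the `measured_latency_ms` field, and the metadata values are hypothetical. Only the metadata field names (`artifact_type`, `measurement_status`, `is_measured_evidence`, `generated_at_utc`) come from the summary.

```python
from datetime import datetime, timezone


def build_measured_template(prediction):
    """Build a non-evidence template with one zeroed row per predicted row."""
    rows = []
    for row in prediction.get("rows", []):  # "rows" is an assumed key
        rows.append(
            {
                "config_id": row.get("config_id"),  # assumed identifier field
                "measured_latency_ms": 0.0,  # placeholder, never a real measurement
                "note": "Placeholder; replace with a measured AMD value.",
            }
        )
    return {
        # Metadata values here are illustrative strings, not the PR's exact ones.
        "artifact_type": "measured_template",
        "measurement_status": "placeholder",
        "is_measured_evidence": False,
        "generated_at_utc": datetime.now(timezone.utc).isoformat(),
        "instructions": (
            "Replace every placeholder row with values measured on real "
            "hardware, then update the metadata before running the evaluator."
        ),
        "rows": rows,
    }
```

Zeroing the rows instead of copying predicted values means a template that reaches the evaluator by mistake fails the non-positive check rather than producing metrics that look plausible.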

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A scaffold waits with zeros in place,

Timestamps and notes for the measured race.
Validation guards the paper-bound gate,
Until true AMD numbers fill each slate.
Hop—measure, replace—then evidence is straight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title clearly describes the main change: guarding MI300X cross-vendor scaffold artifacts with metadata markers and validation controls. |
| Linked Issues check | ✅ Passed | PR meets all coding requirements from #116: marks scaffold as scaffold-only with metadata, rejects template artifacts in validation, includes evaluation metrics (Spearman, top-k, latency regret), and ensures no paper-facing wording treats scaffold as measured evidence. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the #116 objectives: documentation clarifications, metadata markers for scaffold/template artifacts, validation logic for measured artifacts, and test updates; no unrelated changes detected. |


Development

Successfully merging this pull request may close these issues.

[Cross-vendor] Replace MI300X measured template with real AMD artifact
