30 changes: 27 additions & 3 deletions docs/research/cross-vendor-transfer-eval-protocol.md
@@ -1,15 +1,21 @@
# Cross-Vendor Transfer Evaluation Protocol

This protocol defines how to validate issue #82 once AMD measurements are available.
This protocol defines how to validate issue #82 once AMD measurements are
available. Current repository status: AMD MI300X/MI250 hardware is unavailable
in this lane, so the cross-vendor AMD result remains scaffold-only.

## Inputs

- Prediction artifact from scaffold lane:
- `docs/results/cross-vendor-zero-shot-scaffold-mi300x.json`
- Measured AMD artifact (to be produced from MI300X/MI250 runs), format:
- Optional measurement template, not evidence:
- `docs/results/cross-vendor-measured-mi300x-template.json`
- Real measured AMD artifact, to be produced only from MI300X/MI250 runs:
- `docs/results/cross-vendor-measured-mi300x.json`

```json
{
"is_measured_evidence": true,
"measured": {
"attention": {
"bucket_name": [
@@ -20,8 +26,24 @@ This protocol defines how to validate issue #82 once AMD measurements are availa
}
```

The template is marked `is_measured_evidence=false` and uses placeholder
metric/latency values. `scripts/cross_vendor_transfer_eval.py` rejects template,
scaffold, deferred, sample, or non-positive measurement rows.
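The rejection rule above can be sketched as a small check. This is a minimal illustration, not the actual logic in `scripts/cross_vendor_transfer_eval.py`: the function name `find_rejection_reasons` and the `REJECTED_STATUS_WORDS` tuple are hypothetical, and the field names are taken from the template JSON in this change.

```python
from typing import Any

# Assumption: substrings that mark a non-evidence artifact, inferred from the
# rejection list in the text; the real script may use different checks.
REJECTED_STATUS_WORDS = ("template", "scaffold", "deferred", "sample", "placeholder")


def find_rejection_reasons(artifact: dict[str, Any]) -> list[str]:
    """Return the reasons an artifact is unusable as measured evidence."""
    reasons: list[str] = []

    # The artifact must explicitly claim to be measured evidence.
    if artifact.get("is_measured_evidence") is not True:
        reasons.append("is_measured_evidence is not true")

    # Status-like fields must not carry template/scaffold/sample wording.
    for field in ("artifact_type", "measurement_status", "hardware_access"):
        value = str(artifact.get(field, "")).lower()
        for word in REJECTED_STATUS_WORDS:
            if word in value:
                reasons.append(f"{field} contains '{word}'")

    # Every measured row needs strictly positive metric and latency values.
    for operator, buckets in artifact.get("measured", {}).items():
        for bucket, rows in buckets.items():
            for row in rows:
                for key in ("metric", "latency_ms"):
                    value = row.get(key)
                    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
                    if not is_number or value <= 0:
                        reasons.append(
                            f"{operator}/{bucket}/{row.get('config_id')}: {key} must be positive"
                        )
    return reasons
```

An empty list means the artifact passes this sketch's checks. The placeholder template would fail on several counts at once: the evidence flag, the status wording, and the zero-valued rows.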

## Command

Template generation:

```bash
PYTHONPATH=src python3 scripts/cross_vendor_measured_pack_from_prediction.py \
--prediction-json docs/results/cross-vendor-zero-shot-scaffold-mi300x.json \
--output-json docs/results/cross-vendor-measured-mi300x-template.json \
--top-k 5
```

Paper-facing evaluation, valid only after replacing the template with real AMD
measurements in `docs/results/cross-vendor-measured-mi300x.json`:

```bash
PYTHONPATH=src python3 scripts/cross_vendor_transfer_eval.py \
--prediction-json docs/results/cross-vendor-zero-shot-scaffold-mi300x.json \
@@ -40,4 +62,6 @@ PYTHONPATH=src python3 scripts/cross_vendor_transfer_eval.py \
- `docs/results/cross-vendor-transfer-eval.json`
- `docs/results/cross-vendor-transfer-eval.md`

These outputs are the paper-facing evidence for cross-vendor ranking transfer quality.
These outputs are paper-facing evidence for cross-vendor ranking transfer
quality only when produced from a validator-accepted real AMD measured artifact.
Until then, public wording must describe the MI300X lane as scaffold-only.
14 changes: 11 additions & 3 deletions docs/research/cross-vendor-zero-shot-scaffold.md
@@ -6,6 +6,9 @@ This scaffold predicts candidate config rankings for an unseen hardware label
(`AMD MI300X`) using only existing NVIDIA benchmark data in the local
ConfigDatabase.

Current status: scaffold-only. No AMD MI300X/MI250 benchmark measurements are
available in this repository lane.

## Run

```bash
@@ -16,15 +19,20 @@ PYTHONPATH=src uv run --python 3.11 --no-project python3 scripts/cross_vendor_ze

- `docs/results/cross-vendor-zero-shot-scaffold-mi300x.json`
- `docs/results/cross-vendor-zero-shot-scaffold-mi300x.md`
- `docs/results/cross-vendor-measured-mi300x-template.json` (placeholder
capture sheet only, not measured evidence)

## What this is (and is not)

- **Is:** a reproducible zero-shot prediction scaffold producing per-operator,
per-bucket top-k candidate configs for an unseen vendor label.
- **Is not:** a validated cross-vendor transfer result yet (no AMD measurements
are used in this artifact).
are used in this artifact). The measured-template JSON is not evidence and is
rejected by the transfer evaluator.

## Next step to convert scaffold into result

Run the predicted candidates on real MI300X/MI250 hardware and compute ranking
transfer metrics (Spearman rho, top-k overlap, best-config hit rate).
Run the predicted candidates on real MI300X/MI250 hardware, write positive
measured `metric` and `latency_ms` rows to
`docs/results/cross-vendor-measured-mi300x.json`, then compute ranking transfer
metrics (Spearman rho, top-k overlap, latency regret).
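As a rough illustration of the three metrics named above, the sketch below computes them for a single bucket. The definitions are assumptions rather than the evaluator's actual formulas: Spearman rho is taken between predicted rank and measured latency, top-k overlap is the share of predicted top-k configs that are also in the measured top-k, and latency regret is the relative slowdown of the predicted best config against the measured best. All function names are hypothetical.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values ascending (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: report an undefined correlation as 0
    return cov / (var_x * var_y) ** 0.5


def bucket_metrics(
    predicted_order: list[str],
    measured_latency_ms: dict[str, float],
    k: int = 3,
) -> dict[str, float]:
    """predicted_order lists config ids best-first; latencies are measured."""
    ids = [c for c in predicted_order if c in measured_latency_ms]
    if not ids:
        raise ValueError("no predicted config has a measurement")
    latencies = [measured_latency_ms[c] for c in ids]
    predicted_rank = [float(i) for i in range(len(ids))]
    measured_order = sorted(ids, key=lambda c: measured_latency_ms[c])
    top = min(k, len(ids))
    best = min(latencies)
    return {
        "spearman_rho": spearman_rho(predicted_rank, latencies),
        "top_k_overlap": len(set(ids[:top]) & set(measured_order[:top])) / top,
        "latency_regret": (measured_latency_ms[ids[0]] - best) / best,
    }
```

Under these definitions, a perfect prediction gives rho of 1, overlap of 1, and regret of 0. The regret term assumes strictly positive latencies, which matches the requirement that measured rows be positive.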
13 changes: 12 additions & 1 deletion docs/results/README.md
@@ -81,10 +81,21 @@ KV cache quantize-on-write runtime integration:
- `docs/results/kv-quant-write-runtime-integration.json`
- `docs/results/kv-quant-write-runtime-integration.md`

Cross-vendor zero-shot scaffold (MI300X label, no target measurements):
Cross-vendor zero-shot scaffold (MI300X label, no target measurements; not
paper-facing measured AMD evidence):

- `docs/results/cross-vendor-zero-shot-scaffold-mi300x.json`
- `docs/results/cross-vendor-zero-shot-scaffold-mi300x.md`
- `docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json`
- `docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.md`
- `docs/results/cross-vendor-measured-mi300x-template.json`

The MI300X measured lane is deferred until real MI300X/MI250 hardware results
are available. The historical dry-run artifacts below show report shape only
and must not be cited as cross-vendor AMD transfer evidence:

- `docs/results/cross-vendor-transfer-eval-sample.json`
- `docs/results/cross-vendor-transfer-eval-sample.md`

Kernel-aware NAS multi-hardware latency proxy:

229 changes: 219 additions & 10 deletions docs/results/cross-vendor-measured-mi300x-template.json
@@ -1,31 +1,240 @@
{
"artifact_type": "cross_vendor_measured_template",
"measurement_status": "placeholder_not_measured",
"is_measured_evidence": false,
"hardware_access": "deferred_no_amd_hardware_in_repo_context",
"generated_at_utc": "2026-05-26T17:16:06.271728+00:00",
"generated_from_prediction": "docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json",
"instructions": "Run the listed candidates on real MI300X/MI250 hardware, replace placeholder metric/latency rows with positive measured values, and save that as docs/results/cross-vendor-measured-mi300x.json before running cross_vendor_transfer_eval.py for paper-facing claims.",
"measured": {
"matmul": {},
"rmsnorm": {},
"softmax": {},
"layernorm": {},
"cross_entropy": {},
"attention": {
"short_64": [
{
"config_id": "m64_n64_w2_s3",
"metric": 120.0,
"latency_ms": 1.2
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n64_w4_s3",
"metric": 117.5,
"latency_ms": 1.22
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n128_w4_s3",
"metric": 115.0,
"latency_ms": 1.25
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n32_w2_s4",
"metric": 112.0,
"latency_ms": 1.28
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n64_w4_s2",
"metric": 110.0,
"latency_ms": 1.3
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
}
],
"short_128": [
{
"config_id": "m128_n128_w8_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n64_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n64_w4_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
}
],
"med_128": [
{
"config_id": "m128_n128_w8_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n64_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n128_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
}
],
"long_64": [
{
"config_id": "m128_n64_w4_s4",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n64_w2_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n128_w4_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n64_w4_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m64_n64_w4_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
}
],
"long_128": [
{
"config_id": "m128_n128_w8_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n64_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
}
],
"llama7b": [
{
"config_id": "m128_n128_w8_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n64_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
}
],
"mistral": [
{
"config_id": "m128_n128_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n64_w8_s4",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n128_w8_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n64_w8_s3",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
},
{
"config_id": "m128_n32_w4_s2",
"metric": 0.0,
"latency_ms": 0.0,
"notes": "placeholder: replace with measured AMD metric and latency"
}
]
}
4 changes: 4 additions & 0 deletions docs/results/cross-vendor-transfer-eval-sample.json
@@ -1,4 +1,8 @@
{
"artifact_type": "cross_vendor_transfer_eval_sample",
"measurement_status": "sample_only_not_measured_amd_evidence",
"is_paper_evidence": false,
"note": "Historical dry-run against a measurement template; retained only to show report shape. Do not cite as AMD transfer evidence.",
"generated_at_utc": "2026-05-07T17:44:11.750856+00:00",
"prediction_artifact": "docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json",
"measured_artifact": "docs/results/cross-vendor-measured-mi300x-template.json",
6 changes: 6 additions & 0 deletions docs/results/cross-vendor-transfer-eval-sample.md
@@ -2,9 +2,15 @@

Generated: 2026-05-07T17:44:11.750856+00:00

Status: `sample_only_not_measured_amd_evidence`
Paper-facing evidence: `false`

Prediction artifact: `docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json`
Measured artifact: `docs/results/cross-vendor-measured-mi300x-template.json`

This is a historical dry-run against a measurement template, retained only to
show the report shape. Do not cite these metrics as AMD transfer evidence.

| Operator | Buckets | mean spearman | mean top-k hit | mean latency regret |
|---|---:|---:|---:|---:|
| attention | 1 | 0.0000 | 1.0000 | 0.0000 |
3 changes: 3 additions & 0 deletions docs/results/cross-vendor-zero-shot-scaffold-mi300x-v2.json
@@ -1,4 +1,7 @@
{
"artifact_type": "cross_vendor_zero_shot_scaffold",
"measurement_status": "prediction_only_no_amd_measurements",
"is_measured_evidence": false,
"generated_at_utc": "2026-05-07T17:43:45.017336+00:00",
"db_path": ".noeris/triton-configs.json",
"training": {