EPIC: Complete migration of sharktank external tests to torch_models
The sharktank external test suite should be fully ported to the newer torch_models test suite format so we can retire .github/workflows/pkgci_test_sharktank.yml and stop maintaining duplicate
model test infrastructure.
Current local inspection shows tests/external/iree-test-suites/sharktank_models
still has 33 JSON manifests, while torch_models has partial coverage. Some
ports exist for SDXL clip/vae/pUNet and Llama, but several sharktank tests are
still missing or are not equivalent.
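The manifest count above can be re-checked locally with a short script. This is a minimal sketch, assuming it is run from the repository root and that every manifest is a *.json file somewhere under the suite directory; the helper name list_manifests is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_manifests(suite_root):
    """Return sorted, root-relative paths of all *.json files under suite_root."""
    root = Path(suite_root)
    if not root.is_dir():
        # A missing checkout or a wrong working directory yields an empty list.
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))


if __name__ == "__main__":
    # Path taken from the issue text; adjust if the submodule lives elsewhere.
    manifests = list_manifests("tests/external/iree-test-suites/sharktank_models")
    print(f"{len(manifests)} JSON manifests")
    for rel in manifests:
        print(" ", rel)
```

Running the same helper against the torch_models directory gives a rough side-by-side inventory, although file names alone do not show whether a port is equivalent.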
Goals
Port all remaining sharktank quality and benchmark coverage to tests/external/iree-test-suites/torch_models.
Preserve intended coverage, thresholds, benchmark flags, golden times,
markers, and target/device behavior.
Resolve or explicitly document any intentional behavior changes between
sharktank and torch configs.
Remove sharktank CI only after torch coverage is equivalent.
Breakdown
Port SD3 quality tests.
Missing work: add a torch_models/sd3 area with module configs for CLIP,
MMDiT, and VAE, then add CPU and ROCm quality configs preserving the
original MLIR URLs, weights, input/output files, thresholds, compiler flags,
and run functions.
References: clip_cpu.json, clip_rocm.json, mmdit_cpu.json, mmdit_rocm.json,
vae_cpu.json, vae_rocm.json.
Port SDXL scheduler compile-only tests.
Missing work: add torch-model equivalents for the scheduler compile-only
coverage. These do not run model quality checks; they validate that the
scheduler MLIR compiles for CPU and ROCm with the same target/device and
preprocessing flags.
References: scheduler_cpu.json, scheduler_rocm.json.
Port SDXL UNet tests.
Missing work: add torch-model module and test configs for the scheduled UNet
quality cases, the 960x1024 UNet quality cases, and the ROCm benchmark. The
scheduled UNet configs also reference a pipeline module, so the port needs to
preserve that multi-module behavior instead of only compiling the standalone
model.
References: unet_fp16_cpu.json, unet_fp16_rocm.json,
unet_fp16_960_1024_cpu.json, unet_fp16_960_1024_rocm.json,
benchmarks/sdxl/unet_fp16_rocm.json.
Port SDXL end-to-end benchmark.
Missing work: add a torch-model benchmark for the full SDXL pipeline that
compiles and runs the multi-module pipeline (sdxl_clip, sdxl_unet_fp16,
sdxl_vae) via tokens_to_image. Preserve the pipeline MLIR, compile flags,
benchmark flags, and per-SKU golden timing expectations.
Reference: benchmarks/sdxl/e2e_rocm.json.
Resolve SDXL pUNet fp8 coverage.
Missing work: port the fp8 pUNet quality and benchmark coverage, or document
why it is intentionally replaced. The existing torch punet_gfx942_v2 config
does not appear equivalent: it uses different MLIR, different input arity,
run_forward instead of main, and different compiler/preprocessing flags.
References: punet_int8_fp8_rocm.json, benchmarks/sdxl/punet_int8_fp8_rocm.json,
existing torch candidate punet_gfx942_v2.json.
Resolve Llama f16 data-tiling coverage.
Missing work: add f16 data-tiling module/test configs for both quality and
benchmark coverage. The existing torch data-tiling configs are under
llama_8b_fp8, so they do not replace the sharktank f16 data-tiling cases.
References: 8b_f16_decode_data_tiling_rocm.json,
8b_f16_prefill_data_tiling_rocm.json,
benchmarks/llama/8b_f16_decode_data_tiling_rocm.json,
benchmarks/llama/8b_f16_prefill_data_tiling_rocm.json.
Resolve Llama per-function quality coverage.
Missing work: decide whether the torch test_greedy_decoder quality test is
intended to replace the sharktank per-function decode_bs4 and prefill_bs4
quality tests. If not, add separate torch quality configs using the original
inputs and functions.
References: 8b_f16_decode_rocm.json, 8b_f16_prefill_rocm.json,
existing torch quality config quality_gfx942.json.
Audit apparent ports for mismatches.
Missing work: for each apparent sharktank-to-torch port, either align the
torch config with the sharktank behavior or document why the change is
intentional.
Known mismatches to resolve:
Threshold changes from --expected_f16_threshold=0.02f to 0.4f.
References: sharktank vae_cpu.json, torch vae_quality_cpu.json.
Decode benchmark: sharktank uses 4x5xi64, torch seq128 uses 4x4xi64.
References: sharktank decode benchmark, torch decode seq128 benchmark.
Benchmark flags differ in several ports, especially
--device_allocator=caching, --hip_use_streams=true, and
--hip_allow_inline_execution=true.
Golden times differ between sharktank and torch ports; confirm whether
each change is expected, update values if needed, and preserve
tolerance semantics where the old sharktank config had per-SKU
tolerances.
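The flag comparison in this audit can be partly mechanized. The following is a minimal sketch, assuming each config's flags have already been extracted into a list of strings of the form --name=value; how to pull those lists out of a given JSON manifest depends on its schema, which is not shown here. The helper names parse_flags and diff_flags are hypothetical, and the example inputs are illustrative only, not the contents of any real config.

```python
def parse_flags(flags):
    """Map '--name=value' strings to {name: value}; bare flags map to None."""
    parsed = {}
    for flag in flags:
        name, sep, value = flag.partition("=")
        parsed[name] = value if sep else None
    return parsed


def diff_flags(old_flags, new_flags):
    """Report flags that were removed, added, or changed between two lists."""
    old, new = parse_flags(old_flags), parse_flags(new_flags)
    shared = sorted(set(old) & set(new))
    return {
        "removed": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": {n: (old[n], new[n]) for n in shared if old[n] != new[n]},
    }


if __name__ == "__main__":
    # Illustrative inputs only; real lists would come from the two configs.
    sharktank_flags = ["--expected_f16_threshold=0.02f", "--device_allocator=caching"]
    torch_flags = ["--expected_f16_threshold=0.4f"]
    print(diff_flags(sharktank_flags, torch_flags))
```

A non-empty result for any port is a prompt to either align the torch config or record why the difference is intentional.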
CI cleanup.
Missing work: after coverage parity is demonstrated, remove the sharktank CI
workflow and route all migrated model coverage through pkgci_test_torch.yml.
Also update path triggers and workflow-summary dependencies so sharktank is no
longer scheduled independently.
References: pkgci_test_sharktank.yml, pkgci_test_torch.yml, pkgci.yml,
configure_ci.py.
Notes
There do not appear to be explicit *mismatch* config files under
tests/external/iree-test-suites, but there are behavioral mismatches in
existing apparent ports. These should be treated as migration blockers unless
they are intentionally accepted and documented.