microsoft · timenick · Apr 24, 2026 · Apr 21, 2026
@@ -16,7 +16,7 @@ Run `uv run pytest tests/` after every implementation or test revision. Never as
 
 ### 4. Never Skip Failing Tests
 
-Investigate root cause and fix the underlying issue. Never use `pytest.mark.skip` or `xfail` to hide failures. Skips are only acceptable for hardware/EP requirements (CUDA, DirectML, AVX).
+Investigate root cause and fix the underlying issue. Never use `pytest.mark.skip` or `xfail` to hide failures. Skips are only acceptable for hardware/EP requirements (CUDA, Dml, AVX).
 
 ## Development Commands
 

@@ -27,9 +27,9 @@
 | **QNN** | Qualcomm NPU (Snapdragon X Elite) | 🟢 Ready | `--ep qnn` | `--device npu` |
 | **OpenVINO** | Intel NPU (Meteor Lake / Lunar Lake) | 🟢 Ready | `--ep openvino` | `--device npu` |
 | **VitisAI** | AMD NPU (Ryzen AI) | 🟢 Ready | `--ep vitisai` | `--device npu` |
-| **TensorRT** | NVIDIA discrete GPUs | 🔶 Planned | `--ep tensorrt` | `--device gpu` |
+| **NvTensorRTRTX** | NVIDIA discrete GPUs | 🔶 Planned | `--ep nv_tensorrt_rtx` | `--device gpu` |
 | **MIGraphX** | AMD discrete GPUs | 🔶 Planned | `--ep migraphx` | `--device gpu` |
-| **DirectML** | Hardware-agnostic GPU backend | 🔶 Planned | `--ep dml` | `--device gpu` |
+| **Dml** | Hardware-agnostic GPU backend | 🔶 Planned | `--ep dml` | `--device gpu` |
 | **CPU** | Cross-platform fallback | ⚪ Always available | `--ep cpu` | `--device cpu` |
 
 > **Tip:** Use `--device auto` and ModelKit picks the best available device — NPU first, then GPU, then CPU.
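
The preference order described in the tip (NPU first, then GPU, then CPU) can be illustrated with a minimal sketch. This is not ModelKit's implementation: `pick_device`, `DEVICE_PREFERENCE`, and the `available` set are hypothetical names used only for illustration.

```python
# Hypothetical sketch of the `--device auto` preference order; not ModelKit's code.
DEVICE_PREFERENCE = ("npu", "gpu", "cpu")


def pick_device(available):
    """Return the most preferred device name found in `available`."""
    for device in DEVICE_PREFERENCE:
        if device in available:
            return device
    # The table lists CPU as always available, so use it as the last resort.
    return "cpu"
```

For example, `pick_device({"gpu", "cpu"})` returns `"gpu"` because no NPU is present in the set.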
@@ -398,7 +398,7 @@ Supported tasks include:
 |:----------|:-------|:-----------|
 | 🟡 **Kickoff** | Q4 2025 | Internal prototype, core primitive commands |
 | 🟢 **Early Access** | Q1 2026 | First external testers, config + build pipeline, hub catalog |
-| 🔵 **Public Beta** | Q2 2026 | Open source, agent skills, AI Toolkit integration |
+| 🔵 **Public Beta** | Q2 2026 | Open source, agent skills, Foundry Toolkit integration |
 | 🟣 **RC** | Q3-Q4 2026 | **LLM support** (with LoRA), broader device coverage, MLIR |
 
 <details>
@@ -418,11 +418,11 @@ Supported tasks include:
 **Q2 2026 — Public Beta**
 - Open source release
 - Agent-ready skills for coding assistants (Claude Code, Cursor, Copilot)
-- AI Toolkit for VS Code integration
+- Foundry Toolkit for VS Code integration
 
 **Q3-Q4 2026 — Release Candidate**
 - LLM support (decoder-only architectures with LoRA adapters)
-- TensorRT, MIGraphX, and DirectML execution providers
+- NvTensorRTRTX, MIGraphX, and Dml execution providers
 - MLIR-based optimization backend
 - Public SDK and framework APIs
 

@@ -380,8 +380,8 @@ markers = [
   "qnn: marks tests for QNN backend",
   "openvino: marks tests for OpenVINO backend",
   "cuda: marks tests requiring CUDA runtime",
-  "directml: marks tests for DirectML backend",
-  "tensorrt: marks tests for TensorRT backend",
+  "dml: marks tests for Dml backend",
+  "nv_tensorrt_rtx: marks tests for NvTensorRTRTX backend",
   "vitisai: marks tests for AMD Vitis AI backend",
   "training: marks tests requiring training-specific ORT features",
 ]
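
Since the only acceptable skips are for hardware/EP requirements, tests carrying the renamed `dml` and `nv_tensorrt_rtx` markers might be gated on which execution providers are installed. The sketch below is one possible approach under stated assumptions: `MARKER_TO_PROVIDER` and `skip_reason` are hypothetical helpers, and the provider strings are assumed to match the ONNX Runtime build in use.

```python
from typing import List, Optional

# Assumed mapping from the marker names above to ONNX Runtime provider strings;
# verify these against your ORT build before relying on them.
MARKER_TO_PROVIDER = {
    "dml": "DmlExecutionProvider",
    "nv_tensorrt_rtx": "NvTensorRTRTXExecutionProvider",
}


def skip_reason(marker: str, available_providers: List[str]) -> Optional[str]:
    """Return a skip message if the marker's provider is missing, else None.

    Markers without an entry in MARKER_TO_PROVIDER return None, so this helper
    never skips a test for any reason other than a missing provider.
    """
    provider = MARKER_TO_PROVIDER.get(marker)
    if provider is None or provider in available_providers:
        return None
    return f"{provider} not available"
```

In a `conftest.py`, the list could come from `onnxruntime.get_available_providers()`, with any non-`None` result passed to `pytest.skip`.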

@@ -58,7 +58,7 @@ uv run python scripts/e2e_eval/run_eval.py
 # Filter by priority / task / group
 uv run python scripts/e2e_eval/run_eval.py --priority P0
 uv run python scripts/e2e_eval/run_eval.py --task image-classification
-uv run python scripts/e2e_eval/run_eval.py --group AITK
+uv run python scripts/e2e_eval/run_eval.py --group "Foundry Toolkit"
 
 # Single ad-hoc model
 uv run python scripts/e2e_eval/run_eval.py --hf-model microsoft/resnet-50
@@ -84,7 +84,7 @@ uv run python scripts/e2e_eval/run_eval.py --retry-failed
 | `--task` | — | Filter by HF task |
 | `--priority` | — | Filter: `P0`, `P1`, `P2` |
 | `--model-type` | — | Filter by model_type (e.g. `bert`) |
-| `--group` | — | Filter by group (e.g. `AITK`) |
+| `--group` | — | Filter by group (e.g. `Foundry Toolkit`) |
 | `--device` | `auto` | Target device |
 | `--timeout` | 600 | Per-model timeout (seconds) |
 | `--list` | off | List models and exit |
@@ -118,7 +118,7 @@ uv run python scripts/e2e_eval/generate_report.py --input-dir eval_results/2026-
 | **P1** | Important — tracked closely, regressions flagged |
 | **P2** | Extended coverage — best-effort |
 
-Groups (`AITK`, `Benchmark`, `Top200`, etc.) categorize models by source/purpose.
+Groups (`Foundry Toolkit`, `Benchmark`, `Top200`, etc.) categorize models by source/purpose.
 
 ### Failure Classification
 

@@ -257,7 +257,7 @@ def build_registry(
             if is_p0:
                 priority = "P0"
                 group = p0_group_lookup.get((model_id, task)) or p0_model_group.get(
-                    model_id, "AITK"
+                    model_id, "Foundry Toolkit"
                 )
             else:
                 priority = "P1"
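
The P0 group fallback changed in this hunk can be read in isolation as the sketch below. `resolve_p0_group` and `DEFAULT_P0_GROUP` are hypothetical names introduced for illustration; only the two lookup dictionaries and the `"Foundry Toolkit"` default come from the diff.

```python
DEFAULT_P0_GROUP = "Foundry Toolkit"  # default introduced by this change


def resolve_p0_group(model_id, task, p0_group_lookup, p0_model_group):
    """Prefer the (model_id, task) entry, then the per-model entry, then the default."""
    # `or` also falls through when the (model_id, task) entry is an empty string.
    return p0_group_lookup.get((model_id, task)) or p0_model_group.get(
        model_id, DEFAULT_P0_GROUP
    )
```

One consequence worth checking: any stored registry data or filters that still use the old `AITK` label will not match models that now receive the new default.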