refactor!: remove deprecated Eval API and related APIs (#5290)
leseb wants to merge 18 commits into llamastack:main
Conversation
Remove the Eval API and all connected APIs that were already marked as deprecated in the spec. This includes the datasets, datasetio, scoring, scoring_functions, and benchmarks APIs along with all their provider implementations, routing tables, routers, registry entries, distribution configs, and tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
✱ Stainless preview builds
✅ llama-stack-client-go — studio · conflict
✅ llama-stack-client-python — studio · conflict
✅ llama-stack-client-openapi — studio · code · diff
✅ llama-stack-client-node — studio · conflict
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
This pull request has merge conflicts that must be resolved before it can be merged. @leseb please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch, resolving conflicts caused by the agents-to-responses API rename (PR llamastack#5195) interacting with the eval API removal. This includes fixing distribution templates that still referenced the old "agents" API name, removing the stale DatasetPurpose import from template.py, cleaning up the stainless config to remove references to deleted eval/scoring/dataset endpoints, removing the now-orphaned nvidia datasetio test, and regenerating all OpenAPI specs and distribution configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Resolve merge conflicts from upstream/main. All modify/delete conflicts are resolved by keeping the deletions from this branch since the purpose of this branch is to remove the eval, scoring, datasetio, and benchmarks APIs. Content conflicts in datatypes.py files are resolved by excluding the eval-related classes that upstream added. Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch. The Dell distribution was removed upstream, so the conflicting Dell files (config.yaml, dell.py, run-with-safety.yaml) are resolved by accepting the upstream deletion. Signed-off-by: Sebastien Han <seb@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch. The only conflict was in ifeval_utils.py which was modified upstream but deleted in this branch as part of the eval API removal. Kept the deletion since this branch removes the eval, scoring, datasetio, and benchmarks APIs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
These ifeval utility files were reintroduced during the merge with upstream/main. They belong to the scoring provider which was removed as part of the eval API removal in this branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch. All conflicts were modify/delete type where upstream modified files that this branch intentionally deleted as part of removing the Eval, DatasetIO, Scoring, and Benchmarks APIs. Resolved by keeping the deletions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch, resolving conflicts in the deprecated OpenAPI spec file by keeping the branch version which excludes eval-related deprecated endpoints (scoring functions, datasets, benchmarks) consistent with the purpose of this branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
The pre-commit OpenAPI codegen hook updated the deprecated spec file to match the current state of the codebase after merging upstream/main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Replace benchmark_id tests with vector_store_id and shield_id since benchmark_id was removed from RESOURCE_ID_FIELDS with the eval API. Remove braintrust lazy import test since the braintrust scoring provider was removed along with the eval API. Switch external provider CI test from lmeval (which depends on eval API) to the weather/kaze external provider which has no eval dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
The starter-gpu distribution was removed in PR llamastack#5279 but survived the merge into this branch. Remove the leftover directory so CI no longer attempts to build it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Take upstream's auto-discovery approach for router factories which dynamically discovers routers from the Api enum. Since this branch already removed eval-related APIs from the Api enum, the auto-discovery naturally excludes them without needing an explicit static list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
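The auto-discovery idea described above can be sketched roughly as follows. All names here are hypothetical stand-ins, not Llama Stack's actual internals: the point is only that building routers by iterating the Api enum means removing an enum member (such as eval) automatically drops its router, with no static list to edit.

```python
# Hypothetical sketch of enum-driven router auto-discovery.
# Removing a member from Api (e.g. EVAL) automatically removes its router
# from the resulting table -- no static list to maintain.
from enum import Enum


class Api(Enum):
    INFERENCE = "inference"
    SAFETY = "safety"
    # EVAL was removed along with the eval API, so no router is built for it.


# Registry of router factories, keyed by API name. In a real codebase this
# might be dynamic module discovery; a dict stands in for it here.
ROUTER_FACTORIES = {
    "inference": lambda: "InferenceRouter",
    "safety": lambda: "SafetyRouter",
    "eval": lambda: "EvalRouter",  # never reached: "eval" is not in Api
}


def discover_routers() -> dict:
    """Build routers only for APIs that still exist in the enum."""
    return {
        api.value: ROUTER_FACTORIES[api.value]()
        for api in Api
        if api.value in ROUTER_FACTORIES
    }


routers = discover_routers()
```

With this shape, the eval entry in the factory table is simply dead code once the enum member is gone, which matches why the branch needed no explicit exclusion list.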
The lmeval external provider depends on the eval API which is being removed. The weather/kaze provider works but needs further validation. Disable the workflow until we find a suitable external provider for CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Sébastien Han <seb@redhat.com>
Summary
Remove the Eval API and all connected APIs that are already marked deprecated in the spec:
- /v1alpha/eval/ (benchmarks, jobs, evaluations)
- /v1beta/datasets/ (dataset CRUD)
- /v1alpha/scoring/ (scoring functions, scoring)
- /v1alpha/post-training/ (post-training jobs)

These were marked experimental/deprecated and are not part of the v1 API surface.
What changed
Migration
Users of the eval API should use external evaluation frameworks (eval-hub, RAGAS, DeepEval) directly. The datasetio provider for HuggingFace datasets can be accessed via the HuggingFace `datasets` library.

Test plan
🤖 Generated with Claude Code