
refactor!: remove deprecated Eval API and related APIs#5290

Merged
leseb merged 29 commits into llamastack:main from leseb:remove-eval-api on Apr 7, 2026


@leseb
Collaborator

@leseb leseb commented Mar 25, 2026

Summary

Remove the Eval API and all connected APIs that are already marked deprecated in the spec:

  • /v1alpha/eval/ (benchmarks, jobs, evaluations)
  • /v1beta/datasets/ (dataset CRUD)
  • /v1alpha/scoring/ (scoring functions, scoring)

These were marked experimental/deprecated and are not part of the v1 API surface.
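Client code can be audited for calls into the removed surface with a simple prefix check. The sketch below is a hypothetical helper, not part of llama-stack; it assumes the three prefixes listed above cover every removed route.

```python
# Hypothetical client-side helper; the prefixes are the ones listed in this PR's summary.
REMOVED_PREFIXES = (
    "/v1alpha/eval/",
    "/v1beta/datasets/",
    "/v1alpha/scoring/",
)


def is_removed_endpoint(path: str) -> bool:
    """Return True if `path` falls under one of the removed route prefixes."""
    # Append a trailing slash so "/v1alpha/eval" matches but "/v1alpha/evaluator" does not.
    normalized = path if path.endswith("/") else path + "/"
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return normalized.startswith(REMOVED_PREFIXES)
```

For example, `is_removed_endpoint("/v1alpha/eval/benchmarks")` is true, while `is_removed_endpoint("/v1/responses")` is false.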

What changed

  • Removed provider registry entries for eval, scoring, post_training, datasetio
  • Removed inline/remote provider implementations
  • Removed routing tables and routers
  • Removed API models, protocols, and FastAPI routes from llama_stack_api
  • Removed distribution config entries
  • Updated tests: replaced benchmark_id tests with vector_store_id/shield_id, removed braintrust lazy import test
  • Removed starter-gpu distribution (leftover from merge)
  • Disabled the external provider CI test (lmeval depends on the eval API); it now runs only via workflow_dispatch until a suitable replacement is found

Migration

Users of the eval API should use external evaluation frameworks (eval-hub, RAGAS, DeepEval) directly. Data previously served through the HuggingFace datasetio provider can be loaded directly with the HuggingFace datasets library.
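For pipelines that only relied on a simple equality check, a local stand-in can be very small. This is a minimal sketch, not a port of the removed scoring provider; the `Row` type and its `generated_answer`/`expected_answer` fields are hypothetical, and the rows could come from the HuggingFace datasets library or any other source. Anything richer should go through the frameworks named above.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical field names; map them from whatever your dataset uses.
    generated_answer: str
    expected_answer: str


def exact_match_accuracy(rows: list[Row]) -> float:
    """Fraction of rows whose generated answer equals the expected answer.

    Leading/trailing whitespace is stripped before comparing; case is preserved.
    """
    if not rows:
        return 0.0
    hits = sum(r.generated_answer.strip() == r.expected_answer.strip() for r in rows)
    return hits / len(rows)
```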

Test plan

  • Unit tests pass (replaced benchmark tests with vector_store/shield tests)
  • Pre-commit passes
  • starter-gpu distribution removed
  • External provider CI test disabled (was using lmeval)

🤖 Generated with Claude Code

…o, scoring, scoring_functions, benchmarks)

Remove the Eval API and all connected APIs that were already marked as
deprecated in the spec. This includes the datasets, datasetio, scoring,
scoring_functions, and benchmarks APIs along with all their provider
implementations, routing tables, routers, registry entries, distribution
configs, and tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@meta-cla meta-cla bot added the CLA Signed label Mar 25, 2026
@leseb leseb added this to the 0.8.0 milestone Mar 25, 2026
@github-actions
Contributor

github-actions bot commented Mar 25, 2026

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

refactor!: remove deprecated Eval API and related APIs
⚠️ llama-stack-client-openapi studio · code

Your SDK build had at least one "warning" diagnostic.
generate ⚠️

⚠️ llama-stack-client-python studio · conflict

Your SDK build had at least one warning diagnostic.

⚠️ llama-stack-client-go studio · conflict

Your SDK build had at least one error diagnostic.

⚠️ llama-stack-client-node studio · conflict

Your SDK build had at least one warning diagnostic.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-04-07 15:53:37 UTC

@mergify
Contributor

mergify bot commented Mar 25, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @leseb please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@mergify
Contributor

mergify bot commented Mar 26, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @leseb please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 26, 2026
…nses rename

Merge upstream/main into remove-eval-api branch, resolving conflicts caused
by the agents-to-responses API rename (PR llamastack#5195) interacting with the eval API
removal. This includes fixing distribution templates that still referenced the
old "agents" API name, removing the stale DatasetPurpose import from
template.py, cleaning up the stainless config to remove references to deleted
eval/scoring/dataset endpoints, removing the now-orphaned nvidia datasetio
test, and regenerating all OpenAPI specs and distribution configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@mergify mergify bot removed the needs-rebase label Mar 26, 2026
leseb and others added 5 commits March 26, 2026 21:05
Resolve merge conflicts from upstream/main. All modify/delete conflicts
are resolved by keeping the deletions from this branch since the purpose
of this branch is to remove the eval, scoring, datasetio, and benchmarks
APIs. Content conflicts in datatypes.py files are resolved by excluding
the eval-related classes that upstream added.

Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch. The Dell distribution
was removed upstream, so the conflicting Dell files (config.yaml,
dell.py, run-with-safety.yaml) are resolved by accepting the upstream
deletion.

Signed-off-by: Sebastien Han <seb@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Merge upstream/main into remove-eval-api branch. The only conflict was
in ifeval_utils.py which was modified upstream but deleted in this branch
as part of the eval API removal. Kept the deletion since this branch
removes the eval, scoring, datasetio, and benchmarks APIs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
These ifeval utility files were reintroduced during the merge with
upstream/main. They belong to the scoring provider which was removed
as part of the eval API removal in this branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@mergify
Contributor

mergify bot commented Mar 30, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @leseb please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 30, 2026
Merge upstream/main into remove-eval-api branch. All conflicts were
modify/delete type where upstream modified files that this branch
intentionally deleted as part of removing the Eval, DatasetIO, Scoring,
and Benchmarks APIs. Resolved by keeping the deletions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@mergify mergify bot removed the needs-rebase label Mar 30, 2026
@mergify
Contributor

mergify bot commented Mar 30, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @leseb please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 30, 2026
leseb and others added 2 commits March 30, 2026 16:46
Merge upstream/main into remove-eval-api branch, resolving conflicts in
the deprecated OpenAPI spec file by keeping the branch version which
excludes eval-related deprecated endpoints (scoring functions, datasets,
benchmarks) consistent with the purpose of this branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
The pre-commit OpenAPI codegen hook updated the deprecated spec file to
match the current state of the codebase after merging upstream/main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@mergify mergify bot removed the needs-rebase label Mar 30, 2026
leseb and others added 3 commits March 31, 2026 11:44
Replace benchmark_id tests with vector_store_id and shield_id since
benchmark_id was removed from RESOURCE_ID_FIELDS with the eval API.

Remove braintrust lazy import test since the braintrust scoring provider
was removed along with the eval API.

Switch external provider CI test from lmeval (which depends on eval API)
to the weather/kaze external provider which has no eval dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
nathan-weinberg and others added 3 commits April 2, 2026 17:56
)

# What does this PR do?
Addresses a DoS caused by too many multipart parts

Refs:
[GHSA-qjxf-f2mg-c6mc](GHSA-qjxf-f2mg-c6mc)

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
…ack#5423)

## Summary

Major documentation cleanup and modernization across ~35 files, net -970
lines.

### New features
- **`[starter]` pip extra** in pyproject.toml enabling `uvx --from
'llama-stack[starter]' llama stack run starter`
- **InstallBlock component** on landing page with Ollama/OpenAI tabs,
syntax highlighting, and copy button
- **Lucide-style sidebar icons** for provider sub-categories (Inference,
Safety, Vector IO, etc.)

### Content updates
- Rewrite quickstart: `uvx` for experimenting, `uv init` for projects
- Rewrite detailed tutorial: all examples use standard OpenAI SDK
- Update all `agents` references to `responses` across distro configs
and provider tables
- Simplify concepts: architecture matches ARCHITECTURE.md, APIs page
focuses on integration patterns
- Rewrite Responses vs Agents page: Responses is primary, Agents is
legacy
- Clean up provider overview page
- Fix broken tables, dead links, Sphinx toctree remnants

### Removed
- iOS/Android SDK pages (untested, unmaintained)
- Post Training provider docs (API removed)
- On-Device Distributions sidebar section
- Duplicate content across quickstart/tutorial/starting-server pages

## Test plan

- [ ] `cd docs && npm install && npx docusaurus build` passes
- [ ] Light and dark modes work
- [ ] All sidebar links resolve
- [ ] InstallBlock copy button works
- [ ] API overview card links point to correct generated pages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove eval/scoring categories from log.py
- Remove stale entries from routing_tables, providers, registry READMEs
- Fix open-benchmark docstring referencing removed APIs
- Remove evaluation.md and scoring.md docs (APIs no longer exist)
- Remove scoring sidebar entries
- Regenerate OpenAPI specs to clean up empty tags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Contributor

@skamenan7 skamenan7 left a comment


Thanks Seb. LGTM except for two comments.

Responses,
Safety,
Scoring,
ScoringFunctions,
Contributor


One thing worth catching before merge: anyone upgrading with eval/scoring/datasetio still in their run config will hit a confusing KeyError instead of a helpful "this API was removed" message. Would a guard in configure.py or resolver.py make sense, something that checks for these obsolete API values early and surfaces a clean migration note?
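A minimal sketch of such a guard, assuming it receives the list of API names from the run config before provider resolution. `REMOVED_APIS`, `RemovedAPIError`, and `check_for_removed_apis` are hypothetical names; the API names mirror those this PR removes, and the hints reuse the PR's migration section.

```python
# Hypothetical guard: maps API names dropped by this PR to a short migration hint.
_EXTERNAL_EVAL = "use an external evaluation framework (eval-hub, RAGAS, DeepEval) directly"
_LOAD_DIRECTLY = "load data directly, e.g. with the HuggingFace datasets library"
REMOVED_APIS = {
    "eval": _EXTERNAL_EVAL,
    "benchmarks": _EXTERNAL_EVAL,
    "scoring": _EXTERNAL_EVAL,
    "scoring_functions": _EXTERNAL_EVAL,
    "datasets": _LOAD_DIRECTLY,
    "datasetio": _LOAD_DIRECTLY,
}


class RemovedAPIError(ValueError):
    """Raised when a run config still lists an API that no longer exists."""


def check_for_removed_apis(apis: list[str]) -> None:
    """Fail fast with a migration note instead of a later KeyError."""
    found = [api for api in apis if api in REMOVED_APIS]
    if not found:
        return
    details = "\n".join(f"  - {api}: {REMOVED_APIS[api]}" for api in found)
    raise RemovedAPIError(
        "The run config lists APIs that have been removed:\n"
        f"{details}\n"
        "Remove them from the run config before starting the stack."
    )
```

Running the check once, early, means every obsolete entry is reported together rather than failing on the first lookup.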



class DatasetWithOwner(Dataset, ResourceWithOwner):
    """A Dataset resource extended with ownership information for access control."""
Contributor


I see a potential silent data-loss path for in-place upgrades: existing registry entries with type=dataset|scoring_function|benchmark will fail to deserialize because RoutableObjectWithProvider no longer includes those types. Would it make sense to add a migration note to the release, or to handle unrecognized discriminator values gracefully (log and skip rather than crash)?
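The "log and skip" option could look roughly like this. It is a sketch under stated assumptions: the real registry likely deserializes through a pydantic discriminated union, whereas here plain dataclasses and a hypothetical `KNOWN_TYPES` mapping stand in for `RoutableObjectWithProvider`, and each stored entry is assumed to be a dict with a `type` key.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Model:
    identifier: str
    provider_id: str


@dataclass
class Shield:
    identifier: str
    provider_id: str


# Hypothetical stand-in for the discriminator values the union still accepts.
KNOWN_TYPES = {"model": Model, "shield": Shield}


def load_registry_entries(raw_entries: list[dict]) -> list[object]:
    """Deserialize known entries; log and skip entries whose type was removed."""
    loaded = []
    for entry in raw_entries:
        kind = entry.get("type")
        cls = KNOWN_TYPES.get(kind)
        if cls is None:
            # e.g. "dataset", "scoring_function", "benchmark" after this PR
            logger.warning(
                "Skipping registry entry %r with unrecognized type %r",
                entry.get("identifier"),
                kind,
            )
            continue
        fields = {k: v for k, v in entry.items() if k != "type"}
        loaded.append(cls(**fields))
    return loaded
```

Skipping keeps the stack bootable; the warning makes the dropped entries visible instead of silent.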

@leseb leseb added this pull request to the merge queue Apr 7, 2026
@leseb
Collaborator Author

leseb commented Apr 7, 2026

@skamenan7 thanks will address the comments separately!

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 7, 2026
@cdoern cdoern enabled auto-merge April 7, 2026 14:20
@cdoern cdoern added this pull request to the merge queue Apr 7, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 7, 2026
@leseb leseb added this pull request to the merge queue Apr 7, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 7, 2026
@leseb leseb added this pull request to the merge queue Apr 7, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 7, 2026
leseb and others added 2 commits April 7, 2026 17:20
…ompat checks

The backward-compat workflow uses github.event.pull_request.title to
check for breaking change acknowledgment, but this context is empty
during merge_group events. This caused the checks to always report
breaking changes as unacknowledged even when the PR title contained
the conventional commit '!:' marker.

Add a fallback that fetches the PR title via gh CLI when the event
context is empty, resolving it from the head commit SHA or branch name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
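The fallback logic in the commit above can be sketched in Python for illustration. The actual workflow is GitHub Actions calling the gh CLI; here `fetch_title` is a hypothetical callable standing in for that call, and the regex is one reasonable reading of the conventional-commit `!:` marker.

```python
import re
from typing import Callable, Optional

# Conventional-commit header with the breaking-change marker: type, optional (scope), then "!:".
_BREAKING_HEADER = re.compile(r"^[a-z]+(\([^)]*\))?!:")


def resolve_pr_title(event_title: Optional[str], fetch_title: Callable[[], str]) -> str:
    """Prefer the title from the event payload; fall back when it is empty (merge_group)."""
    if event_title:
        return event_title
    # In the workflow this step is a gh CLI lookup; here it is an injected callable.
    return fetch_title()


def acknowledges_breaking_change(title: str) -> bool:
    """True if the title carries the '!:' marker, e.g. 'refactor!: ...'."""
    return _BREAKING_HEADER.match(title) is not None
```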
## Summary

- Redesign landing page with animated mesh gradient hero, infinite
scrolling provider marquee, stats ribbon, and icon-enhanced endpoint
cards with gradient border glow
- Add missing API endpoint cards: Moderations, Messages, Conversations,
Connectors, Batches (11 total)
- Improve navbar with gradient accent line, styled hover effects, and
pill-styled GitHub button
- Enhance footer with gradient separator, four-column layout, and accent
bars under titles
- Add light/dark mode support for code blocks across doc and API pages
with proper syntax highlighting
- Style OpenAPI explorer panels (request, response, code samples) to
match site theme in both modes
- Compact API listing page cards

## Test plan

- [x] Verify landing page renders correctly in both light and dark mode
- [x] Verify doc pages code blocks use light theme in light mode
- [x] Verify API pages (OpenAPI explorer) panels are readable in both
modes
- [x] Verify navbar, footer, and responsive breakpoints work across
screen sizes
- [x] Pre-commit hooks pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leseb leseb enabled auto-merge April 7, 2026 15:22
@leseb leseb added this pull request to the merge queue Apr 7, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 7, 2026
…olution

The previous fallback used gh pr list with SHA/branch search, which
fails for fork PRs in merge_group context. Instead, extract the PR
number directly from the merge queue branch name format
(gh-readonly-queue/main/pr-<NUMBER>-<SHA>) and use gh pr view to
fetch the title reliably.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
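Extracting the PR number from the merge queue branch name amounts to a single regex match. The sketch below is a Python illustration of that step, not the workflow's actual shell code; `pr_number_from_queue_branch` is a hypothetical name, and the resulting number is what would then be passed to `gh pr view`.

```python
import re
from typing import Optional

# gh-readonly-queue/<base>/pr-<NUMBER>-<SHA>; the base branch may itself contain slashes.
_QUEUE_BRANCH = re.compile(r"^gh-readonly-queue/.+/pr-(\d+)-[0-9a-f]+$")


def pr_number_from_queue_branch(ref: str) -> Optional[int]:
    """Extract the PR number from a merge queue branch name, or return None."""
    # Accept both the short branch name and the fully qualified refs/heads/... form.
    branch = ref.removeprefix("refs/heads/")
    match = _QUEUE_BRANCH.match(branch)
    return int(match.group(1)) if match else None
```

Parsing the branch name avoids any search by SHA or head branch, which is what broke for fork PRs.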
@leseb leseb enabled auto-merge April 7, 2026 15:44
@leseb leseb added this pull request to the merge queue Apr 7, 2026
Merged via the queue into llamastack:main with commit 418318f Apr 7, 2026
98 checks passed
@leseb leseb deleted the remove-eval-api branch April 7, 2026 15:52
skamenan7 added a commit to skamenan7/llama-stack that referenced this pull request Apr 13, 2026
Remove the Eval API section from nvidia doc_template, and update
docstrings in nvidia.py and open_benchmark.py to no longer reference
evaluation capabilities. The eval API configs were already removed
upstream in PR llamastack#5290.

Signed-off-by: Sumanth Kamenani <skamenan@redhat.com>
Signed-off-by: skamenan7 <skamenan@redhat.com>

Labels

CLA Signed

Projects

None yet


4 participants