[v0.7 CLI Refactor] Rework BackendArgs to be the authoritative config location #723

Merged: sjmonson merged 8 commits into main from refactor/schema/backend on May 11, 2026
Conversation

@sjmonson
Collaborator

@sjmonson sjmonson commented May 7, 2026

Summary

Reworks BackendArgs into a PydanticClassRegistry containing everything needed to create a backend.
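For readers unfamiliar with the registry pattern, here is a minimal sketch of the idea. It deliberately uses plain dataclasses instead of Pydantic so it runs without dependencies, and it is not the code in this PR. The tag values (`openai_http`, `vllm_python`), the `type_name` attribute, the `from_dict` helper, and the `target` and `model` fields are assumptions made for illustration; only `api_routes`, `validate_backend`, and `vllm_config` are names that appear in this PR.

```python
from dataclasses import dataclass, field
from typing import Any, ClassVar


@dataclass
class BackendArgs:
    """Base config class; subclasses register themselves under a string tag."""

    # Shared tag -> subclass map (hypothetical layout, not the real registry).
    registry: ClassVar[dict[str, Any]] = {}
    type_name: ClassVar[str] = ""

    @classmethod
    def register(cls, name: str):
        """Class decorator that records a subclass under the given tag."""

        def decorator(subclass):
            subclass.type_name = name
            cls.registry[name] = subclass
            return subclass

        return decorator

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "BackendArgs":
        """Pick the subclass named by the "type" key and build it from the rest."""
        payload = dict(data)  # copy so the caller's dict is not mutated
        tag = payload.pop("type", None)
        if tag not in cls.registry:
            raise ValueError(f"unknown backend type: {tag!r}")
        return cls.registry[tag](**payload)


@BackendArgs.register("openai_http")  # tag value is an assumption
@dataclass
class OpenAIHTTPBackendArgs(BackendArgs):
    target: str = "http://localhost:8000"  # assumed field
    validate_backend: bool = True
    api_routes: dict[str, str] = field(default_factory=dict)


@BackendArgs.register("vllm_python")  # tag value is an assumption
@dataclass
class VLLMPythonBackendArgs(BackendArgs):
    model: str = ""  # assumed field
    vllm_config: dict[str, Any] = field(default_factory=dict)


args = BackendArgs.from_dict({"type": "openai_http", "validate_backend": False})
```

The point of the pattern is that one object carries both the backend selection and its settings, so it can be validated once and passed around as a unit.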

Details

This is the first (or third, depending on how you count) patch to refactor our CLI and the way submodules are configured and spawned internally. Because we already had a partial base for it, this will likely be the simplest PR; subsequent PRs will have to rework much more code.

Changes to the entrypoints and CLI are temporary as that code will be refactored in a follow-up.

Changes

I tried to keep most of the original functionality, but a few things either did not make sense as implemented or were already planned for removal:

Dropped LEGACY_API_ALIASES

They exist only to keep existing scripts from breaking. Those scripts will break in this release for other reasons, so we might as well make the change now.

Dropped request_handlers backend argument

This let the user provide custom request type handlers as a backend argument. The better way to do this is to just call OpenAIRequestHandlerFactory.register on your custom handler.
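As a rough illustration of the register-instead-of-pass approach, the sketch below uses a stand-in factory. `RequestHandlerFactory`, `CustomHandler`, the `"custom_completions"` key, and the `format` method are all hypothetical; the real signature of `OpenAIRequestHandlerFactory.register` is not shown in this PR and may differ.

```python
from typing import Any, ClassVar


class RequestHandlerFactory:
    """Hypothetical stand-in for a handler registry; not GuideLLM's class."""

    _handlers: ClassVar[dict[str, Any]] = {}

    @classmethod
    def register(cls, request_type: str):
        """Class decorator that maps a request type name to a handler class."""

        def decorator(handler_cls):
            cls._handlers[request_type] = handler_cls
            return handler_cls

        return decorator

    @classmethod
    def create(cls, request_type: str):
        """Instantiate the handler registered for the given request type."""
        try:
            return cls._handlers[request_type]()
        except KeyError:
            raise ValueError(f"no handler for {request_type!r}") from None


@RequestHandlerFactory.register("custom_completions")  # hypothetical key
class CustomHandler:
    def format(self, prompt: str) -> dict[str, Any]:
        # Build the request body this handler would send (illustrative only).
        return {"prompt": prompt, "max_tokens": 16}


handler = RequestHandlerFactory.create("custom_completions")
body = handler.format("hello")
```

Registering at import time keeps handler code out of the backend's configuration, so the configuration stays plain data that can be serialized.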

validate_backend no longer has the option to provide a custom health check endpoint

You can get nearly the same functionality by passing api_routes={"/health": "custom endpoint"}. However, users usually just set validate_backend=False since it's too much of a hassle to provide a custom endpoint.

Test Plan

Test setting --backend and --backend-kwargs with various values to make sure nothing has changed (apart from the changes detailed above).

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

sjmonson added 4 commits May 7, 2026 15:04
Signed-off-by: Samuel Monson <smonson@redhat.com>
Assisted-by: Copilot <GPT-4.1>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Assisted-by: Copilot <GPT-4.1>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Assisted-by: Copilot <GPT-4.1>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Generated-by: claude-code <Sonnet 4.6>
@sjmonson sjmonson force-pushed the refactor/schema/backend branch from 567812b to cc2a59f Compare May 7, 2026 19:04
@sjmonson sjmonson marked this pull request as ready for review May 7, 2026 19:04
Signed-off-by: Samuel Monson <smonson@redhat.com>
@sjmonson
Collaborator Author

sjmonson commented May 7, 2026

sigh Fix one CI job and break another. I will have a fix up for the tests soon.

Signed-off-by: Samuel Monson <smonson@redhat.com>
Generated-by: claude-code <Sonnet 4.6>
@sjmonson
Collaborator Author

sjmonson commented May 8, 2026

augment review

@augmentcode

augmentcode Bot commented May 8, 2026

🤖 Augment PR Summary

Summary: This PR continues the v0.7 CLI/internal refactor by making BackendArgs the authoritative, typed configuration object used to construct backends.

Changes:

  • Refactors BackendArgs into a polymorphic PydanticClassRegistryMixin model using a type discriminator.
  • Updates Backend.create() to accept a BackendArgs instance (instead of type + kwargs) and adjusts Backend initialization accordingly.
  • Migrates OpenAIHTTPBackend to consume a single OpenAIHTTPBackendArgs model; adds/normalizes fields like api_routes, SecretStr api_key, timeouts, and drops legacy aliases/custom handler kwargs.
  • Migrates VLLMPythonBackend to consume VLLMPythonBackendArgs, consolidating engine configuration into vllm_config and formalizing request/template and placeholder options.
  • Updates benchmark entrypoints to pass typed backend args through the stack; removes the separate backend field from generative benchmark schemas.
  • Adjusts the benchmark CLI to inject --backend into backend_kwargs["type"] and updates unit tests for the new validation/serialization behavior.

Technical Notes: Polymorphic backend args are now validated via the registry/tagged-union mechanism, and backend configuration is intended to be transported/serialized as a single object for spawning and submodule configuration.
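The CLI behaviour described in the summary, folding `--backend` into `backend_kwargs["type"]`, could look roughly like the sketch below. The function name `merge_backend_option` is hypothetical, the assumption that `--backend-kwargs` arrives as a JSON string is mine, and so is the choice that an explicit `--backend` overrides any `"type"` already present in the kwargs.

```python
import json
from typing import Any, Optional


def merge_backend_option(
    backend: Optional[str], backend_kwargs: Optional[str]
) -> dict[str, Any]:
    """Fold a --backend value into parsed --backend-kwargs under the "type" key."""
    kwargs = json.loads(backend_kwargs) if backend_kwargs else {}
    if not isinstance(kwargs, dict):
        raise ValueError("--backend-kwargs must decode to a JSON object")
    if backend is not None:
        # Assumption: the explicit flag wins over a "type" inside the kwargs.
        kwargs["type"] = backend
    return kwargs


merged = merge_backend_option("openai_http", '{"validate_backend": false}')
```

The resulting dictionary is then a single payload that a tagged-union model can validate, which matches the "single object" intent in the technical notes.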


@augmentcode augmentcode Bot left a comment

Review completed. 3 suggestions posted.

Comment thread src/guidellm/backends/backend.py
Comment thread src/guidellm/backends/openai/http.py
Comment thread src/guidellm/cli/benchmark/run.py
Signed-off-by: Samuel Monson <smonson@redhat.com>
dbutenhof
dbutenhof previously approved these changes May 8, 2026
Collaborator

@dbutenhof dbutenhof left a comment

I pulled it and tried various good and bad option values e.g. for the backend args, and all seems to work, which is one definition of "good".

Comment thread src/guidellm/backends/backend.py
Comment thread src/guidellm/backends/openai/http.py
Signed-off-by: Samuel Monson <smonson@redhat.com>
Generated-by: claude-code <Sonnet 4.6>
@sjmonson
Collaborator Author

sjmonson commented May 8, 2026

Forgot I was going to use this chance to move GenerationRequestArguments to the OpenAI backend. Will add that next week.

Collaborator

@jaredoconnell jaredoconnell left a comment

Looks good to me. I tested basic use cases with the HTTP and vLLM-Python backends.

Comment thread src/guidellm/backends/vllm_python/vllm.py
Comment thread src/guidellm/backends/openai/http.py
Comment thread tests/unit/backends/test_backend.py
@jaredoconnell
Collaborator

This looks ready to merge. I have no further comments after seeing your responses.

@sjmonson
Collaborator Author

> Forgot I was going to use this chance to move GenerationRequestArguments to the OpenAI backend. Will add that next week.

Changed my mind about this. I'll do it in a follow-up after #713 lands since that PR adds a common file which will make it easier to move.

@sjmonson sjmonson merged commit 0a19377 into main May 11, 2026
11 checks passed
@sjmonson sjmonson deleted the refactor/schema/backend branch May 11, 2026 14:53
Development

Successfully merging this pull request may close these issues.

Rework interface between backends and benchmark

3 participants