Skip to content

refactor: Decouple ModelFacade from LiteLLM via ModelClient adapter#373

Open
nabinchha wants to merge 42 commits intomainfrom
nm/overhaul-model-facade-guts-pr2
Open

refactor: Decouple ModelFacade from LiteLLM via ModelClient adapter#373
nabinchha wants to merge 42 commits intomainfrom
nm/overhaul-model-facade-guts-pr2

Conversation

@nabinchha
Copy link
Contributor

📋 Summary

Decouples ModelFacade from direct LiteLLM router usage by introducing a ModelClient adapter layer. The facade now operates entirely on canonical request/response types (ChatCompletionRequest, ChatCompletionResponse, etc.) instead of raw LiteLLM objects, making it testable without LiteLLM and preparing for future client backends.

This is PR 2 of the model facade overhaul series, building on the canonical types and LiteLLMBridgeClient introduced in PR 1.

🔄 Changes

✨ Added

  • clients/factory.pycreate_model_client() factory that handles provider resolution, API key setup, and LiteLLM router construction
  • TransportKwargs — Unified transport preparation that flattens extra_body into top-level kwargs and separates extra_headers
  • _raise_from_provider_error() in errors.py — Maps canonical ProviderError to DataDesignerError subclasses
  • extract_message_from_exception_string() for parsing human-readable messages from stringified LiteLLM exceptions
  • make_stub_completion_response() test helper for creating canonical test fixtures
  • close()/aclose() lifecycle methods on ModelFacade and ModelRegistry
  • New test file test_parsing.py for TransportKwargs behavior

🔧 Changed

  • ModelFacade — Now accepts a ModelClient via constructor injection instead of creating its own CustomRouter. All methods use canonical types (ChatCompletionRequest/Response, EmbeddingRequest/Response, ImageGenerationRequest/Response)
  • MCPFacade — Operates on canonical ChatCompletionResponse and ToolCall types instead of raw LiteLLM response objects; removed internal tool call normalization (_extract_tool_calls, _normalize_tool_call) since parsing now happens in the client layer
  • LiteLLMBridgeClient — Uses TransportKwargs.from_request() instead of collect_non_none_optional_fields() for cleaner request forwarding
  • ProviderError — Refactored from @dataclass to regular Exception subclass for proper exception semantics
  • Usage tracking — Consolidated three separate tracking methods into single _track_usage() operating on canonical Usage type
  • Test suite — All facade and MCP tests now use canonical types instead of StubResponse/StubMessage/FakeResponse/FakeMessage; tests mock ModelClient instead of CustomRouter
  • model_facade_factory — Now creates a ModelClient first, then injects it into ModelFacade

🗑️ Removed

  • _try_extract_base64() and direct image parsing from ModelFacade (moved to client layer in PR1)
  • _get_litellm_deployment() from ModelFacade (moved to create_model_client())
  • collect_non_none_optional_fields() from parsing.py (replaced by TransportKwargs)
  • Three separate usage tracking methods (_track_token_usage_from_completion, _track_token_usage_from_embedding, _track_token_usage_from_image_diffusion) replaced by unified _track_usage()
  • StubResponse/FakeResponse usage in tests (replaced by canonical types)
  • Several removed tests for internal normalization logic that is now handled by the client layer

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • facade.py — Core refactor: constructor signature change (client replaces secret_resolver + internal router), all methods now use canonical types
  • errors.py — New _raise_from_provider_error() mapping function and ProviderErrorException refactor
  • types.pyTransportKwargs design: flattening extra_body vs keeping extra_headers separate
  • mcp/facade.py — Significant simplification from canonical types; verify the _convert_canonical_tool_calls_to_dicts bridge is correct

🤖 Generated with Claude Code

nabinchha and others added 30 commits February 19, 2026 15:50
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…provements

- Wrap all LiteLLM router calls in try/except to normalize raw exceptions
  into canonical ProviderError at the bridge boundary (blocking review item)
- Extract reusable response-parsing helpers into clients/parsing.py for
  shared use across future native adapters
- Add async image parsing path using httpx.AsyncClient to avoid blocking
  the event loop in agenerate_image
- Add retry_after field to ProviderError for future retry engine support
- Fix _to_int_or_none to parse numeric strings from providers
- Create test conftest.py with shared mock_router/bridge_client fixtures
- Parametrize duplicate image generation and error mapping tests
- Add tests for exception wrapping across all bridge methods
…larity

- Parse RFC 7231 HTTP-date strings in Retry-After header (used by
  Azure and Anthropic during rate-limiting) in addition to numeric
  delay-seconds
- Clarify collect_non_none_optional_fields docstring explaining why
  f.default is None is the correct check for optional field forwarding
- Add tests for HTTP-date and garbage Retry-After values
- Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE
- Handle list-format detail arrays in _extract_structured_message for
  FastAPI/Pydantic validation errors
- Document scope boundary for vision content in collect_raw_image_candidates
- Replace @DataClass + __post_init__ with explicit __init__ that calls
  super().__init__ properly, avoiding brittle field-ordering dependency
- Store cause via __cause__ only, removing the redundant .cause attr
- Update match pattern in handle_llm_exceptions for non-dataclass type
- Rename shadowed local `fields` to `optional_fields` in TransportKwargs
@nabinchha nabinchha requested a review from a team as a code owner March 6, 2026 00:01
@nabinchha nabinchha changed the base branch from nm/overhaul-model-facade-guts-pr1 to main March 6, 2026 00:02
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR cleanly decouples ModelFacade from LiteLLM by introducing a ModelClient Protocol and a LiteLLMBridgeClient adapter. The facade now works entirely with canonical request/response types (ChatCompletionRequest, EmbeddingRequest, ImageGenerationRequest and their response counterparts), making it independently testable. The refactoring is well-structured and addresses all issues raised in the previous review round.

Key changes:

  • ModelFacade: Constructor now accepts a ModelClient via dependency injection; all three operation families (completion, embeddings, image_generation) use canonical types; _track_usage() consolidates three separate tracking methods; close()/aclose() lifecycle hooks added.
  • TransportKwargs: New dataclass in clients/types.py that centralises extra_body flattening and extra_headers separation, used uniformly across all six operation methods in LiteLLMBridgeClient.
  • MCPFacade: Significantly simplified by removing _extract_tool_calls/_normalize_tool_call; the _execute_tool_calls_from_canonical replacement correctly guards against non-dict json.loads output.
  • _raise_from_provider_error: New function typed -> NoReturn maps canonical ProviderError kinds to DataDesignerError subclasses; ProviderError refactored from @dataclass to a proper Exception subclass.
  • Factory: create_model_client() extracts the deployment-construction logic from ModelFacade, keeping the facade provider-agnostic.
  • Tests: All facade and MCP tests now mock ModelClient instead of CustomRouter and use canonical types; a new test_parsing.py provides comprehensive TransportKwargs and extract_tool_calls coverage.

One area to watch: ModelRegistry.close()/aclose() iterates facades without exception isolation — if one facade's close() raises, subsequent facades will not be closed (see inline comment). This is low risk today since LiteLLMBridgeClient.close() is a no-op, but should be hardened before any future client implementation holds real resources.

Confidence Score: 4/5

  • This PR is safe to merge — it is a well-executed refactor with no new runtime risks introduced.
  • The canonical type layer is coherent, the TransportKwargs abstraction is sound, error mapping is complete and correctly typed (-> NoReturn), and all previously raised review issues have been addressed or deferred to PR3 with acknowledgment. The one mild concern — ModelRegistry.close() not isolating per-facade exceptions — is low risk today since LiteLLMBridgeClient.close() is a no-op, and the test suite has been updated comprehensively throughout.
  • Pay close attention to registry.py when adding any future ModelClient implementation that holds real resources (connections, thread pools), as the close()/aclose() loop does not isolate per-facade exceptions.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/models/facade.py Core refactor: replaces direct router usage with ModelClient injection; introduces _build_chat_completion_request, _build_embedding_request, _build_image_generation_request, unified _track_usage, and new close/aclose lifecycle methods. All canonical type usages are correct. Unknown kwargs to completion are logged at debug level and routed to metadata; embedding and image kwargs are covered by explicitly extracted fields or extra_body.
packages/data-designer-engine/src/data_designer/engine/mcp/facade.py Simplified significantly by adopting canonical types; removes _extract_tool_calls/_normalize_tool_call, adds _execute_tool_calls_from_canonical with correct non-dict guard for json.loads output, and _convert_canonical_tool_calls_to_dicts bridge. The dict contract between this helper and _build_assistant_tool_message is consistent.
packages/data-designer-engine/src/data_designer/engine/models/clients/types.py New TransportKwargs dataclass cleanly centralises extra_body flattening and extra_headers separation. _collect_optional_fields correctly targets f.default is None fields; all request types exclusively use = None defaults for optional fields, so the pattern is sound. Six new fields (stop, seed, response_format, etc.) added to ChatCompletionRequest.
packages/data-designer-engine/src/data_designer/engine/models/clients/factory.py New factory file correctly extracts _get_litellm_deployment + router construction logic from ModelFacade, preserving the "not-used-but-required" API key fallback. Straightforward and well-typed.
packages/data-designer-engine/src/data_designer/engine/models/errors.py Adds ProviderError case to handle_llm_exceptions and new _raise_from_provider_error function typed -> NoReturn. parse_api_error type annotation corrected from InternalServerError to APIError. Clean and complete error mapping.
packages/data-designer-engine/src/data_designer/engine/models/clients/errors.py ProviderError refactored from @dataclass to a proper Exception subclass — correct semantics. The cause field is now self.__cause__ (Python exception chaining). New extract_message_from_exception_string helper gracefully parses human-readable messages from LiteLLM exception strings.
packages/data-designer-engine/src/data_designer/engine/models/registry.py Adds close()/aclose() lifecycle methods that iterate all managed facades. Exception safety concern: if one facade's close raises, subsequent facades are not closed. LiteLLMBridgeClient.close() is currently a no-op so this is low-risk today, but warrants attention for future client implementations.
packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/litellm_bridge.py Switches from collect_non_none_optional_fields to TransportKwargs.from_request across all six operation methods; correctly uses transport.headers or None to pass None instead of {} to LiteLLM. Exception message parsing improved via extract_message_from_exception_string.
packages/data-designer-engine/tests/engine/models/clients/test_parsing.py New test file comprehensively covers TransportKwargs.from_request and extract_tool_calls. Parametrized edge cases (None/empty extra_body, missing tool IDs, None arguments) are all well-handled.
packages/data-designer-engine/tests/engine/models/test_facade.py Tests updated to mock ModelClient instead of CustomRouter, use canonical types throughout, and remove LiteLLM-specific response fixtures. Coverage is maintained; signal-to-noise ratio improved.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant ModelFacade
    participant ModelClient
    participant TransportKwargs
    participant LiteLLMBridgeClient
    participant CustomRouter

    Caller->>ModelFacade: generate(prompt, **kwargs)
    ModelFacade->>ModelFacade: consolidate_kwargs(**kwargs)
    ModelFacade->>ModelFacade: _build_chat_completion_request(messages, kwargs)
    note right of ModelFacade: known fields → ChatCompletionRequest<br/>unknown fields → metadata (debug log)
    ModelFacade->>ModelClient: completion(ChatCompletionRequest)
    ModelClient->>TransportKwargs: from_request(request)
    note right of TransportKwargs: extra_body flattened into body<br/>extra_headers separated
    ModelClient->>LiteLLMBridgeClient: router.completion(model, messages, extra_headers, **body)
    LiteLLMBridgeClient->>CustomRouter: completion(...)
    CustomRouter-->>LiteLLMBridgeClient: litellm.ModelResponse
    LiteLLMBridgeClient->>LiteLLMBridgeClient: parse_chat_completion_response(response)
    LiteLLMBridgeClient-->>ModelClient: ChatCompletionResponse
    ModelClient-->>ModelFacade: ChatCompletionResponse
    ModelFacade->>ModelFacade: _track_usage(response.usage)
    ModelFacade-->>Caller: (parsed_output, messages)
Loading

Last reviewed commit: a084038

@nabinchha nabinchha self-assigned this Mar 6, 2026
Comment on lines +203 to +211
def _set_model_configs(self, model_configs: list[ModelConfig] | None) -> None:
self._model_configs = {mc.alias: mc for mc in (model_configs or [])}

def close(self) -> None:
"""Release resources held by all model facades."""
for facade in self._models.values():
facade.close()

async def aclose(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close()/aclose() silently skip remaining facades on error

If any facade's close() (or aclose()) raises an exception, the loop terminates early and all subsequent facades are left unclosed — a resource leak at cleanup time. While LiteLLMBridgeClient.close() is currently a no-op, future client implementations may hold real connections (HTTP pools, sockets, etc.).

Consider using a collect-and-reraise pattern so every facade is given a chance to close:

def close(self) -> None:
    """Release resources held by all model facades."""
    errors = []
    for facade in self._models.values():
        try:
            facade.close()
        except Exception as e:
            errors.append(e)
    if errors:
        raise ExceptionGroup("Errors during ModelRegistry.close()", errors)

The same pattern should be applied to aclose().

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/models/registry.py
Line: 203-211

Comment:
**`close()`/`aclose()` silently skip remaining facades on error**

If any facade's `close()` (or `aclose()`) raises an exception, the loop terminates early and all subsequent facades are left unclosed — a resource leak at cleanup time. While `LiteLLMBridgeClient.close()` is currently a no-op, future client implementations may hold real connections (HTTP pools, sockets, etc.).

Consider using a collect-and-reraise pattern so every facade is given a chance to close:

```python
def close(self) -> None:
    """Release resources held by all model facades."""
    errors = []
    for facade in self._models.values():
        try:
            facade.close()
        except Exception as e:
            errors.append(e)
    if errors:
        raise ExceptionGroup("Errors during ModelRegistry.close()", errors)
```

The same pattern should be applied to `aclose()`.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants