226 changes: 113 additions & 113 deletions docs/docs/api-openai/provider_matrix.md
@@ -23,7 +23,7 @@ inference provider, based on integration test results.
|----------|--------|---------|---------|----------|
| azure | 101 | 101 | 0 | 86% |
| openai | 118 | 118 | 0 | 100% |
| vllm | 1 | 1 | 0 | 1% |
| vllm | 98 | 98 | 0 | 83% |
| watsonx | 56 | 56 | 0 | 48% |

## Provider Details
@@ -34,112 +34,112 @@ Models, endpoints, and versions used during test recordings.
|----------|----------|----------|--------------|
| azure | gpt-4o | llama-stack-test.openai.azure.com, lls-test.openai.azure.com | openai sdk: 2.5.0 |
| openai | gpt-4o, o4-mini, text-embedding-3-small | api.openai.com | openai sdk: 2.5.0 |
| vllm | Qwen/Qwen3-0.6B | — | |
| vllm | Qwen/Qwen3-0.6B, Qwen/Qwen3.5-35B-A3B | — | openai sdk: 2.5.0, vllm server: 0.17.1 |
| watsonx | meta-llama/llama-3-3-70b-instruct | us-south.ml.cloud.ibm.com | openai sdk: 2.5.0 |

## Basic Responses

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| extra body guided choice | ✅ | ✅ | | ✅ |
| include logprobs non streaming | ✅ | ✅ | | ✅ |
| include logprobs streaming | ✅ | ✅ | | ✅ |
| include logprobs with function tools | ✅ | ✅ | | ✅ |
| include logprobs with web search | ✅ | ✅ | | ✅ |
| non streaming basic | ✅ | ✅ | | ✅ |
| non streaming image | ✅ | ✅ | | ⏭️ |
| non streaming multi turn | ✅ | ✅ | | ✅ |
| non streaming multi turn image | ✅ | ✅ | | ⏭️ |
| streaming basic | ✅ | ✅ | | ✅ |
| streaming incremental content | ✅ | ✅ | | ✅ |
| extra body guided choice | ✅ | ✅ | | ✅ |
| include logprobs non streaming | ✅ | ✅ | | ✅ |
| include logprobs streaming | ✅ | ✅ | | ✅ |
| include logprobs with function tools | ✅ | ✅ | | ✅ |
| include logprobs with web search | ✅ | ✅ | | ✅ |
| non streaming basic | ✅ | ✅ | | ✅ |
| non streaming image | ✅ | ✅ | | ⏭️ |
| non streaming multi turn | ✅ | ✅ | | ✅ |
| non streaming multi turn image | ✅ | ✅ | | ⏭️ |
| streaming basic | ✅ | ✅ | | ✅ |
| streaming incremental content | ✅ | ✅ | | ✅ |

## Conversation Responses

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| conversation backward compatibility | ✅ | ✅ | | ✅ |
| conversation basic workflow | ✅ | ✅ | | ✅ |
| conversation context loading | ✅ | ✅ | | ✅ |
| conversation error handling | ⏭️ | ✅ | | ⏭️ |
| conversation multi turn and streaming | ✅ | ✅ | | ✅ |
| conversation backward compatibility | ✅ | ✅ | | ✅ |
| conversation basic workflow | ✅ | ✅ | | ✅ |
| conversation context loading | ✅ | ✅ | | ✅ |
| conversation error handling | ⏭️ | ✅ | ⏭️ | ⏭️ |
| conversation multi turn and streaming | ✅ | ✅ | | ✅ |

## File Search

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| file search filter by category | ✅ | ✅ | | ⏭️ |
| file search filter by date range | ✅ | ✅ | | ⏭️ |
| file search filter by region | ✅ | ✅ | | ⏭️ |
| file search filter compound and | ✅ | ✅ | | ⏭️ |
| file search filter compound or | ✅ | ✅ | | ⏭️ |
| file search streaming events | ✅ | ✅ | | ⏭️ |
| text format | ✅ | ✅ | | ✅ |
| file search filter by category | ✅ | ✅ | | ⏭️ |
| file search filter by date range | ✅ | ✅ | | ⏭️ |
| file search filter by region | ✅ | ✅ | | ⏭️ |
| file search filter compound and | ✅ | ✅ | | ⏭️ |
| file search filter compound or | ✅ | ✅ | | ⏭️ |
| file search streaming events | ✅ | ✅ | | ⏭️ |
| text format | ✅ | ✅ | | ✅ |

## Mcp Authentication

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| mcp authorization backward compatibility | ✅ | ✅ | | — |
| mcp authorization bearer | ✅ | ✅ | | — |
| mcp authorization backward compatibility | ✅ | ✅ | | — |
| mcp authorization bearer | ✅ | ✅ | | — |

## Openai Responses

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| background false is synchronous | ✅ | ✅ | | ✅ |
| background returns queued | ✅ | ✅ | | ✅ |
| incomplete details length | ✅ | ✅ | | ✅ |
| incomplete details length streaming | ✅ | ✅ | | ✅ |
| incomplete details max iterations exceeded | ✅ | ✅ | | ✅ |
| incomplete details max iterations exceeded streaming | ✅ | ✅ | | ✅ |
| incomplete details null when completed | ✅ | ✅ | | ✅ |
| reasoning effort | ⏭️ | ✅ | | ⏭️ |
| reasoning effort streaming | ⏭️ | ✅ | | ⏭️ |
| streaming includes usage | ✅ | ✅ | | ✅ |
| streaming invalid base64 image failure code is spec compliant | ✅ | ✅ | | ⏭️ |
| with max output tokens | ✅ | ✅ | | ⏭️ |
| with parallel tool calls and previous response | ✅ | ✅ | | ✅ |
| with parallel tool calls disabled | ✅ | ✅ | | ⏭️ |
| with parallel tool calls disabled streaming | ✅ | ✅ | | ⏭️ |
| with parallel tool calls enabled | ✅ | ✅ | | ⏭️ |
| with prompt cache key | ✅ | ✅ | | ✅ |
| with prompt cache key and previous response | ✅ | ✅ | | ✅ |
| with prompt cache key streaming | ✅ | ✅ | | ✅ |
| with safety identifier | ✅ | ✅ | | ✅ |
| with safety identifier and previous response | ✅ | ✅ | | ✅ |
| with safety identifier streaming | ✅ | ✅ | | ✅ |
| with service tier | ⏭️ | ✅ | | ⏭️ |
| with service tier and previous response | ⏭️ | ✅ | | ⏭️ |
| with service tier auto | ⏭️ | ✅ | | ⏭️ |
| with service tier auto and previous response | ⏭️ | ✅ | | ⏭️ |
| with service tier auto streaming | ⏭️ | ✅ | | ⏭️ |
| with service tier flex | ⏭️ | ✅ | | ⏭️ |
| with service tier flex streaming | ⏭️ | ✅ | | ⏭️ |
| with service tier streaming | ⏭️ | ✅ | | ⏭️ |
| with small max output tokens | ✅ | ✅ | | ⏭️ |
| with stream options and previous response | ✅ | ✅ | | ✅ |
| with stream options includes usage | ✅ | ✅ | | ✅ |
| with stream options non streaming | ✅ | ✅ | | ✅ |
| with top logprobs | ✅ | ✅ | | ✅ |
| with top logprobs and previous response | ✅ | ✅ | | ✅ |
| with top logprobs streaming | ✅ | ✅ | | ✅ |
| with top p | ✅ | ✅ | | ✅ |
| with top p and previous response | ✅ | ✅ | | ✅ |
| with top p streaming | ✅ | ✅ | | ✅ |
| with truncation and previous response | ✅ | ✅ | | ✅ |
| with truncation disabled | ✅ | ✅ | | ✅ |
| with truncation disabled streaming | ✅ | ✅ | | ✅ |
| background false is synchronous | ✅ | ✅ | | ✅ |
| background returns queued | ✅ | ✅ | | ✅ |
| incomplete details length | ✅ | ✅ | | ✅ |
| incomplete details length streaming | ✅ | ✅ | | ✅ |
| incomplete details max iterations exceeded | ✅ | ✅ | | ✅ |
| incomplete details max iterations exceeded streaming | ✅ | ✅ | | ✅ |
| incomplete details null when completed | ✅ | ✅ | | ✅ |
| reasoning effort | ⏭️ | ✅ | ⏭️ | ⏭️ |
| reasoning effort streaming | ⏭️ | ✅ | ⏭️ | ⏭️ |
| streaming includes usage | ✅ | ✅ | | ✅ |
| streaming invalid base64 image failure code is spec compliant | ✅ | ✅ | | ⏭️ |
| with max output tokens | ✅ | ✅ | | ⏭️ |
| with parallel tool calls and previous response | ✅ | ✅ | | ✅ |
| with parallel tool calls disabled | ✅ | ✅ | | ⏭️ |
| with parallel tool calls disabled streaming | ✅ | ✅ | | ⏭️ |
| with parallel tool calls enabled | ✅ | ✅ | | ⏭️ |
| with prompt cache key | ✅ | ✅ | | ✅ |
| with prompt cache key and previous response | ✅ | ✅ | | ✅ |
| with prompt cache key streaming | ✅ | ✅ | | ✅ |
| with safety identifier | ✅ | ✅ | | ✅ |
| with safety identifier and previous response | ✅ | ✅ | | ✅ |
| with safety identifier streaming | ✅ | ✅ | | ✅ |
| with service tier | ⏭️ | ✅ | ⏭️ | ⏭️ |
| with service tier and previous response | ⏭️ | ✅ | ⏭️ | ⏭️ |
| with service tier auto | ⏭️ | ✅ | ⏭️ | ⏭️ |
| with service tier auto and previous response | ⏭️ | ✅ | ⏭️ | ⏭️ |
| with service tier auto streaming | ⏭️ | ✅ | ⏭️ | ⏭️ |
| with service tier flex | ⏭️ | ✅ | ⏭️ | ⏭️ |
| with service tier flex streaming | ⏭️ | ✅ | ⏭️ | ⏭️ |
| with service tier streaming | ⏭️ | ✅ | ⏭️ | ⏭️ |
| with small max output tokens | ✅ | ✅ | | ⏭️ |
| with stream options and previous response | ✅ | ✅ | | ✅ |
| with stream options includes usage | ✅ | ✅ | | ✅ |
| with stream options non streaming | ✅ | ✅ | | ✅ |
| with top logprobs | ✅ | ✅ | | ✅ |
| with top logprobs and previous response | ✅ | ✅ | | ✅ |
| with top logprobs streaming | ✅ | ✅ | | ✅ |
| with top p | ✅ | ✅ | | ✅ |
| with top p and previous response | ✅ | ✅ | | ✅ |
| with top p streaming | ✅ | ✅ | | ✅ |
| with truncation and previous response | ✅ | ✅ | | ✅ |
| with truncation disabled | ✅ | ✅ | | ✅ |
| with truncation disabled streaming | ✅ | ✅ | | ✅ |

## Prompt Templates

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| basic prompt template | ✅ | ✅ | | ✅ |
| multi variable prompt template | ✅ | ✅ | | ✅ |
| multi version prompt template | ✅ | ✅ | | ✅ |
| prompt template no variables | ✅ | ✅ | | ✅ |
| prompt template with multi turn | ✅ | ✅ | | ✅ |
| prompt template with streaming | ✅ | ✅ | | ✅ |
| basic prompt template | ✅ | ✅ | | ✅ |
| multi variable prompt template | ✅ | ✅ | | ✅ |
| multi version prompt template | ✅ | ✅ | | ✅ |
| prompt template no variables | ✅ | ✅ | | ✅ |
| prompt template with multi turn | ✅ | ✅ | | ✅ |
| prompt template with streaming | ✅ | ✅ | | ✅ |

## Reasoning

@@ -162,53 +162,53 @@ Models, endpoints, and versions used during test recordings.

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| completed response has no error | ✅ | ✅ | | ✅ |
| invalid base64 image returns image error | ✅ | ✅ | | ⏭️ |
| invalid image url returns image error | ✅ | ✅ | | ⏭️ |
| non vision model returns error for image input | ✅ | ✅ | | ✅ |
| non vision model with base64 image returns server error | ✅ | ✅ | | ✅ |
| completed response has no error | ✅ | ✅ | | ✅ |
| invalid base64 image returns image error | ✅ | ✅ | ⏭️ | ⏭️ |
| invalid image url returns image error | ✅ | ✅ | ⏭️ | ⏭️ |
| non vision model returns error for image input | ✅ | ✅ | | ✅ |
| non vision model with base64 image returns server error | ✅ | ✅ | | ✅ |

## Structured Output

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| json schema array of integers | ✅ | ✅ | | ✅ |
| json schema array of objects | ✅ | ✅ | | ✅ |
| json schema array of strings | ✅ | ✅ | | ⏭️ |
| json schema boolean types | ✅ | ✅ | | ⏭️ |
| json schema float types | ✅ | ✅ | | ⏭️ |
| json schema integer types | ✅ | ✅ | | ⏭️ |
| json schema mixed types structures | ✅ | ✅ | | ✅ |
| json schema nested objects | ✅ | ✅ | | ✅ |
| json schema string types | ✅ | ✅ | | ✅ |
| json schema array of integers | ✅ | ✅ | | ✅ |
| json schema array of objects | ✅ | ✅ | | ✅ |
| json schema array of strings | ✅ | ✅ | | ⏭️ |
| json schema boolean types | ✅ | ✅ | | ⏭️ |
| json schema float types | ✅ | ✅ | | ⏭️ |
| json schema integer types | ✅ | ✅ | | ⏭️ |
| json schema mixed types structures | ✅ | ✅ | | ✅ |
| json schema nested objects | ✅ | ✅ | | ✅ |
| json schema string types | ✅ | ✅ | | ✅ |

## Tool Responses

| Feature | azure | openai | vllm | watsonx |
| --- | --- | --- | --- | --- |
| connector resolution mcp tool | ✅ | ✅ | | — |
| function call ordering 1 | ✅ | ✅ | | — |
| function call ordering 2 | ✅ | ✅ | | — |
| function call output list file | ✅ | ✅ | | — |
| function call output list image | ✅ | ✅ | | — |
| function call output list text | ✅ | ✅ | | — |
| function call output list text multi block | ✅ | ✅ | | — |
| max tool calls with function tools | ✅ | ✅ | | — |
| max tool calls with mcp tools | ✅ | ✅ | | — |
| mcp tool approval | ✅ | ✅ | | — |
| multi turn streaming web search | ✅ | ✅ | | — |
| non streaming custom tool | ✅ | ✅ | | — |
| non streaming file search | ✅ | ✅ | | — |
| non streaming file search empty vector store | ✅ | ✅ | | — |
| non streaming mcp tool | ✅ | ✅ | | — |
| non streaming multi turn tool execution | ✅ | ✅ | | — |
| non streaming web search | ✅ | ✅ | | — |
| parallel tool calls with function tools | ✅ | ✅ | | — |
| parallel tool calls with mcp tools | ✅ | ✅ | | — |
| sequential file search | ✅ | ✅ | | — |
| sequential mcp tool | ✅ | ✅ | | — |
| streaming multi turn tool execution | ✅ | ✅ | | — |
| streaming web search | ✅ | ✅ | | — |
| connector resolution mcp tool | ✅ | ✅ | | — |
| function call ordering 1 | ✅ | ✅ | | — |
| function call ordering 2 | ✅ | ✅ | | — |
| function call output list file | ✅ | ✅ | | — |
| function call output list image | ✅ | ✅ | ⏭️ | — |
| function call output list text | ✅ | ✅ | | — |
| function call output list text multi block | ✅ | ✅ | | — |
| max tool calls with function tools | ✅ | ✅ | | — |
| max tool calls with mcp tools | ✅ | ✅ | | — |
| mcp tool approval | ✅ | ✅ | | — |
| multi turn streaming web search | ✅ | ✅ | | — |
| non streaming custom tool | ✅ | ✅ | | — |
| non streaming file search | ✅ | ✅ | | — |
| non streaming file search empty vector store | ✅ | ✅ | | — |
| non streaming mcp tool | ✅ | ✅ | | — |
| non streaming multi turn tool execution | ✅ | ✅ | | — |
| non streaming web search | ✅ | ✅ | | — |
| parallel tool calls with function tools | ✅ | ✅ | | — |
| parallel tool calls with mcp tools | ✅ | ✅ | | — |
| sequential file search | ✅ | ✅ | | — |
| sequential mcp tool | ✅ | ✅ | | — |
| streaming multi turn tool execution | ✅ | ✅ | | — |
| streaming web search | ✅ | ✅ | | — |

---

2 changes: 2 additions & 0 deletions src/llama_stack/testing/api_recorder.py
@@ -191,7 +191,9 @@ def normalize_inference_request(method: str, url: str, headers: dict[str, Any],

def normalize_tool_request(provider_name: str, tool_name: str, kwargs: dict[str, Any]) -> str:
    """Create a normalized hash of the tool request for consistent matching."""
    test_id = get_test_context()
    normalized = {
        "test_id": test_id,
        "provider": provider_name,
        "tool_name": tool_name,
        "kwargs": kwargs,
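
The hunk above adds the current test ID to the dictionary that is hashed for tool requests, so identical tool calls recorded under different tests should no longer share a hash. The following is a minimal sketch of that effect, not the project's implementation: `normalized_tool_hash` is a hypothetical stand-in, the test ID is passed in explicitly instead of coming from `get_test_context()`, and the JSON serialization and SHA-256 digest are assumptions, since the hunk does not show how the real function turns the dictionary into a hash.

```python
import hashlib
import json
from typing import Any, Optional


def normalized_tool_hash(
    test_id: Optional[str],
    provider_name: str,
    tool_name: str,
    kwargs: dict[str, Any],
) -> str:
    """Hypothetical stand-in: hash a tool request together with its test ID."""
    normalized = {
        "test_id": test_id,
        "provider": provider_name,
        "tool_name": tool_name,
        "kwargs": kwargs,
    }
    # Assumption: serialize with sorted keys so the result does not depend on
    # dict insertion order, then hash with SHA-256.
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same tool call under two different (made-up) test IDs.
call = {"query": "llama"}
hash_a = normalized_tool_hash("test_a", "example-provider", "example_tool", call)
hash_b = normalized_tool_hash("test_b", "example-provider", "example_tool", call)

print(hash_a == hash_b)  # False: the test ID is part of the hashed payload
```

Under these assumptions, a recording made for one test would only match replays from that same test, which is presumably the isolation the change is after.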
1 change: 1 addition & 0 deletions tests/integration/ci_matrix.json
@@ -8,6 +8,7 @@
{"suite": "responses", "setup": "azure"},
{"suite": "gpt-reasoning", "setup": "gpt-reasoning"},
{"suite": "responses", "setup": "watsonx"},
{"suite": "responses", "setup": "vllm-qwen35"},
{"suite": "base-vllm-subset", "setup": "vllm"},
{"suite": "vllm-reasoning", "setup": "vllm"},
{"suite": "ollama-reasoning", "setup": "ollama-reasoning"}
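
To see which setups run a given suite once the `vllm-qwen35` entry is present, the visible entries can be filtered by their `suite` key. This is only a sketch: it embeds the seven entries shown in the hunk as a standalone JSON array, whereas the real `ci_matrix.json` has more content that is not visible here, and `setups_for_suite` is a hypothetical helper, not part of the repository.

```python
import json

# Only the entries visible in the hunk; the enclosing structure of the real
# file is not shown, so a bare array is assumed here.
CI_MATRIX_EXCERPT = """
[
  {"suite": "responses", "setup": "azure"},
  {"suite": "gpt-reasoning", "setup": "gpt-reasoning"},
  {"suite": "responses", "setup": "watsonx"},
  {"suite": "responses", "setup": "vllm-qwen35"},
  {"suite": "base-vllm-subset", "setup": "vllm"},
  {"suite": "vllm-reasoning", "setup": "vllm"},
  {"suite": "ollama-reasoning", "setup": "ollama-reasoning"}
]
"""


def setups_for_suite(entries: list[dict[str, str]], suite: str) -> list[str]:
    """Return the setups paired with a suite, in file order."""
    return [entry["setup"] for entry in entries if entry["suite"] == suite]


entries = json.loads(CI_MATRIX_EXCERPT)
print(setups_for_suite(entries, "responses"))
# ['azure', 'watsonx', 'vllm-qwen35']
```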