llamastack · msager27 · Mar 19, 2026 · Mar 19, 2026
@@ -23,7 +23,7 @@ inference provider, based on integration test results.
 |----------|--------|---------|---------|----------|
 | azure | 101 | 101 | 0 | 86% |
 | openai | 118 | 118 | 0 | 100% |
-| vllm | 1 | 1 | 0 | 1% |
+| vllm | 98 | 98 | 0 | 83% |
 | watsonx | 56 | 56 | 0 | 48% |
 
 ## Provider Details
@@ -34,112 +34,112 @@ Models, endpoints, and versions used during test recordings.
 |----------|----------|----------|--------------|
 | azure | gpt-4o | llama-stack-test.openai.azure.com, lls-test.openai.azure.com | openai sdk: 2.5.0 |
 | openai | gpt-4o, o4-mini, text-embedding-3-small | api.openai.com | openai sdk: 2.5.0 |
-| vllm | Qwen/Qwen3-0.6B | — | — |
+| vllm | Qwen/Qwen3-0.6B, Qwen/Qwen3.5-35B-A3B | — | openai sdk: 2.5.0, vllm server: 0.17.1 |
 | watsonx | meta-llama/llama-3-3-70b-instruct | us-south.ml.cloud.ibm.com | openai sdk: 2.5.0 |
 
 ## Basic Responses
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| extra body guided choice | ✅ | ✅ | — | ✅ |
-| include logprobs non streaming | ✅ | ✅ | — | ✅ |
-| include logprobs streaming | ✅ | ✅ | — | ✅ |
-| include logprobs with function tools | ✅ | ✅ | — | ✅ |
-| include logprobs with web search | ✅ | ✅ | — | ✅ |
-| non streaming basic | ✅ | ✅ | — | ✅ |
-| non streaming image | ✅ | ✅ | — | ⏭️ |
-| non streaming multi turn | ✅ | ✅ | — | ✅ |
-| non streaming multi turn image | ✅ | ✅ | — | ⏭️ |
-| streaming basic | ✅ | ✅ | — | ✅ |
-| streaming incremental content | ✅ | ✅ | — | ✅ |
+| extra body guided choice | ✅ | ✅ | ✅ | ✅ |
+| include logprobs non streaming | ✅ | ✅ | ✅ | ✅ |
+| include logprobs streaming | ✅ | ✅ | ✅ | ✅ |
+| include logprobs with function tools | ✅ | ✅ | ✅ | ✅ |
+| include logprobs with web search | ✅ | ✅ | ✅ | ✅ |
+| non streaming basic | ✅ | ✅ | ✅ | ✅ |
+| non streaming image | ✅ | ✅ | ✅ | ⏭️ |
+| non streaming multi turn | ✅ | ✅ | ✅ | ✅ |
+| non streaming multi turn image | ✅ | ✅ | ✅ | ⏭️ |
+| streaming basic | ✅ | ✅ | ✅ | ✅ |
+| streaming incremental content | ✅ | ✅ | ✅ | ✅ |
 
 ## Conversation Responses
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| conversation backward compatibility | ✅ | ✅ | — | ✅ |
-| conversation basic workflow | ✅ | ✅ | — | ✅ |
-| conversation context loading | ✅ | ✅ | — | ✅ |
-| conversation error handling | ⏭️ | ✅ | — | ⏭️ |
-| conversation multi turn and streaming | ✅ | ✅ | — | ✅ |
+| conversation backward compatibility | ✅ | ✅ | ✅ | ✅ |
+| conversation basic workflow | ✅ | ✅ | ✅ | ✅ |
+| conversation context loading | ✅ | ✅ | ✅ | ✅ |
+| conversation error handling | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| conversation multi turn and streaming | ✅ | ✅ | ✅ | ✅ |
 
 ## File Search
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| file search filter by category | ✅ | ✅ | — | ⏭️ |
-| file search filter by date range | ✅ | ✅ | — | ⏭️ |
-| file search filter by region | ✅ | ✅ | — | ⏭️ |
-| file search filter compound and | ✅ | ✅ | — | ⏭️ |
-| file search filter compound or | ✅ | ✅ | — | ⏭️ |
-| file search streaming events | ✅ | ✅ | — | ⏭️ |
-| text format | ✅ | ✅ | — | ✅ |
+| file search filter by category | ✅ | ✅ | ✅ | ⏭️ |
+| file search filter by date range | ✅ | ✅ | ✅ | ⏭️ |
+| file search filter by region | ✅ | ✅ | ✅ | ⏭️ |
+| file search filter compound and | ✅ | ✅ | ✅ | ⏭️ |
+| file search filter compound or | ✅ | ✅ | ✅ | ⏭️ |
+| file search streaming events | ✅ | ✅ | ✅ | ⏭️ |
+| text format | ✅ | ✅ | ✅ | ✅ |
 
 ## Mcp Authentication
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| mcp authorization backward compatibility | ✅ | ✅ | — | — |
-| mcp authorization bearer | ✅ | ✅ | — | — |
+| mcp authorization backward compatibility | ✅ | ✅ | ✅ | — |
+| mcp authorization bearer | ✅ | ✅ | ✅ | — |
 
 ## Openai Responses
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| background false is synchronous | ✅ | ✅ | — | ✅ |
-| background returns queued | ✅ | ✅ | — | ✅ |
-| incomplete details length | ✅ | ✅ | — | ✅ |
-| incomplete details length streaming | ✅ | ✅ | — | ✅ |
-| incomplete details max iterations exceeded | ✅ | ✅ | — | ✅ |
-| incomplete details max iterations exceeded streaming | ✅ | ✅ | — | ✅ |
-| incomplete details null when completed | ✅ | ✅ | — | ✅ |
-| reasoning effort | ⏭️ | ✅ | — | ⏭️ |
-| reasoning effort streaming | ⏭️ | ✅ | — | ⏭️ |
-| streaming includes usage | ✅ | ✅ | — | ✅ |
-| streaming invalid base64 image failure code is spec compliant | ✅ | ✅ | — | ⏭️ |
-| with max output tokens | ✅ | ✅ | — | ⏭️ |
-| with parallel tool calls and previous response | ✅ | ✅ | — | ✅ |
-| with parallel tool calls disabled | ✅ | ✅ | — | ⏭️ |
-| with parallel tool calls disabled streaming | ✅ | ✅ | — | ⏭️ |
-| with parallel tool calls enabled | ✅ | ✅ | — | ⏭️ |
-| with prompt cache key | ✅ | ✅ | — | ✅ |
-| with prompt cache key and previous response | ✅ | ✅ | — | ✅ |
-| with prompt cache key streaming | ✅ | ✅ | — | ✅ |
-| with safety identifier | ✅ | ✅ | — | ✅ |
-| with safety identifier and previous response | ✅ | ✅ | — | ✅ |
-| with safety identifier streaming | ✅ | ✅ | — | ✅ |
-| with service tier | ⏭️ | ✅ | — | ⏭️ |
-| with service tier and previous response | ⏭️ | ✅ | — | ⏭️ |
-| with service tier auto | ⏭️ | ✅ | — | ⏭️ |
-| with service tier auto and previous response | ⏭️ | ✅ | — | ⏭️ |
-| with service tier auto streaming | ⏭️ | ✅ | — | ⏭️ |
-| with service tier flex | ⏭️ | ✅ | — | ⏭️ |
-| with service tier flex streaming | ⏭️ | ✅ | — | ⏭️ |
-| with service tier streaming | ⏭️ | ✅ | — | ⏭️ |
-| with small max output tokens | ✅ | ✅ | — | ⏭️ |
-| with stream options and previous response | ✅ | ✅ | — | ✅ |
-| with stream options includes usage | ✅ | ✅ | — | ✅ |
-| with stream options non streaming | ✅ | ✅ | — | ✅ |
-| with top logprobs | ✅ | ✅ | — | ✅ |
-| with top logprobs and previous response | ✅ | ✅ | — | ✅ |
-| with top logprobs streaming | ✅ | ✅ | — | ✅ |
-| with top p | ✅ | ✅ | — | ✅ |
-| with top p and previous response | ✅ | ✅ | — | ✅ |
-| with top p streaming | ✅ | ✅ | — | ✅ |
-| with truncation and previous response | ✅ | ✅ | — | ✅ |
-| with truncation disabled | ✅ | ✅ | — | ✅ |
-| with truncation disabled streaming | ✅ | ✅ | — | ✅ |
+| background false is synchronous | ✅ | ✅ | ✅ | ✅ |
+| background returns queued | ✅ | ✅ | ✅ | ✅ |
+| incomplete details length | ✅ | ✅ | ✅ | ✅ |
+| incomplete details length streaming | ✅ | ✅ | ✅ | ✅ |
+| incomplete details max iterations exceeded | ✅ | ✅ | ✅ | ✅ |
+| incomplete details max iterations exceeded streaming | ✅ | ✅ | ✅ | ✅ |
+| incomplete details null when completed | ✅ | ✅ | ✅ | ✅ |
+| reasoning effort | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| reasoning effort streaming | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| streaming includes usage | ✅ | ✅ | ✅ | ✅ |
+| streaming invalid base64 image failure code is spec compliant | ✅ | ✅ | ✅ | ⏭️ |
+| with max output tokens | ✅ | ✅ | ✅ | ⏭️ |
+| with parallel tool calls and previous response | ✅ | ✅ | ✅ | ✅ |
+| with parallel tool calls disabled | ✅ | ✅ | ✅ | ⏭️ |
+| with parallel tool calls disabled streaming | ✅ | ✅ | ✅ | ⏭️ |
+| with parallel tool calls enabled | ✅ | ✅ | ✅ | ⏭️ |
+| with prompt cache key | ✅ | ✅ | ✅ | ✅ |
+| with prompt cache key and previous response | ✅ | ✅ | ✅ | ✅ |
+| with prompt cache key streaming | ✅ | ✅ | ✅ | ✅ |
+| with safety identifier | ✅ | ✅ | ✅ | ✅ |
+| with safety identifier and previous response | ✅ | ✅ | ✅ | ✅ |
+| with safety identifier streaming | ✅ | ✅ | ✅ | ✅ |
+| with service tier | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| with service tier and previous response | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| with service tier auto | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| with service tier auto and previous response | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| with service tier auto streaming | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| with service tier flex | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| with service tier flex streaming | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| with service tier streaming | ⏭️ | ✅ | ⏭️ | ⏭️ |
+| with small max output tokens | ✅ | ✅ | ✅ | ⏭️ |
+| with stream options and previous response | ✅ | ✅ | ✅ | ✅ |
+| with stream options includes usage | ✅ | ✅ | ✅ | ✅ |
+| with stream options non streaming | ✅ | ✅ | ✅ | ✅ |
+| with top logprobs | ✅ | ✅ | ✅ | ✅ |
+| with top logprobs and previous response | ✅ | ✅ | ✅ | ✅ |
+| with top logprobs streaming | ✅ | ✅ | ✅ | ✅ |
+| with top p | ✅ | ✅ | ✅ | ✅ |
+| with top p and previous response | ✅ | ✅ | ✅ | ✅ |
+| with top p streaming | ✅ | ✅ | ✅ | ✅ |
+| with truncation and previous response | ✅ | ✅ | ✅ | ✅ |
+| with truncation disabled | ✅ | ✅ | ✅ | ✅ |
+| with truncation disabled streaming | ✅ | ✅ | ✅ | ✅ |
 
 ## Prompt Templates
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| basic prompt template | ✅ | ✅ | — | ✅ |
-| multi variable prompt template | ✅ | ✅ | — | ✅ |
-| multi version prompt template | ✅ | ✅ | — | ✅ |
-| prompt template no variables | ✅ | ✅ | — | ✅ |
-| prompt template with multi turn | ✅ | ✅ | — | ✅ |
-| prompt template with streaming | ✅ | ✅ | — | ✅ |
+| basic prompt template | ✅ | ✅ | ✅ | ✅ |
+| multi variable prompt template | ✅ | ✅ | ✅ | ✅ |
+| multi version prompt template | ✅ | ✅ | ✅ | ✅ |
+| prompt template no variables | ✅ | ✅ | ✅ | ✅ |
+| prompt template with multi turn | ✅ | ✅ | ✅ | ✅ |
+| prompt template with streaming | ✅ | ✅ | ✅ | ✅ |
 
 ## Reasoning
 
@@ -162,53 +162,53 @@ Models, endpoints, and versions used during test recordings.
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| completed response has no error | ✅ | ✅ | — | ✅ |
-| invalid base64 image returns image error | ✅ | ✅ | — | ⏭️ |
-| invalid image url returns image error | ✅ | ✅ | — | ⏭️ |
-| non vision model returns error for image input | ✅ | ✅ | — | ✅ |
-| non vision model with base64 image returns server error | ✅ | ✅ | — | ✅ |
+| completed response has no error | ✅ | ✅ | ✅ | ✅ |
+| invalid base64 image returns image error | ✅ | ✅ | ⏭️ | ⏭️ |
+| invalid image url returns image error | ✅ | ✅ | ⏭️ | ⏭️ |
+| non vision model returns error for image input | ✅ | ✅ | ✅ | ✅ |
+| non vision model with base64 image returns server error | ✅ | ✅ | ✅ | ✅ |
 
 ## Structured Output
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| json schema array of integers | ✅ | ✅ | — | ✅ |
-| json schema array of objects | ✅ | ✅ | — | ✅ |
-| json schema array of strings | ✅ | ✅ | — | ⏭️ |
-| json schema boolean types | ✅ | ✅ | — | ⏭️ |
-| json schema float types | ✅ | ✅ | — | ⏭️ |
-| json schema integer types | ✅ | ✅ | — | ⏭️ |
-| json schema mixed types structures | ✅ | ✅ | — | ✅ |
-| json schema nested objects | ✅ | ✅ | — | ✅ |
-| json schema string types | ✅ | ✅ | — | ✅ |
+| json schema array of integers | ✅ | ✅ | ✅ | ✅ |
+| json schema array of objects | ✅ | ✅ | ✅ | ✅ |
+| json schema array of strings | ✅ | ✅ | ✅ | ⏭️ |
+| json schema boolean types | ✅ | ✅ | ✅ | ⏭️ |
+| json schema float types | ✅ | ✅ | ✅ | ⏭️ |
+| json schema integer types | ✅ | ✅ | ✅ | ⏭️ |
+| json schema mixed types structures | ✅ | ✅ | ✅ | ✅ |
+| json schema nested objects | ✅ | ✅ | ✅ | ✅ |
+| json schema string types | ✅ | ✅ | ✅ | ✅ |
 
 ## Tool Responses
 
 | Feature | azure | openai | vllm | watsonx |
 | --- | --- | --- | --- | --- |
-| connector resolution mcp tool | ✅ | ✅ | — | — |
-| function call ordering 1 | ✅ | ✅ | — | — |
-| function call ordering 2 | ✅ | ✅ | — | — |
-| function call output list file | ✅ | ✅ | — | — |
-| function call output list image | ✅ | ✅ | — | — |
-| function call output list text | ✅ | ✅ | — | — |
-| function call output list text multi block | ✅ | ✅ | — | — |
-| max tool calls with function tools | ✅ | ✅ | — | — |
-| max tool calls with mcp tools | ✅ | ✅ | — | — |
-| mcp tool approval | ✅ | ✅ | — | — |
-| multi turn streaming web search | ✅ | ✅ | — | — |
-| non streaming custom tool | ✅ | ✅ | — | — |
-| non streaming file search | ✅ | ✅ | — | — |
-| non streaming file search empty vector store | ✅ | ✅ | — | — |
-| non streaming mcp tool | ✅ | ✅ | — | — |
-| non streaming multi turn tool execution | ✅ | ✅ | — | — |
-| non streaming web search | ✅ | ✅ | — | — |
-| parallel tool calls with function tools | ✅ | ✅ | — | — |
-| parallel tool calls with mcp tools | ✅ | ✅ | — | — |
-| sequential file search | ✅ | ✅ | — | — |
-| sequential mcp tool | ✅ | ✅ | — | — |
-| streaming multi turn tool execution | ✅ | ✅ | — | — |
-| streaming web search | ✅ | ✅ | — | — |
+| connector resolution mcp tool | ✅ | ✅ | ✅ | — |
+| function call ordering 1 | ✅ | ✅ | ✅ | — |
+| function call ordering 2 | ✅ | ✅ | ✅ | — |
+| function call output list file | ✅ | ✅ | ✅ | — |
+| function call output list image | ✅ | ✅ | ⏭️ | — |
+| function call output list text | ✅ | ✅ | ✅ | — |
+| function call output list text multi block | ✅ | ✅ | ✅ | — |
+| max tool calls with function tools | ✅ | ✅ | ✅ | — |
+| max tool calls with mcp tools | ✅ | ✅ | ✅ | — |
+| mcp tool approval | ✅ | ✅ | ✅ | — |
+| multi turn streaming web search | ✅ | ✅ | ✅ | — |
+| non streaming custom tool | ✅ | ✅ | ✅ | — |
+| non streaming file search | ✅ | ✅ | ✅ | — |
+| non streaming file search empty vector store | ✅ | ✅ | ✅ | — |
+| non streaming mcp tool | ✅ | ✅ | ✅ | — |
+| non streaming multi turn tool execution | ✅ | ✅ | ✅ | — |
+| non streaming web search | ✅ | ✅ | ✅ | — |
+| parallel tool calls with function tools | ✅ | ✅ | ✅ | — |
+| parallel tool calls with mcp tools | ✅ | ✅ | ✅ | — |
+| sequential file search | ✅ | ✅ | ✅ | — |
+| sequential mcp tool | ✅ | ✅ | ✅ | — |
+| streaming multi turn tool execution | ✅ | ✅ | ✅ | — |
+| streaming web search | ✅ | ✅ | ✅ | — |
 
 ---
 

@@ -191,7 +191,9 @@ def normalize_inference_request(method: str, url: str, headers: dict[str, Any],
 
 def normalize_tool_request(provider_name: str, tool_name: str, kwargs: dict[str, Any]) -> str:
     """Create a normalized hash of the tool request for consistent matching."""
+    test_id = get_test_context()
     normalized = {
+        "test_id": test_id,
         "provider": provider_name,
         "tool_name": tool_name,
         "kwargs": kwargs,

@@ -8,6 +8,7 @@
     {"suite": "responses", "setup": "azure"},
     {"suite": "gpt-reasoning", "setup": "gpt-reasoning"},
     {"suite": "responses", "setup": "watsonx"},
+    {"suite": "responses", "setup": "vllm-qwen35"},
     {"suite": "base-vllm-subset", "setup": "vllm"},
     {"suite": "vllm-reasoning", "setup": "vllm"},
     {"suite": "ollama-reasoning", "setup": "ollama-reasoning"}