6 changes: 3 additions & 3 deletions code-model-comparison/app.py
@@ -10,7 +10,7 @@
load_dotenv()

# Set page config
st.set_page_config(page_title="Code Generation Model Comparison", layout="wide")
st.set_page_config(page_title="⚔️ CodeArena: Compare Codegen Models", layout="wide")

# Custom CSS for responsive code containers
st.markdown(
@@ -60,7 +60,7 @@
st.session_state.evaluation_results = {"model1": None, "model2": None}

# Main interface
st.title("Code Generation Model Comparison")
st.title("⚔️ CodeArena: Compare Codegen Models")
powered_by_html = """
<div style='display: flex; align-items: center; gap: 10px; margin-top: -10px;'>
<span style='font-size: 20px; color: #666;'>Powered by</span>
@@ -89,7 +89,7 @@

# If default models are not in available models, use first two available
if default_model1 not in all_models:
-    default_model1 = all_models[0] if all_models else "Claude Sonnet 4"
+    default_model1 = all_models[0] if all_models else "Claude Opus 4.1"
if default_model2 not in all_models:
    default_model2 = all_models[1] if len(all_models) > 1 else all_models[0]

27 changes: 18 additions & 9 deletions code-model-comparison/model_service.py
@@ -6,10 +6,10 @@

# Available models
AVAILABLE_MODELS = {
"Claude Sonnet 4": "openrouter/anthropic/claude-sonnet-4",
"Claude Opus 4.1": "openrouter/anthropic/claude-opus-4.1",
"Qwen3-Coder": "openrouter/qwen/qwen3-coder",
"Gemini 2.5 Flash": "openrouter/google/gemini-2.5-flash",
"GPT-4.1": "openrouter/openai/gpt-4.1",
"GPT-5": "gpt-5",
}
Comment on lines +9 to 13

💡 Verification agent

🧩 Analysis chain

Model catalog updates look consistent, but verify identifiers exist upstream.

"Claude Opus 4.1" and "GPT-5" entries look fine. Ensure LiteLLM recognizes these identifiers and providers as configured.

Run web checks to confirm model IDs and parameter support:

If needed, I can adjust the mapping to provider-qualified IDs (e.g., openai/gpt-5) or add a fallback route.
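
As an illustration of that fallback idea, a minimal sketch follows. The alternate model IDs, the mapping name, and the helper are assumptions for illustration only, not verified identifiers:

# Hypothetical fallback routing — the IDs below are illustrative and should be
# checked against the current OpenAI/OpenRouter catalogs before use.
from litellm import acompletion

FALLBACK_ROUTES = {
    "GPT-5": ["openai/gpt-5", "openrouter/openai/gpt-5"],
    "Claude Opus 4.1": ["openrouter/anthropic/claude-opus-4.1"],
}

async def acompletion_with_fallback(display_name, messages, **kwargs):
    last_err = None
    for model_id in FALLBACK_ROUTES[display_name]:
        try:
            # Try the primary ID first; fall through to the next on any provider error.
            return await acompletion(model=model_id, messages=messages, **kwargs)
        except Exception as err:
            last_err = err
    raise last_err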


🌐 Web query:

Does LiteLLM support the model id "gpt-5" with acompletion, and should we pass max_tokens or max_completion_tokens for streaming?

💡 Result:

Short answer: Yes — LiteLLM supports GPT‑5 model IDs (e.g. "openai/gpt-5" or "azure/gpt-5") for its completion APIs, and for GPT‑5 you should use max_completion_tokens to control the streamed completion length (LiteLLM will also accept max_tokens and map it to max_completion_tokens for GPT‑5, but passing max_completion_tokens is clearer/explicit). (docs.litellm.ai)

Quick notes and example:

  • Supported IDs: LiteLLM lists OpenAI/Azure GPT‑5 model IDs such as openai/gpt-5 and openai/gpt-5-chat (and provider-prefixed variants). (docs.litellm.ai)
  • Params: LiteLLM exposes both max_tokens and max_completion_tokens; for GPT‑5 it maps max_tokens → max_completion_tokens, and you can (and probably should) pass max_completion_tokens when streaming so you explicitly set the completion token cap. (docs.litellm.ai)

Example (Python):

from litellm import completion

resp = completion(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "Write a short summary of X."}],
    stream=True,
    max_completion_tokens=250
)
for chunk in resp:
    print(chunk.choices[0].delta)

If you want, tell me which provider/model string you’re using (openai/…, azure/…, or bare "gpt-5") and I’ll give a tailored snippet and confirm the exact param mapping for your LiteLLM version.
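
Since get_model_response_async in this PR uses the async API, an equivalent sketch with acompletion (assuming, per the answer above, the same parameter mapping applies to the async path) would be:

import asyncio
from litellm import acompletion

async def main():
    # Async variant of the snippet above; max_completion_tokens caps the streamed output.
    resp = await acompletion(
        model="openai/gpt-5",
        messages=[{"role": "user", "content": "Write a short summary of X."}],
        stream=True,
        max_completion_tokens=250,
    )
    async for chunk in resp:
        print(chunk.choices[0].delta)

asyncio.run(main())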

🌐 Web query:

Is "openrouter/anthropic/claude-opus-4.1" a valid model identifier on OpenRouter via LiteLLM?

💡 Result:

Short answer: No. The valid Anthropic Opus identifier on OpenRouter via LiteLLM uses the Claude 3 naming, e.g., openrouter/anthropic/claude-3-opus-20240229, not “claude-opus-4.1.”[1][3]

Details:

  • LiteLLM’s OpenRouter convention is openrouter/provider/model-name, and Anthropic’s Opus listing on OpenRouter uses the Claude 3 format such as claude-3-opus-20240229.[1]
  • LiteLLM docs state you can call any OpenRouter model by passing model="openrouter/", with examples including openrouter/anthropic/claude-2 and the Claude 3 variants, but there is no “claude-opus-4.1” pattern shown.[3]
  • A third-party page shows a string like openrouter/anthropic/claude-opus-4.1, but this appears inconsistent with OpenRouter’s established Claude 3 model identifiers and should not be relied upon without confirmation on the OpenRouter models list.[2]

Recommendation:

  • Use the confirmed identifier: openrouter/anthropic/claude-3-opus-20240229.[1]
  • If you need the latest Sonnet/Haiku/Opus variant, check the OpenRouter model list and plug it into LiteLLM as openrouter/anthropic/.[3]
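
To act on that recommendation, one way to confirm an ID before adding it to AVAILABLE_MODELS is to query OpenRouter's public model listing. A rough sketch, assuming the https://openrouter.ai/api/v1/models endpoint and the requests library:

import requests

def openrouter_model_exists(model_id: str) -> bool:
    """Check a bare OpenRouter ID, e.g. 'anthropic/claude-opus-4.1', against the catalog."""
    resp = requests.get("https://openrouter.ai/api/v1/models", timeout=10)
    resp.raise_for_status()
    return model_id in {m["id"] for m in resp.json().get("data", [])}

# LiteLLM then expects the same ID with the 'openrouter/' prefix prepended.
print(openrouter_model_exists("anthropic/claude-opus-4.1"))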


Fix invalid LiteLLM model identifiers and use explicit max_completion_tokens

Please update the model mapping in code-model-comparison/model_service.py and streaming calls as follows:

  • In the AVAILABLE_MODELS mapping (lines 9–13):

    -    "Claude Opus 4.1": "openrouter/anthropic/claude-opus-4.1",
    +    "Claude Opus 4.1": "openrouter/anthropic/claude-3-opus-20240229",  # valid OpenRouter Anthropic Opus ID
    
    -    "GPT-5": "gpt-5",
    +    "GPT-5": "openai/gpt-5",  # provider-prefixed for LiteLLM compatibility
  • In your GPT-5 streaming completion branches (around lines 49–64):
    • Replace any usage of max_tokens with max_completion_tokens to explicitly set the streamed token cap (LiteLLM will map max_tokens → max_completion_tokens, but explicit is clearer).

    resp = completion(
        model="openai/gpt-5",
        messages=…,
        stream=True,
-       max_tokens=250
+       max_completion_tokens=250
    )

These changes ensure LiteLLM recognizes both model IDs and correctly applies the streaming token limit.

🤖 Prompt for AI Agents
In code-model-comparison/model_service.py lines 9–13 and the GPT-5 streaming
branches around lines 49–64, update the AVAILABLE_MODELS mapping so the GPT-5
entry uses the valid LiteLLM identifier ("openai/gpt-5") instead of "gpt-5", and
in all GPT-5 stream calls replace max_tokens with max_completion_tokens (e.g.,
max_completion_tokens=250) so the streamed token cap is set explicitly; keep the
rest of the call arguments the same.



@@ -46,13 +46,22 @@ async def get_model_response_async(

    try:
        # Get streaming response from the model using LiteLLM asynchronously.
-        response = await acompletion(
-            model=model_mapping,
-            messages=messages,
-            api_key=os.getenv("OPENROUTER_API_KEY"),
-            max_tokens=2000,
-            stream=True,
-        )
+        if "GPT" in model_name:
+            response = await acompletion(
+                model=model_mapping,
+                messages=messages,
+                api_key=os.getenv("OPENAI_API_KEY"),
+                max_completion_tokens=2000,
+                stream=True,
+            )
+        else:
+            response = await acompletion(
+                model=model_mapping,
+                messages=messages,
+                api_key=os.getenv("OPENROUTER_API_KEY"),
+                max_tokens=2000,
+                stream=True,
+            )
Comment on lines +49 to +64

⚠️ Potential issue

Use max_tokens for LiteLLM acompletion; align params across routes.

The GPT branch passes max_completion_tokens, which acompletion may not accept. Use max_tokens in both branches to avoid parameter mismatch and parity issues.

Apply this minimal diff:

-        if "GPT" in model_name:
-            response = await acompletion(
-                model=model_mapping,
-                messages=messages,
-                api_key=os.getenv("OPENAI_API_KEY"),
-                max_completion_tokens=2000,
-                stream=True,
-            )
+        if "GPT" in model_name:
+            response = await acompletion(
+                model=model_mapping,
+                messages=messages,
+                api_key=os.getenv("OPENAI_API_KEY"),
+                max_tokens=2000,
+                stream=True,
+            )

Optionally, dedupe branches by deriving api_key and calling acompletion once:

-        if "GPT" in model_name:
-            response = await acompletion(
-                model=model_mapping,
-                messages=messages,
-                api_key=os.getenv("OPENAI_API_KEY"),
-                max_tokens=2000,
-                stream=True,
-            )
-        else:
-            response = await acompletion(
-                model=model_mapping,
-                messages=messages,
-                api_key=os.getenv("OPENROUTER_API_KEY"),
-                max_tokens=2000,
-                stream=True,
-            )
+        is_gpt = "GPT" in model_name
+        response = await acompletion(
+            model=model_mapping,
+            messages=messages,
+            api_key=os.getenv("OPENAI_API_KEY") if is_gpt else os.getenv("OPENROUTER_API_KEY"),
+            max_tokens=2000,
+            stream=True,
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if "GPT" in model_name:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENAI_API_KEY"),
max_completion_tokens=2000,
stream=True,
)
else:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENROUTER_API_KEY"),
max_tokens=2000,
stream=True,
)
if "GPT" in model_name:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENAI_API_KEY"),
max_tokens=2000,
stream=True,
)
else:
response = await acompletion(
model=model_mapping,
messages=messages,
api_key=os.getenv("OPENROUTER_API_KEY"),
max_tokens=2000,
stream=True,
)
🤖 Prompt for AI Agents
In code-model-comparison/model_service.py around lines 49 to 64, the GPT branch
passes max_completion_tokens while the other branch uses max_tokens, causing
acompletion parameter mismatch; change the GPT branch to use max_tokens=2000
(matching the other branch) and align other parameter names, and optionally
simplify by setting api_key = os.getenv("OPENAI_API_KEY") if "GPT" in model_name
else os.getenv("OPENROUTER_API_KEY") and call acompletion once with the derived
api_key and shared args (model=model_mapping, messages=messages,
api_key=api_key, max_tokens=2000, stream=True).


        if not response:
            yield "Error: No response received from model"
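
For context on how the streamed response from either branch is typically consumed, here is a minimal sketch. The chunk shape is assumed to follow LiteLLM's OpenAI-compatible delta format, and the helper name is illustrative rather than taken from this PR:

# Illustrative only: reading a LiteLLM streaming response chunk by chunk.
async def read_stream(response):
    async for chunk in response:
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None)
        if content:
            # In the Streamlit app this would be yielded/accumulated rather than printed.
            yield content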