Summary
I've been investigating some unexpected routing behavior in my E2E tests and wanted to share my findings. I'm not entirely sure if this is a configuration issue on my end or a potential bug, but the evidence seems worth discussing.
Observed Behavior
When testing category-based routing with model: auto, I'm seeing math queries consistently routed to Model-A instead of the expected Model-B, despite the configuration showing Model-B has a higher score.
Evidence
1. Configuration (`config/testing/config.e2e.yaml`)

```yaml
categories:
  - name: math
    model_scores:
      - model: "Model-B"
        score: 1.0   # ← HIGHEST SCORE
      - model: "Model-A"
        score: 0.9
default_model: "Model-A"
threshold: 0.6
```

Expected behavior: Math queries should route to Model-B (score 1.0).
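For clarity, the selection behavior I expect from this config can be sketched in a few lines of Go. This is purely illustrative: `ModelScore` and `selectModelForCategory` are hypothetical names, not the router's actual API, but the rule is the one implied by the config (highest-scoring model at or above the threshold wins, otherwise fall back to `default_model`):

```go
package main

import "fmt"

// ModelScore mirrors one entry under a category's model_scores list.
type ModelScore struct {
	Model string
	Score float64
}

// selectModelForCategory returns the highest-scoring model whose score
// meets the threshold, falling back to defaultModel when none qualifies.
func selectModelForCategory(scores []ModelScore, threshold float64, defaultModel string) string {
	best, bestScore := defaultModel, threshold
	for _, ms := range scores {
		if ms.Score >= bestScore {
			best, bestScore = ms.Model, ms.Score
		}
	}
	return best
}

func main() {
	// The math category from config.e2e.yaml above.
	scores := []ModelScore{
		{Model: "Model-B", Score: 1.0},
		{Model: "Model-A", Score: 0.9},
	}
	fmt.Println(selectModelForCategory(scores, 0.6, "Model-A")) // Model-B
}
```

Under this rule, a query classified as `math` should always land on Model-B; anything else suggests the classifier never produced `math` in the first place.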
2. Test Results - BEFORE

Running a minimal reproduction test with random math queries to avoid cache hits:

```
TEST 1: Direct Classification API (port 8080)
================================================================================
Query: What is 234 + 567?
Category: math
Confidence: 0.886
Above threshold (0.6): ✅ YES
Classification correct: ✅ YES

TEST 2: Envoy Routing with model='auto' (port 8801)
================================================================================
Query: What is 234 + 567?
Request model: auto
Response model: Model-A
X-VSR-Selected-Model header: Model-A
Expected: Model-B (score 1.0 in config)
Actual: Model-A
❌ FAIL: Incorrectly routed to Model-A instead of Model-B
```
Pattern: Classification API correctly identifies math with high confidence (0.886 > threshold 0.6), but Envoy routing selects wrong model.
3. Router Logs Analysis
During test execution, I noticed these logs from the ExtProc router:
```
🔧 DEBUG: Router using UNIFIED classifier (LoRA models)
🔧 DEBUG: Wired UnifiedClassifier to Classifier for delegation (initialized=true)
...
❌ ERROR: Traditional BERT classifier not initialized
⚠️ WARNING: Classifier fallback: using 'biology' as category (classifier not initialized)
🔧 DEBUG: SelectBestModelForCategory: category=biology, threshold=0.600000
🔧 DEBUG: No valid model found for category 'biology'
🔧 DEBUG: Using default model: Model-A
```

Observation: Even though the UnifiedClassifier (LoRA-based) was initialized, the router seems to be falling back to an uninitialized traditional BERT classifier, resulting in:
- Wrong category (`biology` instead of `math`)
- Fallback to the default model (`Model-A`)
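Reading the log sequence as code, the failure path appears to be roughly the following. This is a hypothetical sketch of the observed behavior, not the router's actual implementation; `classifyFn` and `classifyWithFallback` are names I made up, and `nil` stands in for the "not initialized" state from the logs:

```go
package main

import "fmt"

// classifyFn stands in for the traditional BERT classifier; a nil value
// models the "classifier not initialized" state seen in the router logs.
type classifyFn func(text string) (category string, confidence float64)

// classifyWithFallback mimics the observed behavior: with no classifier
// available, an arbitrary fallback category is returned.
func classifyWithFallback(c classifyFn, text string) (string, float64) {
	if c == nil {
		// WARNING path: classifier not initialized → bogus category.
		return "biology", 0.0
	}
	return c(text)
}

func main() {
	category, conf := classifyWithFallback(nil, "What is 234 + 567?")
	fmt.Printf("category=%s confidence=%.3f\n", category, conf)
	// The math-only config has no model_scores entry for this category,
	// so model selection falls through to the default model (Model-A).
}
```

This would explain why the direct Classification API (which holds an initialized classifier) returns `math` with 0.886 confidence while the routing path degrades to `biology` and the default model.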
4. Architecture Investigation
Looking at the code, I noticed there are two classifier systems:
Classification API Server (`src/semantic-router/pkg/services/classification.go`):
- Uses `UnifiedClassifier` (LoRA-based models)
- Works correctly ✅

ExtProc Router (`src/semantic-router/pkg/extproc/router.go`):
- Originally used the legacy `Classifier` (traditional BERT)
- May not have been wired to use the `UnifiedClassifier`
Suspected Root Cause
I think the issue might be that the ExtProc router is using a different classifier instance than the Classification API:
- Classification API (port 8080): uses the initialized `UnifiedClassifier` (LoRA-based) → correct category
- ExtProc Router (port 8801): uses the uninitialized legacy `Classifier` (traditional BERT) → wrong category → wrong model
Proposed Fix (Unverified)
I tried modifying `src/semantic-router/pkg/extproc/router.go` to wire the UnifiedClassifier from ClassificationService:

```go
// In NewOpenAIRouter:
if classificationSvc.HasUnifiedClassifier() {
	unifiedClassifier := classificationSvc.GetUnifiedClassifier()
	if unifiedClassifier != nil {
		classifier.UnifiedClassifier = unifiedClassifier
		logging.Infof("🔧 DEBUG: Wired UnifiedClassifier to Classifier for delegation")
	}
}
```

And added delegation in `src/semantic-router/pkg/classification/classifier.go`:
```go
func (c *Classifier) ClassifyCategoryWithEntropy(text string) (string, float64, entropy.ReasoningDecision, error) {
	// Try UnifiedClassifier (LoRA models) first - highest accuracy
	if c.UnifiedClassifier != nil {
		return c.classifyWithUnifiedClassifier(text)
	}
	// ... rest of original logic
}
```

Test Results - AFTER
```
TEST 1: Direct Classification API
================================================================================
Query: What is 789 + 123?
Category: math
Confidence: 0.896
Above threshold (0.6): ✅ YES

TEST 2: Envoy Routing with model='auto'
================================================================================
Query: What is 789 + 123?
Response model: Model-B
X-VSR-Selected-Model header: Model-B
Expected: Model-B
Actual: Model-B
✅ PASS: Correctly routed to Model-B
```
Questions
- Is this the intended behavior? Should ExtProc and the Classification API use the same classifier?
- If so, is my proposed fix the right approach, or is there a better way to ensure consistency?
- Could this be related to Bug: Response model field does not match routing decision #430 (category-based routing)?
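On question 2, one direction I could imagine (purely a sketch; the type names here are hypothetical and not the project's API) is constructing the classifier once and injecting the same instance into both the Classification API server and the ExtProc router, so the two paths cannot diverge by construction:

```go
package main

import "fmt"

// Classifier is a stand-in for the shared classification component.
type Classifier struct{ name string }

func (c *Classifier) Classify(text string) string { return "math" }

// Both servers hold a reference to the same instance instead of each
// constructing (or failing to construct) their own.
type ClassificationAPI struct{ clf *Classifier }
type ExtProcRouter struct{ clf *Classifier }

func main() {
	shared := &Classifier{name: "unified-lora"}
	api := &ClassificationAPI{clf: shared}
	router := &ExtProcRouter{clf: shared}
	// Same pointer → both paths always agree on the category.
	fmt.Println(api.clf == router.clf) // true
}
```

Compared with my delegation patch above, this would remove the possibility of the router silently falling back to a second, uninitialized classifier, but it is a larger refactor, so I'd defer to the maintainers on which shape fits the codebase.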
Reproduction
Setup:
```shell
make run-router-e2e  # Starts Envoy, semantic-router, llm-katan
```

Test script:

```python
# /tmp/minimal_repro_test.py
import random
import requests

query = f"What is {random.randint(100, 999)} + {random.randint(100, 999)}?"

# Test 1: Classification API
response = requests.post(
    "http://localhost:8080/api/v1/classify/intent",
    json={"text": query},
)
result = response.json()
print(f"Category: {result['classification']['category']}")
print(f"Confidence: {result['classification']['confidence']:.3f}")

# Test 2: Envoy routing
response = requests.post(
    "http://localhost:8801/v1/chat/completions",
    json={
        "model": "auto",
        "messages": [{"role": "user", "content": query}],
    },
)
result = response.json()
print(f"Selected model: {result['model']}")
print("Expected: Model-B (score 1.0)")
```

Environment
- Config: `config/testing/config.e2e.yaml`
- Models: LoRA intent classifiers (`models/lora_intent_classifier_bert-base-uncased_model/`)
- Test: `e2e-tests/02-router-classification-test.py::test_category_classification`
I'd appreciate any guidance on whether this is expected behavior or if my analysis is on the right track. Happy to provide more details or test different approaches!