Refactor providers by protocol family (discussion #122)#213
Refactor providers by protocol family (discussion #122)#213Leeaandrob merged 7 commits intosipeed:mainfrom
Conversation
|
@Zepan This PR directly addresses roadmap issue #283 (Refactor Provider Architecture by Protocol — priority: high, status: In Progress). It reorganizes providers by protocol family instead of vendor, which is the exact refactor described in the board. +1484/-676 across 11 files — significant but well-scoped. This is foundational work that will simplify adding new providers going forward. Recommendation: Prioritize review and merge. This is the highest-priority In Progress item on the board. Other provider PRs (#246 Mistral, #252 Bedrock, #177 Z.AI, #110 Gemini, #82 OpenAI-compat) should ideally wait for or build on top of this refactor. |
|
@jmahotiedu please solve the conflicts i'll prioritize this pr for now! |
Leeaandrob
left a comment
There was a problem hiding this comment.
PR #213 Deep Review — @jmahotiedu
Hey @jmahotiedu, thanks for tackling the Phase 1 of Discussion #122. The overall architectural direction is solid — separating protocol logic from routing logic is the right call. The factory pattern is clean, the enum-based provider selection is well-structured, and CI is green.
However, there are a few critical and high-severity issues that need to be addressed before this can be merged safely. I've left inline comments on the specific locations.
Summary of Findings
CRITICAL (Must Fix)
- C1: Triple type duplication across
types.go,openai_compat/, andanthropic/— creates ~180 lines of pure type-conversion boilerplate and exponential maintenance burden - C2: Anthropic base URL hardcoded in
NewProvider()— ignoresapi_basefrom config, breaking users with custom Anthropic endpoints
HIGH (Should Fix)
- H1:
NewHTTPProvidersignature changed from(apiKey, apiBase, proxy string)to(apiKey, apiBase string, proxy ...string)— breaking API change for external consumers - H2: Prefix stripping logic is hardcoded and incomplete — misses
deepseek/,openrouter/,google/and others - H3: Test coverage gaps — no tests for
createClaudeAuthProvider(),createCodexAuthProvider(), token refresh path, or proxy configuration
MEDIUM
- M1: Duplicated branches in
openai_compatparseResponsetool call parsing (identical if/else-if) - M2: Silent error swallowing in tool call JSON parsing — no logging when
json.Unmarshalfails - M3: Proxy URL parse failure silently ignored
Positives
- Clean factory pattern —
resolveProviderSelection()+CreateProvider()separation is well done - Typed enum (
providerTypeiota) is much better than string matching - All CI checks passing (fmt-check, vet, test)
- Backward-compatible adapter pattern preserves
HTTPProviderandClaudeProviderpublic interfaces - 13 table-driven test cases for
resolveProviderSelectioncovering many provider combinations - Correct usage of the official Anthropic SDK instead of raw HTTP
Verdict: REQUEST CHANGES
What needs to change before merge:
- Resolve type duplication (use type aliases or a shared types package)
- Pass
apiBaseto the anthropic provider constructor - Add tests for token refresh and auth provider paths
- Make prefix stripping extensible or move it to the factory layer
Estimated effort: ~4-6 hours of focused work.
Happy to re-review once these are addressed. Great foundation overall — this just needs the edges cleaned up.
pkg/providers/anthropic/provider.go
Outdated
| option.WithAuthToken(token), | ||
| option.WithBaseURL("https://api.anthropic.com"), | ||
| ) | ||
| return &Provider{client: &client} |
There was a problem hiding this comment.
[C2 — CRITICAL] @jmahotiedu — This base URL is hardcoded to https://api.anthropic.com, but factory.go:98-102 resolves cfg.Providers.Anthropic.APIBase from the user's config (defaulting to https://api.anthropic.com/v1).
Two problems here:
- The
apiBasefrom config is never passed toNewProvider(). Users who configure custom Anthropic endpoints (e.g., a corporate proxy, AWS Bedrock adapter, or regional endpoint) will have their config silently ignored — this is a functional regression. - There's a subtle inconsistency: the factory resolves to
https://api.anthropic.com/v1(with/v1suffix) while this constructor useshttps://api.anthropic.com(without). The Anthropic SDK appends/v1/messagesinternally so it happens to work, but the mismatch is suspicious and fragile.
Suggested fix:
func NewProvider(token string, opts ...ProviderOption) *Provider {
baseURL := "https://api.anthropic.com"
// apply opts for custom baseURL
...
}Or simply add an apiBase parameter:
func NewProvider(token, apiBase string) *Provider {
if apiBase == "" {
apiBase = "https://api.anthropic.com"
}
...
}
pkg/providers/anthropic/provider.go
Outdated
| Function *FunctionCall `json:"function,omitempty"` | ||
| Name string `json:"name,omitempty"` | ||
| Arguments map[string]interface{} `json:"arguments,omitempty"` | ||
| } |
There was a problem hiding this comment.
[C1 — CRITICAL] @jmahotiedu — These types (ToolCall, FunctionCall, LLMResponse, UsageInfo, Message, ToolDefinition, ToolFunctionDefinition) are defined three times in this PR:
pkg/providers/types.go(the canonical package-level types)pkg/providers/openai_compat/provider.go(duplicated)- Here in
pkg/providers/anthropic/provider.go(duplicated)
This triple duplication creates:
- ~180 lines of pure type-conversion boilerplate in
claude_provider.goandhttp_provider.go(thetoAnthropicProviderMessages,fromOpenAICompatToolCalls, etc. functions) - Shotgun surgery risk: any future change to these types (e.g., adding a
metadatafield toToolCall) requires updating 3 type definitions + 10+ conversion functions - The adapter files (
claude_provider.go,http_provider.go) have zero business logic — they exist solely to convert between identical type definitions
This violates the DRY principle and would not pass code review at Google, Apple, or Anthropic. The refactoring should reduce duplication, not multiply it.
Suggested approaches:
Option A — Type aliases in sub-packages (zero allocation, zero conversion):
// openai_compat/types.go
package openai_compat
import "github.com/sipeed/picoclaw/pkg/providers"
type Message = providers.Message
type ToolCall = providers.ToolCall
// ... etcOption B — Shared types package:
// pkg/providers/types/ — imported by all three packagesOption C — Keep types only in the parent package and have sub-packages accept/return them via interfaces.
| if prefix == "moonshot" || prefix == "nvidia" || prefix == "groq" || prefix == "ollama" { | ||
| model = model[idx+1:] | ||
| } | ||
| } |
There was a problem hiding this comment.
[H2 — HIGH] @jmahotiedu — This prefix stripping logic is hardcoded to only 4 providers (moonshot, nvidia, groq, ollama), but the factory in factory.go:191-196 routes models with prefixes like openrouter/, anthropic/, openai/, meta-llama/, deepseek/, and google/ — none of which get stripped here.
Concrete bug scenario: If a user sets provider: "deepseek" with model: "deepseek/deepseek-chat", the factory routes to providerTypeHTTPCompat but the model string "deepseek/deepseek-chat" is sent as-is to the DeepSeek API, which likely won't recognize its own prefix.
Suggested fixes:
- Make this a configurable allow-list or use a more generic heuristic
- Move prefix stripping to the factory layer (closer to where routing decisions are made) rather than the protocol layer
- At minimum, add
deepseekto the list
Also consider: should this be the provider's responsibility at all? The factory already knows the prefix — it could strip it before passing to the provider.
| arguments["raw"] = tc.Function.Arguments | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
[M1 — MEDIUM] @jmahotiedu — These two branches (lines 201-206 and 208-214) are identical code. The if tc.Type == "function" check is redundant because the else if tc.Function != nil branch does exactly the same thing.
Simplify to:
if tc.Function != nil {
name = tc.Function.Name
if tc.Function.Arguments != "" {
if err := json.Unmarshal([]byte(tc.Function.Arguments), &arguments); err != nil {
arguments["raw"] = tc.Function.Arguments
}
}
}| Proxy: http.ProxyURL(parsed), | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
[M3 — MEDIUM] @jmahotiedu — If url.Parse(proxyURL) fails, the error is silently swallowed and the provider operates without a proxy. The user has explicitly configured a proxy, so this silent fallback could be a security concern (traffic going unproxied when the user expects it to be proxied).
At minimum, consider returning an error from NewProvider or logging a warning.
pkg/providers/http_provider.go
Outdated
| func NewHTTPProvider(apiKey, apiBase, proxy string) *HTTPProvider { | ||
| client := &http.Client{ | ||
| Timeout: 120 * time.Second, | ||
| func NewHTTPProvider(apiKey, apiBase string, proxy ...string) *HTTPProvider { |
There was a problem hiding this comment.
[H1 — HIGH] @jmahotiedu — This signature changed from func NewHTTPProvider(apiKey, apiBase, proxy string) (3 positional args) to func NewHTTPProvider(apiKey, apiBase string, proxy ...string) (variadic).
This is a breaking API change for any external consumer of this package. While internal callers work fine (factory.go passes sel.proxy as a single string), any code that was calling NewHTTPProvider("key", "base", "") will still compile, but the semantics changed subtly: an empty string is now passed as proxy[0] inside the variadic, whereas before it was a direct positional arg.
If this is intentional to allow calling without proxy (NewHTTPProvider("key", "base")), please document the change. If not, consider keeping the 3-arg signature for backward compatibility.
pkg/providers/claude_provider.go
Outdated
|
|
||
| if len(tools) > 0 { | ||
| params.Tools = translateToolsForClaude(tools) | ||
| func toAnthropicProviderMessages(messages []Message) []anthropicprovider.Message { |
There was a problem hiding this comment.
[C1 — Related] @jmahotiedu — This entire block (lines 62-152) is ~90 lines of pure type-conversion boilerplate: toAnthropicProviderMessages, toAnthropicProviderTools, toAnthropicProviderToolCalls, fromAnthropicProviderResponse, fromAnthropicProviderToolCalls.
The same pattern exists in http_provider.go for openai_compat types (~90 more lines).
Combined, that's ~180 lines of code that exist solely because the same struct definitions were duplicated into the sub-packages. If the types were shared (via aliases or a shared package), these entire files would shrink to ~15-20 lines each — just the constructor and a direct delegate call.
| var args map[string]interface{} | ||
| if err := json.Unmarshal(tu.Input, &args); err != nil { | ||
| args = map[string]interface{}{"raw": string(tu.Input)} | ||
| } |
There was a problem hiding this comment.
[M2 — MEDIUM] @jmahotiedu — When json.Unmarshal fails for tool call input, the error is silently converted to {"raw": ...}. This makes it very hard to debug tool call failures in production — there's no log entry, no metric, no indication that parsing failed.
Consider at minimum adding a log.Printf or returning the error as a diagnostic field. The same pattern exists in openai_compat/provider.go:204-205.
| } | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
[H3 — HIGH] @jmahotiedu — Good coverage for resolveProviderSelection (13 cases), but several important paths in factory.go are untested:
createClaudeAuthProvider()andcreateCodexAuthProvider()(lines 33-53) — these are the OAuth paths, zero test coverageshengsuanyun,deepseek,nvidiaprovider routing in the explicit provider switch — no test cases- Token refresh via
tokenSourceinanthropic/provider.go— no test - Proxy configuration in
openai_compat/provider.go— no test that proxy is actually applied to the HTTP transport - Edge cases: nil options map, wrong type assertions (e.g.,
max_tokensasfloat64instead ofint)
The auth provider tests would need mocking auth.GetCredential, which is understandable, but at least the routing decision (sel.providerType == providerTypeClaudeAuth) should be verified.
| return NewCodexProviderWithTokenSource(cred.AccessToken, cred.AccountID, createCodexTokenSource()), nil | ||
| } | ||
|
|
||
| func resolveProviderSelection(cfg *config.Config) (providerSelection, error) { |
There was a problem hiding this comment.
[Positive] @jmahotiedu — Clean separation between resolveProviderSelection (pure decision logic, easily testable) and CreateProvider (side-effectful instantiation). The providerSelection struct as an intermediate representation is a good pattern. The typed enum via iota is much more maintainable than string matching.
One minor suggestion: consider adding a String() method to providerType for better error messages and debugging.
Phase 1: centralize protocol message/tool/response types in protocoltypes and keep compatibility aliases in providers and protocol packages. Phase 1: preserve HTTPProvider constructor compatibility and route Anthropic api_base through factory auth/provider constructors with base URL normalization. Phase 2: expand provider routing/auth tests (deepseek/nvidia/shengsuanyun, codex/claude oauth/codex-cli) and add openai_compat + anthropic coverage for proxy transport, model normalization, numeric option coercion, token-source refresh, and base URL behavior. Phase 3: apply gofmt and validate with Dockerized tests (go test ./pkg/providers/... ./pkg/migrate and go test ./...).
|
Addressed the requested changes on top of Mapping: Review Items -> Fixes
Verification Commands + Outcomes
Note: Requesting re-review. |
|
Follow-up is pushed on Detailed item-by-item mapping and verification results are here: Could you please take another look when convenient? Thanks. |
Leeaandrob
left a comment
There was a problem hiding this comment.
Re-Review: PR #213 — Refactor providers by protocol family
@jmahotiedu Excellent work on the fix commit. All 8 findings from my previous review have been addressed comprehensively, plus you made additional improvements I didn't ask for. Let me trace each one.
Local Verification
| Check | Result |
|---|---|
go vet ./... |
PASS |
go build ./... |
PASS |
go test -race ./... |
PASS (all packages) |
go test -race ./pkg/providers/... |
PASS (4 packages) |
| CI checks | 3/3 PASS (fmt-check, vet, test) |
Previous Findings — Resolution Status
[C1 — Triple type duplication] → FIXED
New protocoltypes package is the single source of truth. Both openai_compat and anthropic packages use Go type aliases (type ToolCall = protocoltypes.ToolCall), which is the correct Go pattern — zero runtime overhead, no conversion needed, preserves each package's public API. This eliminated ~180 lines of conversion boilerplate from claude_provider.go and http_provider.go.
[C2 — Anthropic base URL hardcoded] → FIXED
NewProviderWithBaseURL()andNewProviderWithTokenSourceAndBaseURL()added to anthropic packagenormalizeBaseURL()correctly strips/v1suffix (SDK appends it internally)createClaudeAuthProvider(apiBase)now accepts and forwards the base URL- Test
TestCreateProviderReturnsClaudeProviderForAnthropicOAuthverifies the full flow:config APIBase → factory → anthropic provider → BaseURL()with/v1normalization
[H1 — Breaking API change (variadic proxy)] → FIXED
NewHTTPProvider(apiKey, apiBase, proxy string) restored to 3 positional args. No variadic. Same for openai_compat.NewProvider. API stability preserved.
[H2 — Model prefix stripping incomplete] → FIXED
New normalizeModel() function covers all routed providers: moonshot, nvidia, groq, ollama, deepseek, google, openrouter, zhipu. Plus the smart OpenRouter check preserves full model names when apiBase contains openrouter.ai. Tests verify both prefix stripping and OpenRouter passthrough.
[H3 — Missing test coverage] → LARGELY FIXED
Significant test additions:
- Factory integration tests:
CreateProviderreturns correct types for OpenRouter, CodexCLI, CodexCLIAuth, AnthropicOAuth, OpenAIOAuth resolveProviderSelectionnow has 16 test cases covering deepseek, nvidia, shengsuanyun, ollama, groq, moonshot, codex-code, copilot, error paths- Auth testability via
var getCredential = auth.GetCredentialdependency injection anthropicpackage: round-trip Chat, token source refresh, base URL normalization, stop reasonsopenai_compatpackage: tool call parsing, HTTP errors, prefix stripping, proxy config, numeric option coercion, GLM max_completion_tokens
[M1 — Identical code branches] → FIXED
Tool call parsing in openai_compat/provider.go simplified to single if tc.Function != nil check.
[M2 — Silent error swallowing] → FIXED
Both anthropic/provider.go:200 and openai_compat/provider.go:164 now log with log.Printf before falling back to {"raw": ...}.
[M3 — Silent proxy parse failure] → FIXED
openai_compat/provider.go:44 now logs log.Printf("openai_compat: invalid proxy URL..."). Appropriate for library code.
Additional Improvements (Author Initiative)
These were NOT requested but significantly improve the PR:
- [POSITIVE] Proxy config was missing from multiple explicit providers (Groq, OpenAI, OpenRouter, Zhipu, Gemini, VLLM, ShengSuanYun, DeepSeek) — now all wire through
sel.proxy. Bug prevented users from using proxies with these providers. - [POSITIVE]
nvidiaexplicit provider case added to factory switch (was only in model-inference fallback before) - [POSITIVE]
asInt()/asFloat()helpers handle numeric type coercion (float64,int,int64,float32) — JSON deserialization often yieldsfloat64where callers expectint - [POSITIVE]
var getCredential = auth.GetCredentialmakes factory testable via dependency injection — previously untestable auth paths now have full test coverage - [POSITIVE]
normalizeBaseURL()handles the config/SDK mismatch cleanly (config uses/v1, SDK adds it internally)
Architecture Assessment (Post-Fix)
The final architecture is clean:
protocoltypes/types.go ← Single source of truth for shared types
├── anthropic/provider.go ← Anthropic-native protocol (SDK-based)
├── openai_compat/provider.go ← OpenAI chat/completions protocol
providers/
├── types.go ← Type aliases + LLMProvider interface
├── factory.go ← Protocol-aware routing (pure logic + DI)
├── claude_provider.go ← Thin adapter → anthropic package
└── http_provider.go ← Thin adapter → openai_compat package
This is textbook Strategy + Factory pattern. Each protocol handler is independently testable, the routing logic is pure (no side effects), and the adapter layer is minimal (no conversion boilerplate).
Verdict: APPROVE
All critical, high, and medium findings resolved. Tests pass. Architecture is sound. The fix commit demonstrates strong attention to reviewer feedback — not just fixing what was flagged, but also finding related issues (proxy wiring, nvidia routing) and fixing those too.
Well done, @jmahotiedu.
| @@ -0,0 +1,45 @@ | |||
| package protocoltypes | |||
There was a problem hiding this comment.
[POSITIVE] Clean solution to the C1 triple-duplication finding. protocoltypes as the single source of truth with type aliases in consuming packages is the correct Go idiom — zero runtime cost, no conversions needed, and each package preserves its own public API.
This eliminated ~180 lines of field-by-field conversion code from claude_provider.go and http_provider.go.
| baseURL: defaultBaseURL, | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
[POSITIVE] Good resolution of C2 — the NewProviderWithTokenSourceAndBaseURL constructor cleanly chains through NewProviderWithBaseURL, which calls normalizeBaseURL to handle the /v1 config/SDK mismatch.
The BaseURL() getter is also a nice addition — makes the flow testable end-to-end as proven by TestCreateProviderReturnsClaudeProviderForAnthropicOAuth.
| var apiResponse struct { | ||
| Choices []struct { | ||
| Message struct { | ||
| Content string `json:"content"` |
There was a problem hiding this comment.
[POSITIVE] normalizeModel() now covers all 8 routed prefixes plus the smart OpenRouter passthrough. The separation of the stripping logic into its own function with apiBase awareness is well-designed — it correctly preserves openrouter/model-name when targeting OpenRouter's API, while stripping deepseek/deepseek-chat for DeepSeek's API.
| import ( | ||
| "fmt" | ||
| "strings" | ||
|
|
There was a problem hiding this comment.
[POSITIVE] var getCredential = auth.GetCredential is a textbook Go testability pattern. Previously createClaudeAuthProvider and createCodexAuthProvider called auth.GetCredential directly — making them untestable. Now the factory tests can inject mock credential functions. Well done.
Minor note: defaultAnthropicAPIBase as a package-level constant avoids magic strings scattered across the factory — good practice.
Merge upstream/main into refactor/provider-protocol-122. Resolve http_provider.go conflict (keep thin delegate). Wire OpenAIProviderConfig.WebSearch through providerSelection and into CodexProvider for codex-auth and codex-cli-token paths.
Resolve conflicts in pkg/providers/types.go and pkg/agent/loop.go: - types.go: use protocoltypes aliases from PR sipeed#213, keep fallback types - loop.go: drop old single-agent createToolRegistry (replaced by multi-agent pattern) Refactor to align with PR sipeed#213 patterns: - instance.go: use NewExecToolWithConfig (accept full config for deny patterns) - registry.go: pass full config to NewAgentInstance - loop.go: add Perplexity web search options to registerSharedTools
…col-122 Refactor providers by protocol family (discussion sipeed#122)
Resolve conflicts in pkg/providers/types.go and pkg/agent/loop.go: - types.go: use protocoltypes aliases from PR sipeed#213, keep fallback types - loop.go: drop old single-agent createToolRegistry (replaced by multi-agent pattern) Refactor to align with PR sipeed#213 patterns: - instance.go: use NewExecToolWithConfig (accept full config for deny patterns) - registry.go: pass full config to NewAgentInstance - loop.go: add Perplexity web search options to registerSharedTools
Summary
pkg/providers/factory.go)pkg/providers/openai_compatand keephttp_provideras a thin compatibility adapterpkg/providers/anthropicwith compatibility wrappers inclaude_provider.goVerification
go test ./pkg/providers/... ./pkg/migratego test ./...still has pre-existing failures on this Windows environment:cmd/picoclawembed path setup (pattern workspace: no matching files found)pkg/authfile permission tests on Windows temp pathsImplements: #122