feat(auth): add credential-master mode for follower nodes#1258
feat(auth): add credential-master mode for follower nodes#1258notdp wants to merge 12 commits intorouter-for-me:mainfrom
Conversation
Summary of ChangesHello @notdp, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request implements a robust master-follower architecture for OAuth credential management in distributed environments. It enables follower nodes to centralize their credential handling by fetching access tokens and synchronizing authentication files from a designated master node. This design aims to improve consistency, reduce operational overhead, and enhance resilience against credential expiration across multiple service instances sharing the same OAuth account. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new master-follower mode for credential synchronization, enabling multi-node deployments to share OAuth accounts. Key features include follower nodes fetching access tokens from the master on 401/403 errors, syncing all auth files from the master on startup and frontend refresh, and preventing local token refreshes on follower nodes. The changes involve adding new configuration options, internal API endpoints on the master node, and logic within the auth manager to handle credential fetching and synchronization. The implementation appears robust, addressing several aspects of distributed credential management.
internal/api/server.go
Outdated
| internal := s.engine.Group("/v0/internal", s.peerAuthMiddleware()) | ||
| internal.GET("/credential", s.handleCredentialQuery) | ||
| internal.GET("/auth-list", s.handleAuthList) |
There was a problem hiding this comment.
The internal API endpoints /v0/internal/credential and /v0/internal/auth-list are exposed under the /v0/internal group. It's good that they are protected by s.peerAuthMiddleware(). Ensure that this middleware is sufficiently robust to prevent unauthorized access, especially since these endpoints expose sensitive credential information (access tokens).
internal/api/server.go
Outdated
| if s.peerSecret == "" { | ||
| c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "peer authentication not configured"}) | ||
| return |
internal/api/server.go
Outdated
| if s == nil || s.handlers == nil || s.handlers.AuthManager == nil { | ||
| c.JSON(http.StatusInternalServerError, gin.H{"error": "server not initialized"}) | ||
| return |
There was a problem hiding this comment.
The handleCredentialQuery and handleAuthList functions perform a nil check for s, s.handlers, and s.handlers.AuthManager. While these checks are defensive, if the server is not initialized, it indicates a critical startup failure. Consider logging these cases as errors rather than just returning a generic 500 status, to aid in debugging.
sdk/cliproxy/auth/conductor.go
Outdated
| secret := m.GetPeerSecret() | ||
| if secret == "" { | ||
| log.Warnf("SyncAuthsFromMaster: peer secret not configured") | ||
| return errors.New("peer secret not configured") |
sdk/cliproxy/auth/conductor.go
Outdated
| } | ||
| req.Header.Set("X-Peer-Secret", secret) | ||
|
|
||
| client := &http.Client{Timeout: 30 * time.Second} |
sdk/cliproxy/auth/conductor.go
Outdated
| return err | ||
| } | ||
|
|
||
| return os.WriteFile(filepath, data, 0600) |
There was a problem hiding this comment.
sdk/cliproxy/auth/conductor.go
Outdated
| go func() { | ||
| if err := m.fetchCredentialFromMaster(context.Background(), result.AuthID, result.Provider); err != nil { | ||
| log.Debugf("failed to fetch credential from master: %v", err) | ||
| } |
There was a problem hiding this comment.
The fetchCredentialFromMaster call is executed in a new goroutine. While this prevents blocking the MarkResult function, the context.Background() is used. It would be better to pass the original ctx from MarkResult or a derived context with a timeout, to ensure that the fetch operation respects the original request's lifecycle and doesn't run indefinitely if the main context is canceled.
| if s.coreManager != nil { | ||
| if err := s.coreManager.SyncAuthsFromMaster(ctx, s.cfg.AuthDir); err != nil { | ||
| log.Warnf("failed to sync auths from master: %v", err) |
There was a problem hiding this comment.
使用场景问题:三台机器(公网服务器、Mac Mini、本机)共享同一个 Claude OAuth 账号。任意一台刷新 token 后,其他机器的 token 失效,请求报错。 解决方案:主从模式 flowchart TB
subgraph Master["🖥️ 主节点 (公网服务器)"]
M[持有 refresh_token<br/>负责刷新]
end
subgraph Followers["📱 从节点"]
F1[Mac Mini]
F2[本机]
end
M -- "启动时同步 auth" --> F1
M -- "启动时同步 auth" --> F2
F1 -. "401/403 时获取 token" .-> M
F2 -. "401/403 时获取 token" .-> M
从节点配置: credential-master: "http://公网服务器:8888"
remote-management:
secret-key: "共享密钥"效果:
|
|
@luispater 辛苦review一下哦 |
3fb2341 to
97a009b
Compare
测试结果环境
测试用例1. 启动时自动同步# 删除本地 auth 文件后重启
$ rm ~/.cli-proxy-api/claude-*.json
$ launchctl bootstrap ...
# 验证:启动后自动从 master 同步
$ curl /v0/management/auth-files | jq '[.files[].provider] | unique'
["claude", "codex"] ✅2. 前端刷新时同步# 删除本地 auth 文件
$ rm ~/.cli-proxy-api/claude-*.json
# 调用 auth-files 接口(模拟前端刷新)
$ curl /v0/management/auth-files | jq '[.files[].provider] | unique'
["claude", "codex"] ✅ 立即恢复,无需重启3. 同步后请求测试(连续 3 次)$ curl /v1/chat/completions -d '{"model":"claude-sonnet-4-20250514",...}'
测试 1: {"content":"ok"} ✅
测试 2: {"content":"ok"} ✅
测试 3: {"content":"ok"} ✅4. 401/403 时自动刷新 token从节点收到 401/403 错误时,自动从 master 获取最新 access_token 并清除 suspend 状态。 结论所有功能正常 ✅ |
bd40a81 to
ccd6317
Compare
4aa2124 to
86e6584
Compare
Master-follower credential synchronization for distributed deployments: - Add credential-master config option to specify master node URL - Follower nodes fetch access_token from master on startup (SyncAuthsFromMaster) - On 401 errors, follower fetches fresh token from master and retries - Master performs on-demand token refresh when serving credential queries - Sync expired time from master to prevent repeated fetch cycles - Add internal API endpoints: /v0/internal/credential, /v0/internal/auth-list - Add ResumeClientAllModels to clear suspensions after credential refresh Note: 403 errors do not trigger fetch (may indicate TLS fingerprint issues)
86e6584 to
cc6de0d
Compare
Move AuthSyncData type, master/follower sync functions, and credential fetch logic from conductor.go (~320 lines) into credential_master.go. conductor.go retains only the Manager struct fields, the 401-retry calls in executeMixedOnce/executeStreamMixedOnce, and the follower branch in refreshAuth. Also changes follower HTTP auth from X-Peer-Secret header to standard Authorization: Bearer header, and makes syncDataToAuth/writeAuthToFile package-level functions (no receiver needed).
…auth Migrate handleCredentialQuery and handleAuthList from server.go to management/credential_sync.go. Replace the simplistic peerAuthMiddleware (plaintext X-Peer-Secret comparison) with the existing management Middleware() that provides bcrypt hash verification, IP ban after 5 failures, and standard Authorization: Bearer header support. Remove the peerSecret field from Server struct — auth is now handled entirely by the management middleware layer.
Cover sanitizeMetadataForSync, syncDataToAuth, writeAuthToFile, fetchCredentialFromMaster (with httptest mock), GetAllAuthsForSync, and SyncAuthsFromMaster. 13 test cases total.
…ment 403 may indicate TLS fingerprint mismatch rather than invalid token, so it should not trigger the same 30-minute suspend as 401.
…leware Management Middleware uses bcrypt verification (expects plaintext password vs stored hash), which is incompatible with peer auth where both master and follower share the same raw config value. Restore constant-time plaintext comparison via PeerAuthMiddleware in the management package. Revert client-side header from Authorization: Bearer back to X-Peer-Secret.
…iddleware Follower sends plaintext secret via Authorization: Bearer, master's management Middleware handles bcrypt verification + rate limiting.
…thorized Deduplicate the identical 401 retry code in executeMixedOnce and executeStreamMixedOnce by extracting it into a helper function.
Read CredentialMaster and RemoteManagement.SecretKey directly from the runtimeConfig instead of storing duplicate copies. Remove SetCredentialMaster and SetPeerSecret methods; keep GetCredentialMaster public for external package access (auth_files.go needs it).
Use constant-time hash comparison for peer auth (both master and follower share the same secret-key hash). Management Middleware (bcrypt) remains for /v0/management routes where human users enter plaintext passwords.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8dcad27af8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for _, syncData := range result.Auths { | ||
| auth := syncDataToAuth(syncData, authDir) | ||
| m.mu.Lock() | ||
| m.auths[auth.ID] = auth | ||
| m.mu.Unlock() |
There was a problem hiding this comment.
Remove stale auths when syncing from master
SyncAuthsFromMaster only upserts the auths returned by the master but never deletes local auths that are missing from the response. Because the master’s GetAllAuthsForSync skips disabled auths, any credential that gets disabled or removed on the master will remain active on the follower and continue to be used. This breaks the “master as source of truth” model and can keep revoked credentials in rotation; consider clearing/garbage-collecting local auths (and files) that are not present in the master’s list before re-adding.
Useful? React with 👍 / 👎.
|
关闭此 PR。auth 文件分发用 GitStore 已有方案,Claude 用 setup-token(1年有效)解决刷新问题,无需自定义协调协议。 |
|
重新打开。GitStore 没有定期 pull 机制,启动后不会同步更新,不适合多实例部署。credential-master 方案仍然需要。 |
Summary
Add master-follower credential synchronization for multi-node deployments sharing the same OAuth account.
Features
Configuration
New Internal APIs (master node)
GET /v0/internal/credential?id=xxx- Get access_token for a specific authGET /v0/internal/auth-list- Get all auths (without refresh_token)How it works