feat(auth): add credential-master mode for follower nodes #1258

Closed
notdp wants to merge 12 commits into router-for-me:main from notdp:feature/credential-master

@notdp notdp commented Jan 27, 2026

Summary

Add master-follower credential synchronization for multi-node deployments sharing the same OAuth account.

Features

  • Follower nodes fetch access_token from master on 401/403 errors
  • Sync all auth files from master on startup and frontend refresh
  • Follower nodes never refresh locally (no refresh_token stored)

Configuration

credential-master: "http://master-node:8888"
remote-management:
  secret-key: "shared-secret"

New Internal APIs (master node)

  • GET /v0/internal/credential?id=xxx - Get access_token for a specific auth
  • GET /v0/internal/auth-list - Get all auths (without refresh_token)

How it works

  1. Startup: Follower syncs all auth files from master
  2. 401/403 error: Follower fetches fresh access_token from master
  3. Frontend refresh: Follower syncs auth files from master
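
Step 2 above can be sketched as a small follower-side helper. This is a minimal sketch under stated assumptions, not the PR's implementation: the JSON field name `access_token` in the master's response and the helper names `credentialURL` and `fetchAccessToken` are hypothetical, and `X-Peer-Secret` is the header that appears in the reviewed snippet further down (later commits switch it back and forth with `Authorization: Bearer`).

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// credentialURL builds the master endpoint for one auth ID.
func credentialURL(masterURL, authID string) string {
	return strings.TrimRight(masterURL, "/") + "/v0/internal/credential?id=" + url.QueryEscape(authID)
}

// fetchAccessToken asks the master for a fresh access_token.
// The response field name "access_token" is an assumption for this sketch.
func fetchAccessToken(ctx context.Context, masterURL, secret, authID string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, credentialURL(masterURL, authID), nil)
	if err != nil {
		return "", fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("X-Peer-Secret", secret)

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return "", fmt.Errorf("query master: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("master returned status %d", resp.StatusCode)
	}

	var body struct {
		AccessToken string `json:"access_token"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", fmt.Errorf("decode response: %w", err)
	}
	if body.AccessToken == "" {
		return "", fmt.Errorf("master returned an empty access_token")
	}
	return body.AccessToken, nil
}

func main() {
	// Values mirror the Configuration section; the auth ID is a placeholder.
	fmt.Println(credentialURL("http://master-node:8888", "xxx"))
	if _, err := fetchAccessToken(context.Background(), "http://master-node:8888", "shared-secret", "xxx"); err != nil {
		fmt.Println("fetch failed:", err)
	}
}
```

The follower only ever receives an access_token here, which matches the stated design that followers store no refresh_token and never refresh locally.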

@gemini-code-assist
Contributor

Summary of Changes

Hello @notdp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a robust master-follower architecture for OAuth credential management in distributed environments. It enables follower nodes to centralize their credential handling by fetching access tokens and synchronizing authentication files from a designated master node. This design aims to improve consistency, reduce operational overhead, and enhance resilience against credential expiration across multiple service instances sharing the same OAuth account.

Highlights

  • Master-Follower Credential Synchronization: Introduces a new mode for multi-node deployments where follower nodes can synchronize OAuth credentials from a designated master node, ensuring consistent access tokens across the cluster.
  • Automated Access Token Fetching: Follower nodes are now configured to automatically fetch fresh access tokens from the master node upon encountering 401 (Unauthorized) or 403 (Forbidden) errors, minimizing service disruption due to expired credentials.
  • Initial and On-Demand Auth File Sync: Follower nodes will sync all authentication files from the master node during startup and also when a frontend refresh is triggered, ensuring their local credential store is up-to-date.
  • Dedicated Internal APIs for Credential Management: New internal API endpoints (/v0/internal/credential and /v0/internal/auth-list) have been added to the master node, allowing follower nodes to securely query for individual access tokens or a list of all available authentications.
  • Prevention of Local Refresh on Follower Nodes: Follower nodes, when configured with a credential master, will no longer attempt to refresh credentials locally, relying entirely on the master for token updates and avoiding the storage of refresh tokens.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new master-follower mode for credential synchronization, enabling multi-node deployments to share OAuth accounts. Key features include follower nodes fetching access tokens from the master on 401/403 errors, syncing all auth files from the master on startup and frontend refresh, and preventing local token refreshes on follower nodes. The changes involve adding new configuration options, internal API endpoints on the master node, and logic within the auth manager to handle credential fetching and synchronization. The implementation appears robust, addressing several aspects of distributed credential management.

Comment on lines 434 to 436
internal := s.engine.Group("/v0/internal", s.peerAuthMiddleware())
internal.GET("/credential", s.handleCredentialQuery)
internal.GET("/auth-list", s.handleAuthList)

high

The internal API endpoints /v0/internal/credential and /v0/internal/auth-list are exposed under the /v0/internal group. It's good that they are protected by s.peerAuthMiddleware(). Ensure that this middleware is sufficiently robust to prevent unauthorized access, especially since these endpoints expose sensitive credential information (access tokens).
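
One way to make such a middleware harder to abuse is to compare the shared secret in constant time and to reject every request when no secret is configured. The sketch below illustrates that idea with the standard library only; it is not the PR's `peerAuthMiddleware` (which is a gin middleware), and the names `secretsMatch` and `peerAuth` are hypothetical. Both values are hashed first so the comparison always runs over equal-length inputs.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"net/http"
)

// secretsMatch compares the presented secret with the configured one in
// constant time. An empty configured secret never matches.
func secretsMatch(presented, configured string) bool {
	if configured == "" {
		return false
	}
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(configured))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

// peerAuth wraps a handler and rejects requests without the right secret.
// The X-Peer-Secret header name follows the snippet reviewed in this PR.
func peerAuth(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !secretsMatch(r.Header.Get("X-Peer-Secret"), secret) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(secretsMatch("shared-secret", "shared-secret")) // true
	fmt.Println(secretsMatch("wrong", "shared-secret"))         // false
	fmt.Println(secretsMatch("", ""))                           // false
}
```

A constant-time comparison alone does not limit brute-force attempts; rate limiting or an IP ban would have to be layered on top.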

Comment on lines 1095 to 1097
if s.peerSecret == "" {
	c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "peer authentication not configured"})
	return

medium

In peerAuthMiddleware, if s.peerSecret is empty, an http.StatusUnauthorized is returned. This is correct, but it might be useful to log a warning or error when the peerSecret is not configured, as this could indicate a misconfiguration in a multi-node setup.

Comment on lines 1115 to 1117
if s == nil || s.handlers == nil || s.handlers.AuthManager == nil {
	c.JSON(http.StatusInternalServerError, gin.H{"error": "server not initialized"})
	return

medium

The handleCredentialQuery and handleAuthList functions perform a nil check for s, s.handlers, and s.handlers.AuthManager. While these checks are defensive, if the server is not initialized, it indicates a critical startup failure. Consider logging these cases as errors rather than just returning a generic 500 status, to aid in debugging.

Comment on lines 396 to 399
secret := m.GetPeerSecret()
if secret == "" {
	log.Warnf("SyncAuthsFromMaster: peer secret not configured")
	return errors.New("peer secret not configured")

medium

Similar to fetchCredentialFromMaster, SyncAuthsFromMaster also returns errors.New directly when peerSecret is not configured. Wrapping this error with more context would be helpful for debugging.
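
A minimal sketch of what wrapping could look like, assuming a shared sentinel error and a small helper (both `errPeerSecretMissing` and `requirePeerSecret` are hypothetical names, not part of the PR). The operation name is added as context with `%w`, so callers can still test for the underlying cause with `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
)

// errPeerSecretMissing is a sentinel that callers can match with errors.Is.
var errPeerSecretMissing = errors.New("peer secret not configured")

// requirePeerSecret returns a wrapped error naming the operation that needed
// the secret, or nil when a secret is present.
func requirePeerSecret(op, secret string) error {
	if secret == "" {
		return fmt.Errorf("%s: %w", op, errPeerSecretMissing)
	}
	return nil
}

func main() {
	err := requirePeerSecret("SyncAuthsFromMaster", "")
	fmt.Println(err)                                  // SyncAuthsFromMaster: peer secret not configured
	fmt.Println(errors.Is(err, errPeerSecretMissing)) // true
}
```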

}
req.Header.Set("X-Peer-Secret", secret)

client := &http.Client{Timeout: 30 * time.Second}

medium

The http.Client in SyncAuthsFromMaster has a hardcoded Timeout of 30 seconds. This, too, should ideally be configurable, as syncing multiple auth files could take longer or shorter depending on the number of files and network latency.
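
As an illustration of the suggestion, the timeout could be read from configuration with the current 30 seconds as a fallback. In this sketch `timeoutSeconds` is a hypothetical config value and `newSyncClient` a hypothetical helper; the PR defines neither.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// defaultSyncTimeout matches the value hardcoded in the reviewed snippet.
const defaultSyncTimeout = 30 * time.Second

// newSyncClient returns an HTTP client whose timeout comes from configuration.
// Zero or negative values fall back to the default.
func newSyncClient(timeoutSeconds int) *http.Client {
	timeout := defaultSyncTimeout
	if timeoutSeconds > 0 {
		timeout = time.Duration(timeoutSeconds) * time.Second
	}
	return &http.Client{Timeout: timeout}
}

func main() {
	fmt.Println(newSyncClient(0).Timeout)  // 30s
	fmt.Println(newSyncClient(90).Timeout) // 1m30s
}
```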

return err
}

return os.WriteFile(filepath, data, 0600)

medium

In writeAuthToFile, the file permission is set to 0600. This is good for security as it restricts access to the owner. However, ensure that the directory permissions for authDir are also appropriately set to prevent unauthorized access to the credential files.
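
A sketch of how the directory could be restricted alongside the file, assuming a Unix-like system. `writeAuthFile` and `demoPerms` are hypothetical helpers for illustration, not the PR's `writeAuthToFile`. Because `os.MkdirAll` does not change the mode of a directory that already exists, the sketch tightens it explicitly with `os.Chmod`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAuthFile ensures dir is owner-only (0700), then writes data as 0600.
func writeAuthFile(dir, name string, data []byte) (string, error) {
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", fmt.Errorf("create auth dir: %w", err)
	}
	// MkdirAll leaves an existing directory's mode untouched, so tighten it.
	if err := os.Chmod(dir, 0o700); err != nil {
		return "", fmt.Errorf("restrict auth dir: %w", err)
	}
	path := filepath.Join(dir, name)
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return "", fmt.Errorf("write auth file: %w", err)
	}
	return path, nil
}

// demoPerms writes a sample file under a temp dir and reports both modes.
func demoPerms() (os.FileMode, os.FileMode, error) {
	base, err := os.MkdirTemp("", "auth-demo-")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(base)

	dir := filepath.Join(base, "auths")
	path, err := writeAuthFile(dir, "claude-example.json", []byte(`{}`))
	if err != nil {
		return 0, 0, err
	}
	dirInfo, err := os.Stat(dir)
	if err != nil {
		return 0, 0, err
	}
	fileInfo, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	return dirInfo.Mode().Perm(), fileInfo.Mode().Perm(), nil
}

func main() {
	dirMode, fileMode, err := demoPerms()
	fmt.Println(dirMode, fileMode, err)
}
```

Note that `os.WriteFile` applies the requested mode only when it creates the file; an existing file keeps whatever mode it already has.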

Comment on lines 1507 to 1510
go func() {
	if err := m.fetchCredentialFromMaster(context.Background(), result.AuthID, result.Provider); err != nil {
		log.Debugf("failed to fetch credential from master: %v", err)
	}

medium

The fetchCredentialFromMaster call is executed in a new goroutine. While this prevents blocking the MarkResult function, the context.Background() is used. It would be better to pass the original ctx from MarkResult or a derived context with a timeout, to ensure that the fetch operation respects the original request's lifecycle and doesn't run indefinitely if the main context is canceled.
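
A minimal sketch of the timeout variant of this suggestion. `fetchInBackground` and `slowFetch` are hypothetical names; `slowFetch` stands in for `fetchCredentialFromMaster` and simply blocks until its context ends, so the deadline is the only thing that stops it.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fetchInBackground runs fetch in its own goroutine with a bounded lifetime
// and reports the result on the returned channel.
func fetchInBackground(timeout time.Duration, fetch func(ctx context.Context) error) <-chan error {
	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		defer cancel()
		done <- fetch(ctx)
	}()
	return done
}

// slowFetch is a stand-in for the real fetch; it returns only when ctx ends.
func slowFetch(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	err := <-fetchInBackground(50*time.Millisecond, slowFetch)
	fmt.Println(err) // context deadline exceeded
}
```

Deriving from `context.Background()` with a timeout, rather than passing the request context directly, keeps the fetch from being cancelled the moment the original request finishes while still bounding how long it can run. On Go 1.21 and later, `context.WithoutCancel` is another option when the request's context values should be preserved.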

Comment on lines +578 to +580
if s.coreManager != nil {
	if err := s.coreManager.SyncAuthsFromMaster(ctx, s.cfg.AuthDir); err != nil {
		log.Warnf("failed to sync auths from master: %v", err)

medium

The SyncAuthsFromMaster call on startup logs a warning if it fails. While this is helpful, depending on the severity of not syncing credentials at startup, this might warrant a more critical log level or even a graceful shutdown if the system cannot operate correctly without this initial sync.

notdp commented Jan 27, 2026

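Use case

Problem: three machines (a public server, a Mac Mini, and my local machine) share one Claude OAuth account. Once any one of them refreshes the token, the tokens on the other machines become invalid and their requests fail.
(I prefer localhost because access through the public server is too slow. In my speed tests, the localhost proxy matched the speed of cc, while going through the server was about twice as slow.)

Solution: master-follower mode

flowchart TB
    subgraph Master["🖥️ Master node (public server)"]
        M[Holds refresh_token<br/>Handles refresh]
    end
    
    subgraph Followers["📱 Follower nodes"]
        F1[Mac Mini]
        F2[Local machine]
    end
    
    M -- "Sync auth on startup" --> F1
    M -- "Sync auth on startup" --> F2
    F1 -. "Fetch token on 401/403" .-> M
    F2 -. "Fetch token on 401/403" .-> M

Follower configuration

credential-master: "http://<public-server>:8888"
remote-management:
  secret-key: "<shared-secret>"

Result

  • Auth is synced from the master automatically on startup
  • A new token is fetched from the master automatically when the token expires
  • Clicking refresh in the frontend triggers a sync
  • Followers never refresh locally, which avoids conflicts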

notdp commented Jan 27, 2026

@luispater could you please review this?
I'm already running this setup myself, and local access really is much faster than remote.

@notdp notdp force-pushed the feature/credential-master branch 2 times, most recently from 3fb2341 to 97a009b Compare January 27, 2026 15:15
notdp commented Jan 27, 2026

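Test results

Environment

  • Master: public server (holds refresh_token)
  • Follower: local macOS machine (access_token only)

Test cases

1. Automatic sync on startup

# Delete the local auth files, then restart
$ rm ~/.cli-proxy-api/claude-*.json
$ launchctl bootstrap ...

# Verify: auths are synced from the master automatically after startup
$ curl /v0/management/auth-files | jq '[.files[].provider] | unique'
["claude", "codex"]  ✅

2. Sync on frontend refresh

# Delete the local auth files
$ rm ~/.cli-proxy-api/claude-*.json

# Call the auth-files endpoint (simulates a frontend refresh)
$ curl /v0/management/auth-files | jq '[.files[].provider] | unique'
["claude", "codex"]  ✅ Restored immediately, no restart needed

3. Requests after sync (3 consecutive)

$ curl /v1/chat/completions -d '{"model":"claude-sonnet-4-20250514",...}'
Test 1: {"content":"ok"} ✅
Test 2: {"content":"ok"} ✅
Test 3: {"content":"ok"} ✅

4. Automatic token refresh on 401/403

When a follower receives a 401/403 error, it automatically fetches the latest access_token from the master and clears the suspend state.

Conclusion

All features work as expected ✅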

@notdp notdp force-pushed the feature/credential-master branch 2 times, most recently from bd40a81 to ccd6317 Compare January 27, 2026 15:41
@notdp notdp marked this pull request as draft January 29, 2026 06:40
@notdp notdp force-pushed the feature/credential-master branch from 4aa2124 to 86e6584 Compare January 30, 2026 09:04
Master-follower credential synchronization for distributed deployments:

- Add credential-master config option to specify master node URL
- Follower nodes fetch access_token from master on startup (SyncAuthsFromMaster)
- On 401 errors, follower fetches fresh token from master and retries
- Master performs on-demand token refresh when serving credential queries
- Sync expired time from master to prevent repeated fetch cycles
- Add internal API endpoints: /v0/internal/credential, /v0/internal/auth-list
- Add ResumeClientAllModels to clear suspensions after credential refresh

Note: 403 errors do not trigger fetch (may indicate TLS fingerprint issues)
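
The retry rule described in this commit message (retry once on 401, leave 403 alone) can be sketched as follows. This is an illustration under assumptions, not the PR's code: `executeWithMasterRetry` is a hypothetical name, `call` stands in for the upstream request and returns only an HTTP status, and `fetch` stands in for the credential query to the master.

```go
package main

import (
	"fmt"
	"net/http"
)

// executeWithMasterRetry calls the upstream once and, only on 401, fetches a
// fresh token from the master and retries a single time. It returns the final
// status and the token that produced it.
func executeWithMasterRetry(token string, call func(token string) int, fetch func() (string, error)) (int, string) {
	status := call(token)
	if status != http.StatusUnauthorized {
		// 403 and every other status is returned as-is, without a fetch.
		return status, token
	}
	fresh, err := fetch()
	if err != nil || fresh == "" {
		return status, token
	}
	return call(fresh), fresh
}

func main() {
	call := func(token string) int {
		if token == "fresh" {
			return http.StatusOK
		}
		return http.StatusUnauthorized
	}
	fetch := func() (string, error) { return "fresh", nil }
	fmt.Println(executeWithMasterRetry("stale", call, fetch)) // 200 fresh
}
```

Limiting the retry to a single attempt keeps a persistently rejected token from turning into a fetch loop against the master.
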
@notdp notdp force-pushed the feature/credential-master branch from 86e6584 to cc6de0d Compare January 30, 2026 12:44
notdp added 11 commits February 4, 2026 16:53
Move AuthSyncData type, master/follower sync functions, and credential
fetch logic from conductor.go (~320 lines) into credential_master.go.

conductor.go retains only the Manager struct fields, the 401-retry calls
in executeMixedOnce/executeStreamMixedOnce, and the follower branch in
refreshAuth.

Also changes follower HTTP auth from X-Peer-Secret header to standard
Authorization: Bearer header, and makes syncDataToAuth/writeAuthToFile
package-level functions (no receiver needed).

…auth

Migrate handleCredentialQuery and handleAuthList from server.go to
management/credential_sync.go. Replace the simplistic peerAuthMiddleware
(plaintext X-Peer-Secret comparison) with the existing management
Middleware() that provides bcrypt hash verification, IP ban after 5
failures, and standard Authorization: Bearer header support.

Remove the peerSecret field from Server struct; auth is now handled
entirely by the management middleware layer.

Cover sanitizeMetadataForSync, syncDataToAuth, writeAuthToFile,
fetchCredentialFromMaster (with httptest mock), GetAllAuthsForSync,
and SyncAuthsFromMaster. 13 test cases total.

…ment

403 may indicate TLS fingerprint mismatch rather than invalid token,
so it should not trigger the same 30-minute suspend as 401.

…leware

Management Middleware uses bcrypt verification (expects plaintext password
vs stored hash), which is incompatible with peer auth where both master
and follower share the same raw config value. Restore constant-time
plaintext comparison via PeerAuthMiddleware in the management package.

Revert client-side header from Authorization: Bearer back to X-Peer-Secret.

…iddleware

Follower sends plaintext secret via Authorization: Bearer, master's
management Middleware handles bcrypt verification + rate limiting.

…thorized

Deduplicate the identical 401 retry code in executeMixedOnce and
executeStreamMixedOnce by extracting it into a helper function.

Read CredentialMaster and RemoteManagement.SecretKey directly from
the runtimeConfig instead of storing duplicate copies. Remove
SetCredentialMaster and SetPeerSecret methods; keep GetCredentialMaster
public for external package access (auth_files.go needs it).

Use constant-time hash comparison for peer auth (both master and follower
share the same secret-key hash). Management Middleware (bcrypt) remains
for /v0/management routes where human users enter plaintext passwords.
@notdp notdp marked this pull request as ready for review February 4, 2026 10:25
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8dcad27af8

Comment on lines +275 to +279
for _, syncData := range result.Auths {
	auth := syncDataToAuth(syncData, authDir)
	m.mu.Lock()
	m.auths[auth.ID] = auth
	m.mu.Unlock()

P2: Remove stale auths when syncing from master

SyncAuthsFromMaster only upserts the auths returned by the master but never deletes local auths that are missing from the response. Because the master’s GetAllAuthsForSync skips disabled auths, any credential that gets disabled or removed on the master will remain active on the follower and continue to be used. This breaks the “master as source of truth” model and can keep revoked credentials in rotation; consider clearing/garbage-collecting local auths (and files) that are not present in the master’s list before re-adding.
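
The garbage-collection step suggested here could look roughly like the sketch below. It is a simplified illustration: plain `map[string]string` values stand in for the manager's auth records, and `reconcileAuths` is a hypothetical helper, not a function from the PR. Real code would also need to hold the manager's mutex for the whole operation and delete the matching files on disk.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcileAuths makes local match the master's list: it deletes local
// entries the master no longer returns, then upserts everything the master
// did return. The removed IDs come back sorted so callers can clean up files.
func reconcileAuths(local, fromMaster map[string]string) []string {
	var removed []string
	for id := range local {
		if _, ok := fromMaster[id]; !ok {
			delete(local, id) // deleting during range is safe in Go
			removed = append(removed, id)
		}
	}
	for id, auth := range fromMaster {
		local[id] = auth
	}
	sort.Strings(removed)
	return removed
}

func main() {
	local := map[string]string{"claude-a": "old", "codex-b": "old", "claude-revoked": "old"}
	master := map[string]string{"claude-a": "new", "codex-b": "new"}
	fmt.Println(reconcileAuths(local, master)) // [claude-revoked]
	fmt.Println(len(local))                    // 2
}
```

A reconcile like this should run only after a successful response from the master; otherwise a failed or empty fetch would wipe every local credential.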

notdp commented Feb 4, 2026

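Closing this PR. Auth file distribution can use the existing GitStore approach, and for Claude a setup-token (valid for 1 year) solves the refresh problem, so no custom coordination protocol is needed.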

@notdp notdp closed this Feb 4, 2026
notdp commented Feb 4, 2026

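Reopening. GitStore has no periodic pull mechanism, so it does not pick up updates after startup and is not suitable for multi-instance deployments. The credential-master approach is still needed.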

@notdp notdp reopened this Feb 4, 2026
@notdp notdp closed this Feb 8, 2026