Skip to content

fix: early session cleanup that broke proxies#235

Merged
x032205 merged 2 commits into
mainfrom
fix-early-cleanup
May 14, 2026
Merged

fix: early session cleanup that broke proxies#235
x032205 merged 2 commits into
mainfrom
fix-early-cleanup

Conversation

@x032205
Copy link
Copy Markdown
Member

@x032205 x032205 commented May 14, 2026

Description 📣

[](fix: early session cleanup that broke proxies)

Type ✨

  • Bug fix
  • New feature
  • Improvement
  • Breaking change
  • Documentation

Tests 🛠️

# Here's some code block to paste some code snippets

@x032205 x032205 requested a review from bernie-g May 14, 2026 02:45
@infisical-review-police
Copy link
Copy Markdown

💬 Discussion in Slack: #pr-review-cli-235-fix-early-session-cleanup-that-broke-proxies

Posted by Review Police — reviews, comments, new commits, and CI failures will stream into this channel.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 packages/gateway-v2/gateway.go:883-886 — This PR removes the only routine cleanup path for non-RDP sessions that end via normal client disconnect. The pre-PR if lastConn && !isRDP { CleanupPAMSession(...) } branch was the cleanup path for SSH/Postgres/MySQL/MSSQL/Redis when no explicit cancel happens; the replacement is a bare DeregisterPAMSession, which deletes the session from g.pamSessions once the last connection drops. Because the idle reaper iterates only g.pamSessions, the session becomes invisible to it, and the only remaining cleanup is uploadExpiredSessionFiles after cert NotAfter — which is at least multiple hours (cert renewal runs every 6 hours). Consider keeping the per-type gate (extending isRDP to also skip Mongo) or having DeregisterPAMSession leave a tombstone the reaper can finish.

    Extended reasoning...

    What changes\n\nThe deleted block ran on the last connection close for non-RDP sessions:\n\ngo\nisRDP := forwardConfig.PAMConfig.ResourceType == session.ResourceTypeWindows\nif lastConn := g.DeregisterPAMSession(...); lastConn && !isRDP {\n forwardConfig.PAMConfig.SessionUploader.CleanupPAMSession(sessionID, "connection_closed")\n}\n\n\nAfter the PR there is just g.DeregisterPAMSession(...). The intent — "don't terminate the proxy on disconnect, because RDP/Mongo reconnect inside the cert validity window" — is correct, but the change widens that to every resource type, including ones that have no reconnect semantics (SSH/Postgres/MySQL/MSSQL/Redis/Oracle/Kubernetes).\n\n## Why the idle reaper does not save us\n\nDeregisterPAMSession removes the session from the map once its last connection drops:\n\ngo\nisLast := len(g.pamSessions[sessionID]) == 0\nif isLast { delete(g.pamSessions, sessionID) }\n\n\nreapIdleSessions only iterates g.pamSessions, so a session whose last connection has been deregistered is invisible to it. lastActivity lives on the *pamSessionEntry struct, which is gone after the delete.\n\n## The remaining path is the cert-expiry sweeper, which is hours away\n\nThe one path that still runs is uploadExpiredSessionFiles (uploader.go:503). It calls GetExpiredSessionFiles, which only returns files past ExpiresAt (uploader.go:171). ExpiresAt = clientCert.NotAfter (set in parseDetailsFromCertificate). Certificate renewal runs every 6 hours (startCertificateRenewal), so cert lifetime is at least multiple hours.\n\nDuring that window for a normally-ended non-RDP session:\n- Backend believes the session is still active. CallPAMSessionTermination is reachable only through CleanupPAMSession (uploader.go:822).\n- Recording file remains on disk until now.After(ExpiresAt).\n- Credentials linger in CredentialsManager. CleanupSessionCredentials is only called from CleanupPAMSession (credentials.go:248 comment confirms).\n- MongoDB proxy entries leak entirely. closeMongoProxy is only invoked from CancelPAMSession (gateway.go:260). It is not called from uploadExpiredSessionFiles (which lives in packages/pam/session/uploader.go and has no g.mongoProxies reference). So a Mongo session that disconnects normally keeps its topology open until the gateway shuts down or someone explicitly cancels it.\n\n## Step-by-step proof (SSH session)\n\n1. User opens an SSH PAM session, certificate NotAfter = now + 4h.\n2. User finishes work, closes the SSH client. HandlePAMProxy returns at gateway.go:881.\n3. sessionCancel() runs at gateway.go:885.\n4. g.DeregisterPAMSession(sessionID, tlsConn) runs at gateway.go:886. entries becomes empty → delete(g.pamSessions, sessionID).\n5. At t+1m: reapIdleSessions ticks. Loop body for sessionID, entries := range g.pamSessions skips this session — it is no longer in the map.\n6. At t+5m, t+10m, …: uploadExpiredSessionFiles ticks. now.After(file.ExpiresAt) is false because ExpiresAt = now + 4h. Nothing happens.\n7. At t+4h: now.After(file.ExpiresAt) is finally true. CleanupPAMSession(sessionID, "orphaned_file") runs — final flush, file delete, credentials cleanup, backend notified.\n\nNet result for the common case: backend sees the session as active for ~4 hours after the user actually disconnected, the recording sits on disk that whole time, and any Mongo proxy is never cleaned up.\n\n## Response to the "this is intentional" objection\n\nThe PR title is "fix: early session cleanup that broke proxies", and the pre-PR comment that was removed said "RDP reconnects via a stable .rdp file within the session's validity window". That rationale is type-specific. RDP and Mongo (GetOrCreateMongoProxy comment: "so that subsequent client connections find a warm topology") need the cert-validity reconnect window. SSH/Postgres/MySQL/MSSQL/Redis do not — once the client process exits, the TLS connection is gone and there is no reconnect protocol that benefits from holding state. For those types, immediate cleanup is correct and pre-PR was right.\n\nThe fix that actually achieves the PR's stated goal without the regression is the narrower one the previous code almost had: extend the existing isRDP gate to also skip cleanup when ResourceType == ResourceTypeMongoDB, rather than removing the gate entirely. Alternatively, have DeregisterPAMSession leave the entry in the map with a "disconnected-since" timestamp so reapIdleSessions can pick it up after a short grace period.\n\n## Severity\n\nNormal — affects the common case (any non-RDP, non-Mongo session ended by client disconnect) and is observable as: server-side session-active leak for hours, delayed recording upload, leaked credentials in the gateway, and orphaned Mongo topologies (separate bug for Mongo specifically).

@x032205 x032205 merged commit 6a698a6 into main May 14, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants