fix: early session cleanup that broke proxies #235
💬 Discussion in Slack: #pr-review-cli-235-fix-early-session-cleanup-that-broke-proxies. Posted by Review Police; reviews, comments, new commits, and CI failures will stream into this channel.
Additional findings (outside current diff — PR may have been updated during review):
- 🔴 `packages/gateway-v2/gateway.go:883-886`: This PR removes the only routine cleanup path for non-RDP sessions that end via normal client disconnect. The pre-PR `if lastConn && !isRDP { CleanupPAMSession(...) }` branch was the cleanup path for SSH/Postgres/MySQL/MSSQL/Redis when no explicit cancel happens; the replacement is a bare `DeregisterPAMSession`, which deletes the session from `g.pamSessions` once the last connection drops. Because the idle reaper iterates only `g.pamSessions`, the session becomes invisible to it, and the only remaining cleanup is `uploadExpiredSessionFiles` after the cert's `NotAfter`, which is at least several hours away (cert renewal runs every 6 hours). Consider keeping the per-type gate (extending `isRDP` to also skip Mongo) or having `DeregisterPAMSession` leave a tombstone the reaper can finish.
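The per-type gate suggested in this finding can be sketched in isolation. This is a minimal illustration, not the gateway's actual code: the `ResourceType` string values and the helper names `keepsStateAcrossReconnects` and `shouldCleanupOnDisconnect` are hypothetical; only the constant names `ResourceTypeWindows` and `ResourceTypeMongoDB` come from the review text.

```go
package main

import "fmt"

// ResourceType is a hypothetical stand-in for the gateway's session resource type.
type ResourceType string

// The string values below are assumptions made for this sketch.
const (
	ResourceTypeWindows  ResourceType = "windows"
	ResourceTypeMongoDB  ResourceType = "mongodb"
	ResourceTypeSSH      ResourceType = "ssh"
	ResourceTypePostgres ResourceType = "postgres"
)

// keepsStateAcrossReconnects reports whether clients of this resource type
// are expected to reconnect inside the certificate validity window.
func keepsStateAcrossReconnects(rt ResourceType) bool {
	switch rt {
	case ResourceTypeWindows, ResourceTypeMongoDB:
		return true
	default:
		return false
	}
}

// shouldCleanupOnDisconnect mirrors the pre-PR condition (lastConn && !isRDP),
// widened so that MongoDB sessions are also left alone on disconnect.
func shouldCleanupOnDisconnect(lastConn bool, rt ResourceType) bool {
	return lastConn && !keepsStateAcrossReconnects(rt)
}

func main() {
	types := []ResourceType{
		ResourceTypeWindows, ResourceTypeMongoDB, ResourceTypeSSH, ResourceTypePostgres,
	}
	for _, rt := range types {
		fmt.Printf("%-8s cleanup on last disconnect: %v\n", rt, shouldCleanupOnDisconnect(true, rt))
	}
}
```

Keeping the decision in one small predicate means adding another reconnect-capable resource type later is a one-line change rather than another edit to the disconnect path.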
## What changes

The deleted block ran on the last connection close for non-RDP sessions:

```go
isRDP := forwardConfig.PAMConfig.ResourceType == session.ResourceTypeWindows
if lastConn := g.DeregisterPAMSession(...); lastConn && !isRDP {
    forwardConfig.PAMConfig.SessionUploader.CleanupPAMSession(sessionID, "connection_closed")
}
```

After the PR there is just `g.DeregisterPAMSession(...)`. The intent ("don't terminate the proxy on disconnect, because RDP/Mongo reconnect inside the cert validity window") is correct, but the change widens that to every resource type, including ones that have no reconnect semantics (SSH/Postgres/MySQL/MSSQL/Redis/Oracle/Kubernetes).

## Why the idle reaper does not save us

`DeregisterPAMSession` removes the session from the map once its last connection drops:

```go
isLast := len(g.pamSessions[sessionID]) == 0
if isLast { delete(g.pamSessions, sessionID) }
```

`reapIdleSessions` only iterates `g.pamSessions`, so a session whose last connection has been deregistered is invisible to it. `lastActivity` lives on the `*pamSessionEntry` struct, which is gone after the delete.

## The remaining path is the cert-expiry sweeper, which is hours away

The one path that still runs is `uploadExpiredSessionFiles` (uploader.go:503). It calls `GetExpiredSessionFiles`, which only returns files past `ExpiresAt` (uploader.go:171). `ExpiresAt` = `clientCert.NotAfter` (set in `parseDetailsFromCertificate`). Certificate renewal runs every 6 hours (`startCertificateRenewal`), so cert lifetime is at least several hours.

During that window for a normally-ended non-RDP session:
- Backend believes the session is still active. `CallPAMSessionTermination` is reachable only through `CleanupPAMSession` (uploader.go:822).
- Recording file remains on disk until `now.After(ExpiresAt)`.
- Credentials linger in CredentialsManager. `CleanupSessionCredentials` is only called from `CleanupPAMSession` (credentials.go:248 comment confirms).
- MongoDB proxy entries leak entirely. `closeMongoProxy` is only invoked from `CancelPAMSession` (gateway.go:260). It is not called from `uploadExpiredSessionFiles` (which lives in `packages/pam/session/uploader.go` and has no `g.mongoProxies` reference). So a Mongo session that disconnects normally keeps its topology open until the gateway shuts down or someone explicitly cancels it.

## Step-by-step proof (SSH session)

1. User opens an SSH PAM session, certificate `NotAfter` = now + 4h.
2. User finishes work, closes the SSH client. `HandlePAMProxy` returns at gateway.go:881.
3. `sessionCancel()` runs at gateway.go:885.
4. `g.DeregisterPAMSession(sessionID, tlsConn)` runs at gateway.go:886. `entries` becomes empty → `delete(g.pamSessions, sessionID)`.
5. At t+1m: `reapIdleSessions` ticks. Loop body `for sessionID, entries := range g.pamSessions` skips this session because it is no longer in the map.
6. At t+5m, t+10m, …: `uploadExpiredSessionFiles` ticks. `now.After(file.ExpiresAt)` is false because `ExpiresAt` = now + 4h. Nothing happens.
7. At t+4h: `now.After(file.ExpiresAt)` is finally true. `CleanupPAMSession(sessionID, "orphaned_file")` runs: final flush, file delete, credentials cleanup, backend notified.

Net result for the common case: backend sees the session as active for ~4 hours after the user actually disconnected, the recording sits on disk that whole time, and any Mongo proxy is never cleaned up.

## Response to the "this is intentional" objection

The PR title is "fix: early session cleanup that broke proxies", and the pre-PR comment that was removed said "RDP reconnects via a stable .rdp file within the session's validity window". That rationale is type-specific. RDP and Mongo (`GetOrCreateMongoProxy` comment: "so that subsequent client connections find a warm topology") need the cert-validity reconnect window. SSH/Postgres/MySQL/MSSQL/Redis do not: once the client process exits, the TLS connection is gone and there is no reconnect protocol that benefits from holding state. For those types, immediate cleanup is correct and pre-PR was right.

The fix that actually achieves the PR's stated goal without the regression is the narrower one the previous code almost had: extend the existing `isRDP` gate to also skip cleanup when `ResourceType == ResourceTypeMongoDB`, rather than removing the gate entirely. Alternatively, have `DeregisterPAMSession` leave the entry in the map with a "disconnected-since" timestamp so `reapIdleSessions` can pick it up after a short grace period.

## Severity

Normal: affects the common case (any non-RDP, non-Mongo session ended by client disconnect) and is observable as a server-side session-active leak for hours, delayed recording upload, leaked credentials in the gateway, and orphaned Mongo topologies (separate bug for Mongo specifically).
Description 📣
fix: early session cleanup that broke proxies
Type ✨
Tests 🛠️