fix(pam): keep session recording on upload failure (PAM-205) #199
Merged
CleanupPAMSession previously deleted the local recording file and notified the platform of session termination even when the legacy bulk upload (or the final batch flush, or the encryption-key fetch) failed, silently losing the entire session. Now any upload-side failure returns early with the file, registry entry, and persisted offset all intact, so uploadExpiredSessionFiles can retry once ExpiresAt has passed.

flushSession now returns its error so CleanupPAMSession can observe the failure; flushActiveSessions discards it, since the 10s ticker retries on its own cycle.

Also changes resumeInProgressSessions to call CleanupPAMSession instead of RegisterSession for each leftover file at startup. A gateway restart kills every proxy connection, so any file on disk belongs to a session that is over from the customer's perspective; driving final cleanup is correct and turns a gateway restart into a real retry path for stuck legacy uploads.
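The early-return behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: sessionState, cleanupSteps, cleanupSession, and errUpload are hypothetical names, and the ordering of the individual steps is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errUpload is a hypothetical error used to simulate an upload-side failure.
var errUpload = errors.New("upload failed")

// sessionState is a hypothetical stand-in for what the gateway tracks per session.
type sessionState struct {
	FileOnDisk bool  // recording file still present locally
	Registered bool  // session still present in the registry
	Offset     int64 // persisted upload offset
	Terminated bool  // platform has been notified of termination
}

// cleanupSteps abstracts the operations that can fail during cleanup (assumed shape).
type cleanupSteps struct {
	FetchKey          func() error
	FlushFinalBatch   func() error
	LegacyBulkUpload  func() error
	NotifyTermination func() error
}

// cleanupSession returns early on any upload-side failure and leaves the
// file, registry entry, and offset untouched so a later pass can retry.
func cleanupSession(s *sessionState, steps cleanupSteps) error {
	if err := steps.FetchKey(); err != nil {
		return fmt.Errorf("fetch encryption key: %w", err)
	}
	if err := steps.FlushFinalBatch(); err != nil {
		return fmt.Errorf("final batch flush: %w", err)
	}
	if err := steps.LegacyBulkUpload(); err != nil {
		return fmt.Errorf("legacy bulk upload: %w", err)
	}
	// Only after every upload step succeeded is local state torn down.
	s.FileOnDisk = false
	s.Registered = false
	if err := steps.NotifyTermination(); err != nil {
		// The file is already gone on this path, so there is nothing left to retry from.
		return fmt.Errorf("notify termination: %w", err)
	}
	s.Terminated = true
	return nil
}

func main() {
	ok := func() error { return nil }
	bad := func() error { return errUpload }
	s := &sessionState{FileOnDisk: true, Registered: true, Offset: 42}

	err := cleanupSession(s, cleanupSteps{ok, ok, bad, ok})
	fmt.Println("first attempt:", err, "| file kept:", s.FileOnDisk)

	err = cleanupSession(s, cleanupSteps{ok, ok, ok, ok})
	fmt.Println("retry:", err, "| file kept:", s.FileOnDisk)
}
```

The point of returning the wrapped error instead of logging and continuing is that the caller decides what a failure means: a final cleanup keeps the file, while a periodic flush can ignore the error and wait for the next tick.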
Address review feedback on resumeInProgressSessions:

- Skip already-expired files at startup; they are handled exclusively by uploadExpiredSessionFiles, which fires immediately afterward. Prevents duplicate back-to-back cleanup attempts on the same file when the platform endpoint is flaky.
- Soften the failure log to "Startup cleanup did not complete successfully" since CleanupPAMSession can also fail after the recording file has already been deleted (termination-notify error path), in which case "file retained for retry" was inaccurate.
- Update the stale inline comment in startUploadRoutine to reflect the new function behavior.
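A rough sketch of the startup pass after this review feedback, assuming each leftover file carries a session ID and an ExpiresAt timestamp. The names leftoverFile, resumeLeftovers, and at are hypothetical, and the log line's format beyond the quoted message is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// leftoverFile is a hypothetical description of a recording file found on disk at startup.
type leftoverFile struct {
	SessionID string
	ExpiresAt time.Time
}

// base is a fixed reference instant so the demo is deterministic.
var base = time.Date(2026, time.April, 29, 12, 0, 0, 0, time.UTC)

// at is a hypothetical helper returning base shifted by the given number of minutes.
func at(minutes int) time.Time {
	return base.Add(time.Duration(minutes) * time.Minute)
}

// resumeLeftovers runs cleanup for every file that has not expired yet and
// returns the IDs it skipped. Expired files are left to the expired-file
// upload pass, so the same file is not attempted twice back to back.
func resumeLeftovers(files []leftoverFile, now time.Time, cleanup func(id string) error) []string {
	var skipped []string
	for _, f := range files {
		if !f.ExpiresAt.After(now) {
			skipped = append(skipped, f.SessionID)
			continue
		}
		if err := cleanup(f.SessionID); err != nil {
			// Neutral wording: the file may or may not still exist at this point.
			fmt.Printf("Startup cleanup did not complete successfully (session %s): %v\n", f.SessionID, err)
		}
	}
	return skipped
}

func main() {
	files := []leftoverFile{
		{SessionID: "live", ExpiresAt: at(30)},
		{SessionID: "expired", ExpiresAt: at(-30)},
	}
	cleanup := func(id string) error { return errors.New("platform unavailable") }
	skipped := resumeLeftovers(files, at(0), cleanup)
	fmt.Println("skipped at startup:", skipped)
}
```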
sheensantoscapadngan
approved these changes
Apr 29, 2026
Description 📣
Failed PAM session log uploads were silently destroying the local recording.
CleanupPAMSession deleted the file and notified the platform of session termination even when the legacy bulk upload (or final batch flush, or encryption-key fetch) errored.
This PR makes any upload-side failure return early with the file, registry entry, and persisted offset all intact.
uploadExpiredSessionFiles retries once ExpiresAt has passed.
resumeInProgressSessions now drives CleanupPAMSession for every leftover file at startup instead of just re-registering. A gateway restart kills all proxy connections, so leftover files are from sessions that are already over; driving final cleanup turns restart into a real retry path for stuck legacy uploads.
Type ✨
Tests 🛠️
Manually reproduced against a local platform with a temporary 500 injected on /sessions/:id/logs and 404 on /sessions/:id/event-batches:
Legacy bulk upload failed at session end, keeping recording file for retry.
After unsetting the 500 and restarting the gateway, resumeInProgressSessions drives cleanup to completion: file uploaded, deleted, termination notified.
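For anyone wanting to script a similar reproduction, one possible way to fake the two platform endpoints is a small stub handler with a switchable failure. This is only a sketch under assumptions: stubPlatform and statusFor are hypothetical names, the HTTP method is a guess (the handler ignores it), and the real platform's request and response bodies are not modeled.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// stubPlatform is a hypothetical stand-in for the local platform.
type stubPlatform struct {
	failLogs atomic.Bool // while true, /sessions/:id/logs answers 500
}

// ServeHTTP routes on the path suffix only; the method is not checked.
func (p *stubPlatform) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	path := r.URL.Path
	switch {
	case strings.HasPrefix(path, "/sessions/") && strings.HasSuffix(path, "/event-batches"):
		// Always 404, matching the reproduction described above.
		http.NotFound(w, r)
	case strings.HasPrefix(path, "/sessions/") && strings.HasSuffix(path, "/logs"):
		if p.failLogs.Load() {
			http.Error(w, "injected failure", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	default:
		http.NotFound(w, r)
	}
}

// statusFor sends an in-memory request through the stub and returns the status code.
func statusFor(p *stubPlatform, path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, path, nil)
	p.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	p := &stubPlatform{}

	p.failLogs.Store(true) // inject the temporary 500
	fmt.Println(statusFor(p, "/sessions/abc/logs"))
	fmt.Println(statusFor(p, "/sessions/abc/event-batches"))

	p.failLogs.Store(false) // unset the 500 before the retry
	fmt.Println(statusFor(p, "/sessions/abc/logs"))
}
```

To point a gateway at it, the same handler could be served with httptest.NewServer(p), whose URL field gives the base address.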