Skip to content

fix(pam): keep session recording on upload failure (PAM-205)#199

Merged
bernie-g merged 2 commits into
mainfrom
fix/pam-205-keep-recording-on-upload-failure
Apr 29, 2026
Merged

fix(pam): keep session recording on upload failure (PAM-205)#199
bernie-g merged 2 commits into
mainfrom
fix/pam-205-keep-recording-on-upload-failure

Conversation

@bernie-g
Copy link
Copy Markdown
Contributor

Description 📣

Failed PAM session log uploads were silently destroying the local recording. CleanupPAMSession deleted the file and notified the platform of session termination even when the legacy bulk upload (or final batch flush, or encryption-key fetch) errored.

This PR makes any upload-side failure return early with the file, registry entry, and persisted offset all intact. uploadExpiredSessionFiles retries once ExpiresAt crosses.

resumeInProgressSessions now drives CleanupPAMSession for every leftover file at startup instead of just re-registering. A gateway restart kills all proxy connections, so leftover files are from sessions that are already over; driving final cleanup turns restart into a real retry path for stuck legacy uploads.

Type ✨

  • Bug fix

Tests 🛠️

Manually reproduced against a local platform with a temporary 500 injected on /sessions/:id/logs and 404 on /sessions/:id/event-batches:

  • Pre-fix: recording deleted, session marked terminated despite upload error.
  • Post-fix: recording retained, log shows Legacy bulk upload failed at session end, keeping recording file for retry. After unsetting the 500 and restarting the gateway, resumeInProgressSessions drives cleanup to completion: file uploaded, deleted, termination notified.

CleanupPAMSession previously deleted the local recording file and
notified the platform of session termination even when the legacy bulk
upload (or final batch flush, or encryption-key fetch) failed,
silently losing the entire session.

Now any upload-side failure returns early with the file, registry
entry, and persisted offset all intact, so uploadExpiredSessionFiles
can retry once ExpiresAt crosses. flushSession returns its error so
CleanupPAMSession can observe the failure; flushActiveSessions
discards it since the 10s ticker retries on its own cycle.

Also changes resumeInProgressSessions to call CleanupPAMSession
instead of RegisterSession for each leftover file at startup. A
gateway restart kills every proxy connection, so any file on disk is
from a session that is over from the customer's perspective; driving
final cleanup is correct and turns gateway restart into a real retry
path for stuck legacy uploads.
@linear
Copy link
Copy Markdown

linear Bot commented Apr 28, 2026

@infisical-review-police
Copy link
Copy Markdown

💬 Discussion in Slack: #pr-review-cli-199-fix-pam-keep-session-recording-on-upload-failure-pam-205

Posted by Review Police — reviews, comments, new commits, and CI failures will stream into this channel.

@bernie-g bernie-g marked this pull request as ready for review April 28, 2026 21:14
Comment thread packages/pam/session/uploader.go
Comment thread packages/pam/session/uploader.go
Address review feedback on resumeInProgressSessions:

- Skip already-expired files at startup; they are handled exclusively
  by uploadExpiredSessionFiles which fires immediately afterward.
  Prevents duplicate back-to-back cleanup attempts on the same file
  when the platform endpoint is flaky.
- Soften the failure log to "Startup cleanup did not complete
  successfully" since CleanupPAMSession can also fail after the
  recording file has already been deleted (termination-notify error
  path), in which case "file retained for retry" was inaccurate.
- Update the stale inline comment in startUploadRoutine to reflect
  the new function behavior.
@bernie-g bernie-g merged commit f501d51 into main Apr 29, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants