importer: replace pg_dumpall with per-database pg_dump, add AWS IAM token refresh #8
Merged
Instead of a single pg_dumpall run, the all-databases backup path now:
- dumps globals via pg_dumpall --globals-only → /00000-globals.sql
- lists connectable databases (datallowconn = true, excludes template0)
- dumps each database via pg_dump -Fc → /00001-<name>.dump, /00002-<name>.dump, …

Single-database and all-databases backups now share a single code path; the single-database case filters the database list to the requested name.

The exporter's restore_globals option is renamed to no_globals (inverted semantics): globals are restored by default and no_globals=true skips them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
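The layout described in this commit can be sketched roughly as follows. This is an illustration only, not the importer's code: planDumpFiles and listDatabasesQuery are hypothetical names, the SQL is one plausible way to express "connectable, not template0", and numbering the dumps after filtering is an assumption.

```go
package main

import "fmt"

// listDatabasesQuery is an assumed form of the listing step: only
// connectable databases, with template0 excluded.
const listDatabasesQuery = `SELECT datname FROM pg_database
WHERE datallowconn AND datname <> 'template0'
ORDER BY datname`

// planDumpFiles returns the snapshot paths for a backup: the globals file
// first, then one numbered custom-format dump per database. When only is
// non-empty, the list is filtered down to that single database, so both
// backup modes go through the same code path.
func planDumpFiles(databases []string, only string) []string {
	files := []string{"/00000-globals.sql"}
	n := 0
	for _, db := range databases {
		if only != "" && db != only {
			continue
		}
		n++
		files = append(files, fmt.Sprintf("/%05d-%s.dump", n, db))
	}
	return files
}

func main() {
	// Full backup of a cluster with two connectable databases.
	for _, f := range planDumpFiles([]string{"myapp", "postgres"}, "") {
		fmt.Println(f)
	}
}
```

Calling planDumpFiles with a non-empty second argument models the single-database case: the same loop runs, but only the requested name survives the filter.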
pg_dumpall --globals-only produces schema-only output (roles, tablespaces), so emitting it during a data-only backup is inconsistent. Suppress it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GenerateDBAuthToken is now called lazily via TokenProvider instead of at construction time, so short-lived IAM tokens (~15 min) are always fresh when they are actually used.

For the importer, the token is refreshed at the start of Import() (covering Ping, emitManifest, listDatabases, canReadPgAuthid) and again before each pg_dump / pg_dumpall subprocess in emitRecord. For the exporter, it is refreshed before each pg_restore and psql invocation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Refreshing in emitManifest is consistent with the pattern used in emitRecord and Ping: the token is renewed immediately before the operation that needs it, rather than at a higher level in Import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
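The "refresh immediately before use" pattern might look something like the sketch below. Everything here is an assumption made for illustration: the shape of TokenProvider, the conn type, the command helper, and passing the token through PGPASSWORD are not taken from the PR, and the fake provider stands in for the real GenerateDBAuthToken call.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// TokenProvider returns a fresh credential on every call (assumed shape).
type TokenProvider func() (string, error)

// conn is a hypothetical holder for connection settings.
type conn struct {
	password string
	provider TokenProvider // nil when a static password is used
}

// refreshToken replaces the cached password with a newly generated token.
// With no provider configured it is a no-op.
func (c *conn) refreshToken() error {
	if c.provider == nil {
		return nil
	}
	tok, err := c.provider()
	if err != nil {
		return fmt.Errorf("refresh token: %w", err)
	}
	c.password = tok
	return nil
}

// command refreshes the token and only then builds the subprocess, so the
// credential is generated right before the tool that needs it is started.
func (c *conn) command(name string, args ...string) (*exec.Cmd, error) {
	if err := c.refreshToken(); err != nil {
		return nil, err
	}
	cmd := exec.Command(name, args...)
	cmd.Env = append(os.Environ(), "PGPASSWORD="+c.password)
	return cmd, nil
}

func main() {
	calls := 0
	c := &conn{provider: func() (string, error) {
		calls++ // stand-in for the real token generation
		return fmt.Sprintf("token-%d", calls), nil
	}}
	for _, db := range []string{"myapp", "postgres"} {
		cmd, err := c.command("pg_dump", "-Fc", db)
		if err != nil {
			panic(err)
		}
		fmt.Println(cmd.Args, "built with", c.password)
	}
}
```

Because the refresh lives inside the helper rather than in the caller, a long run of per-database dumps cannot outlive a token generated at startup.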
The old create_db flag was ambiguous and broken in several restore scenarios. Replace it with two explicit options:
- clean=true: pass --clean --if-exists to pg_restore, dropping objects within the target database before recreating them. The database must already exist.
- drop_and_recreate=true: pass -C --clean --if-exists, dropping the entire database and recreating it from the archive metadata. Safe on both empty and populated clusters.

The default (no option) restores into an existing database without touching anything else, which is the safest behaviour for a live cluster. The two new options are mutually exclusive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pg_dumpall never emits CREATE DATABASE for the postgres database because it always exists and cannot be dropped. Mirror this behaviour: when drop_and_recreate=true, the postgres database is restored with --clean --if-exists instead of -C --clean --if-exists, so a full backup restores cleanly on both fresh and populated clusters without any special user action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
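The option handling from the two commits above can be summarised in a small sketch. restoreFlags is a hypothetical helper, not the exporter's function; only the pg_restore flags themselves (-C, --clean, --if-exists) and the rules around them come from the commit messages.

```go
package main

import (
	"errors"
	"fmt"
)

// restoreFlags maps the two restore options to extra pg_restore flags.
// The postgres database is never dropped, so drop_and_recreate degrades
// to a plain --clean --if-exists restore for it.
func restoreFlags(dbName string, clean, dropAndRecreate bool) ([]string, error) {
	if clean && dropAndRecreate {
		return nil, errors.New("clean and drop_and_recreate are mutually exclusive")
	}
	switch {
	case dropAndRecreate && dbName != "postgres":
		return []string{"-C", "--clean", "--if-exists"}, nil
	case dropAndRecreate, clean:
		return []string{"--clean", "--if-exists"}, nil
	default:
		// No option: restore into the existing database, touch nothing else.
		return nil, nil
	}
}

func main() {
	for _, db := range []string{"myapp", "postgres"} {
		flags, err := restoreFlags(db, false, true)
		if err != nil {
			panic(err)
		}
		fmt.Println(db, flags)
	}
}
```

Rejecting the combination up front keeps the switch simple: by the time it runs, at most one of the two options is set.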
The old single-database layout used globals.sql; the current layout always produces 00000-globals.sql. Since backward compatibility with old snapshots is not a concern, drop the globals.sql check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename drop_and_recreate to recreate: it is shorter and clearer, and the drop is implied.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s and Exporter.Ping
- Add exclude_databases (comma-separated) to skip databases during a full backup. The AWS importer defaults it to rdsadmin, which is an internal RDS system database that regular users cannot dump.
- refreshToken is now called in listDatabases and Exporter.Ping so every method that opens a connection manages its own token refresh, making future call sites safe by default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
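A comma-separated exclusion list of this kind could be applied along the following lines. parseList and excludeDatabases are hypothetical helpers, and trimming whitespace around each name is an assumption rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// parseList turns "a, b,c" into a set of names, ignoring empty entries.
func parseList(s string) map[string]bool {
	set := make(map[string]bool)
	for _, name := range strings.Split(s, ",") {
		if name = strings.TrimSpace(name); name != "" {
			set[name] = true
		}
	}
	return set
}

// excludeDatabases drops every database named in the exclude list and
// keeps the remaining ones in their original order.
func excludeDatabases(all []string, exclude string) []string {
	skip := parseList(exclude)
	var kept []string
	for _, db := range all {
		if !skip[db] {
			kept = append(kept, db)
		}
	}
	return kept
}

func main() {
	// On RDS, the default exclusion would remove the internal database.
	fmt.Println(excludeDatabases([]string{"myapp", "postgres", "rdsadmin"}, "rdsadmin"))
}
```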
The generic .sql fallback was a leftover from when pg_dumpall produced all.sql. The only .sql file in the current layout is 00000-globals.sql, which is already handled explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When restoring a full backup, only .dump files whose database name matches the comma-separated databases list are restored. Globals (00000-globals.sql) are always restored regardless of the filter.

The database name extraction is factored into a dumpBaseName helper, reused by both the filter and the pg_restore target resolution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
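As a rough illustration of the filter described above: the real dumpBaseName helper is not shown in this conversation, so baseNameFromDump below is a hypothetical stand-in with a guessed signature, and shouldRestore is likewise an invented name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// baseNameFromDump extracts the database name from a numbered dump path,
// e.g. "/00001-myapp.dump" yields "myapp". Only the first dash is treated
// as the separator, so database names containing dashes survive.
func baseNameFromDump(p string) (string, bool) {
	name := path.Base(p)
	if !strings.HasSuffix(name, ".dump") {
		return "", false
	}
	_, db, found := strings.Cut(strings.TrimSuffix(name, ".dump"), "-")
	if !found || db == "" {
		return "", false
	}
	return db, true
}

// shouldRestore applies the databases filter: the globals file always
// passes, and a dump passes when the filter is empty or names its database.
func shouldRestore(p string, wanted map[string]bool) bool {
	if path.Base(p) == "00000-globals.sql" {
		return true
	}
	db, ok := baseNameFromDump(p)
	if !ok {
		return false
	}
	return len(wanted) == 0 || wanted[db]
}

func main() {
	wanted := map[string]bool{"myapp": true}
	for _, p := range []string{"/00000-globals.sql", "/00001-myapp.dump", "/00002-postgres.dump"} {
		fmt.Println(p, shouldRestore(p, wanted))
	}
}
```

Sharing one extraction function between the filter and the restore target means both always agree on which database a given file belongs to.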
- awsexporter: remove the initial GenerateDBAuthToken call at construction time; TokenProvider already refreshes the token before every connection and subprocess, so the upfront call was redundant.
- schemas and README: replace stale references to pg_dumpall with accurate descriptions reflecting the current per-database pg_dump layout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
omar-polo reviewed on May 4, 2026
omar-polo approved these changes on May 4, 2026
Backup
- Replace the pg_dumpall call with per-database pg_dump -Fc dumps. Both single-database and full-cluster backups now produce the same layout: /00000-globals.sql + one numbered /000NN-<db>.dump per database.
- Skip the globals dump when data_only=true (globals are schema-only).
- Add exclude_databases (comma-separated) to skip databases during a full backup. The AWS importer defaults it to rdsadmin, an internal RDS system database that regular users cannot dump.
Restore
- Replace the create_db option with two explicit options:
  - clean: drop objects within the target database before restoring (--clean --if-exists). The database must already exist.
  - recreate: drop and recreate the entire database from the archive metadata (-C --clean --if-exists). The postgres database is special-cased and never dropped, mirroring pg_dumpall behaviour.
- Default (neither option): restore into a database that must already exist; safe for selective restores into a live cluster.
- Add databases (comma-separated) to restore only specific databases from a full backup. Globals are always restored unless no_globals=true.
- Remove the globals.sql and generic .sql fallbacks.

AWS IAM token refresh
- Generate the token lazily via TokenProvider instead of once at construction time, in both the importer and the exporter.
- Every method that opens a connection or spawns a subprocess (Ping, emitManifest, emitRecord, listDatabases, pgRestore, psqlRestore) refreshes the token itself, so short-lived IAM tokens (~15 min) are always valid at point of use.
Test plan
- … (clean, recreate)
- … (recreate)
- … (recreate)
- data_only and schema_only backups
- databases=myapp restores only the selected database from a full backup
- exclude_databases=rdsadmin on RDS: full backup completes without error