Skip to content

Файлы архивов WAL *.gz.part прерывают архивацию #377

@triwada

Description

@triwada

Столкнулся с такой проблемой:
заполнился полностью раздел для бэкапов (NFS), очистил часть места, но процесс архивации WAL продолжал висеть в прежнем состоянии (на одном и том же WAL)

$ pg_probackup -V
pg_probackup 2.4.15 (Postgres Pro 9.6.18.1 enterprise)

archive_command = '/usr/bin/pg_probackup archive-push -B /pgdump/backup --instance=dummy --compress-level=1 -j 20 --batch-size=200 --wal-file-path %p --wal-file-name %f --overwrite'

В списке процессов было

postgres 123710  0.0  0.0 816616  4160 ?        Sl   Apr28   0:15 /usr/bin/pg_probackup archive-push -B /pgdump/backup --instance=dummy --compress-level=1 -j 10 --batch-size=100 --wal-file-path pg_xlog/000000A1000287D300000087 --wal-file-name 000000A1000287D300000087 --overwrite

[postgres@somehost ~]$ ps auxwww | grep archiver
postgres  38529  0.7  0.0 112848  4588 ?        Ss   Apr18 116:05 postgres: archiver process   archiving 000000A1000287D300000087

В логе pg_probackup новых записей не было, лог остановился на моменте заполнения раздела.

Прибил процесс зависшей команды из archive_command
[postgres@somehost ~]$ kill -9 123710

Запустилась новая команда pid=101930

[postgres@somehost ~]$ ps auxwww | grep pg_probackup
postgres 101930  237  0.0 1567064 6044 ?        Sl   02:40   2:03 /usr/bin/pg_probackup archive-push -B /pgdump/backup --instance=dummy --compress-level=1 -j 20 --batch-size=200 --wal-file-path pg_xlog/000000A1000287D300000087 --wal-file-name 000000A1000287D300000087 --overwrite
postgres 103193  0.0  0.0   8244   796 pts/3    S+   02:40   0:00 grep --color=auto pg_probackup

В лог pg_probackup начали добавляться новые записи



2021-04-29 02:40:00 MSK [101930]: INFO: pg_probackup archive-push WAL file: 000000A1000287D300000087, threads: 20/20, batch: 200/200, compression: zlib
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000E0"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000DF"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000DE"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D300000087"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000E1"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000E5"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000E7"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000E8"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000DF.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E0.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000DE.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000E9"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000EA"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000EB"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000EC"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000EE"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000ED"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E1.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E5.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E7.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E8.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E9.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000EA.gz.part"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000F0"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000EF"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000F1"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000F2"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000F3"
2021-04-29 02:40:00 MSK [101930]: LOG: pushing file "000000A1000287D3000000F4"
2021-04-29 02:40:01 MSK [101930]: LOG: WAL file already exists in archive with the same checksum, skip pushing: "/pgdata/9.6/data/pg_xlog/000000A1000287D300000087"
2021-04-29 02:40:01 MSK [101930]: LOG: pushing file "000000A1000287D3000000F5"
2021-04-29 02:40:02 MSK [101930]: LOG: pushing file "000000A1000287D3000000F6"


Но процесс архивации снова висел на том же WAL, что и до kill -9 старого процесса.


Cнова прибиваю

[postgres@somehost ~]$ kill -9 101930


В логе новые записи, но по сути - тоже самое


2021-04-29 02:40:20 MSK [101930]: LOG: pushing file "000000A1000287D4000000A5"
2021-04-29 02:40:20 MSK [101930]: LOG: pushing file "000000A1000287D4000000A6"
2021-04-29 02:40:20 MSK [101930]: LOG: pushing file "000000A1000287D4000000A7"
2021-04-29 02:40:20 MSK [101930]: LOG: pushing file "000000A1000287D4000000A8"
2021-04-29 02:49:05 MSK [113985]: INFO: pg_probackup archive-push WAL file: 000000A1000287D300000087, threads: 20/20, batch: 200/200, compression: zlib
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000DE"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D300000087"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000DF"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000E1"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000E0"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000E5"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000E7"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000E8"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000DE.gz.part"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000DF.gz.part"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E1.gz.part"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E0.gz.part"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E5.gz.part"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E7.gz.part"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E8.gz.part"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000EA"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D3000000E9"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000A9"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000AA"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000AB"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000AC"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000AD"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000AE"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000AF"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000B0"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000B1"
2021-04-29 02:49:05 MSK [113985]: LOG: pushing file "000000A1000287D4000000B2"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000E9.gz.part"
2021-04-29 02:49:05 MSK [113985]: LOG: Temp WAL file already exists, waiting on it 24000 seconds: "/pgdump/backup/wal/dummy/000000A1000287D3000000EA.gz.part"
2021-04-29 02:49:06 MSK [113985]: LOG: pushing file "000000A1000287D4000000B3"
2021-04-29 02:49:06 MSK [113985]: LOG: pushing file "000000A1000287D4000000B4"


[postgres@somehost ~]$ ls -lh /pgdump/backup/wal/dummy/000000A1000287D3*.gz.part
-rw------- 1 postgres postgres 3.1M Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000DE.gz.part
-rw------- 1 postgres postgres 3.1M Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000DF.gz.part
-rw------- 1 postgres postgres 3.2M Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000E0.gz.part
-rw------- 1 postgres postgres 1.2M Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000E1.gz.part
-rw------- 1 postgres postgres    0 Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000E5.gz.part
-rw------- 1 postgres postgres    0 Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000E7.gz.part
-rw------- 1 postgres postgres    0 Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000E8.gz.part
-rw------- 1 postgres postgres    0 Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000E9.gz.part
-rw------- 1 postgres postgres    0 Apr 28 21:51 /pgdump/backup/wal/dummy/000000A1000287D3000000EA.gz.part

Удалил эти файлы .gz.part
[postgres@somehost ~]$ rm -f /pgdump/backup/wal/dummy/000000A1000287D3*.gz.part

После этого архивация заработала.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions