
Failed to open file "xxxx/00000001000000870000001D" for wal backup #11092

Open
Lzjing-1997 opened this issue Mar 5, 2025 · 2 comments
Labels: external (A PR or Issue is created by an external user), t/bug (Issue Type: Bug)

Comments

Lzjing-1997 commented Mar 5, 2025

Steps to reproduce

(1) Increase WAL_SEGMENT_SIZE to 64MB (a larger WAL segment makes this problem more likely to trigger);
(2) Enable safekeeper WAL backup;
(3) Connect with psql;
(4) => select pg_switch_wal(); (it may need to be executed several times)

Expected result

Safekeeper WAL backup succeeds.

Actual result

The following error was found in the safekeeper log:

2025-03-05T03:30:27.232566Z ERROR wal_backup{ttid=fee6933a27ec11f88ff516873b696d68/bea52794b362ed3f62d0c794a011765f}: failed while offloading range 0/12000000-0/13000000, backup_lsn 0/12000000: Failed to open file "/Users/jinfeng/Desktop/workplace/neondatabase/neon_2/neon/.neon/safekeepers/sk1/fee6933a27ec11f88ff516873b696d68/bea52794b362ed3f62d0c794a011765f/000000010000000000000012" for wal backup

Caused by:
No such file or directory (os error 2)

(The backup does eventually succeed.)
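The file name in the error follows PostgreSQL's standard WAL segment naming: timeline ID, then the segment number split into two 8-hex-digit halves. As a minimal sketch, assuming PostgreSQL's XLogFileName layout and a segment size that is a power of two dividing 4 GiB, the Rust snippet below maps an LSN such as 0/12000000 to the segment file the backup tried to open. The helpers `parse_lsn` and `wal_segment_name` are hypothetical and are not neon APIs.

```rust
/// Parse an LSN written in PostgreSQL's "X/Y" text form (two hex halves).
fn parse_lsn(s: &str) -> Option<u64> {
    let (hi, lo) = s.split_once('/')?;
    // u32 parsing rejects halves that do not fit in 32 bits.
    let hi = u64::from(u32::from_str_radix(hi, 16).ok()?);
    let lo = u64::from(u32::from_str_radix(lo, 16).ok()?);
    Some((hi << 32) | lo)
}

/// Name of the WAL segment file containing `lsn`, mirroring PostgreSQL's
/// XLogFileName(): timeline, then segno / segs_per_xlogid, then segno % segs_per_xlogid.
/// Assumes `seg_size` is a power of two that divides 2^32.
fn wal_segment_name(timeline: u32, lsn: u64, seg_size: u64) -> String {
    let segno = lsn / seg_size;
    let segs_per_xlogid = 0x1_0000_0000u64 / seg_size;
    format!(
        "{:08X}{:08X}{:08X}",
        timeline,
        segno / segs_per_xlogid,
        segno % segs_per_xlogid
    )
}

fn main() {
    let lsn = parse_lsn("0/12000000").expect("valid LSN");
    // 16 MiB segments, timeline 1: the file named in the log above.
    println!("{}", wal_segment_name(1, lsn, 16 * 1024 * 1024));
}
```

With 16 MiB segments this prints 000000010000000000000012, the file from the log. The failing range 0/12000000-0/13000000 spans exactly 0x1000000 bytes, so this particular log line appears to come from a run with 16 MiB segments; with 64 MiB segments, LSN 87/74000000 maps to the 00000001000000870000001D name in the issue title.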

Environment

MacBook;
Latest Neon code.

Logs, links


Analysis

(1) Executing "select pg_switch_wal();" produces a "SWITCH" WAL record;
(2) When the safekeeper receives the "SWITCH" WAL record, 'write_record_lsn' is set to the start LSN of the next WAL segment, but the safekeeper keeps receiving '\0' padding data until the current segment is filled;
(3) 'flush_ticker' periodically triggers a flush, which updates 'flush_record_lsn'/'commit_lsn' and notifies the WAL backup process;
(4) If the current segment has not been filled yet at that point, the WAL backup process cannot find the WAL segment file.
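The four steps above can be replayed in a toy state machine. This is a hypothetical Rust sketch, not neon's safekeeper code: `SafekeeperModel`, `receive_switch`, `receive_padding`, `segment_complete` and `backup_missing_segment` are invented names, and only `write_lsn`, `write_record_lsn`, `flush_record_lsn` and `flush_wal` come from this thread. It assumes that a segment is usable by WAL backup only once every byte of it has been written, and that `flush_wal` publishes `write_record_lsn` unconditionally, as described in step (3).

```rust
// Hypothetical, simplified model of the write path described in the analysis.
// NOT the actual neon implementation; 16 MiB segments are assumed.
const SEG_SIZE: u64 = 16 * 1024 * 1024;

struct SafekeeperModel {
    write_lsn: u64,        // end of the bytes actually written to disk
    write_record_lsn: u64, // end of the last complete WAL record
    flush_record_lsn: u64, // position advertised to the WAL backup task
}

impl SafekeeperModel {
    /// Step (2): a SWITCH record ending at `rec_end` moves write_record_lsn to
    /// the start of the next segment, although the zero padding is not written yet.
    fn receive_switch(&mut self, rec_end: u64) {
        self.write_lsn = rec_end;
        self.write_record_lsn = (rec_end / SEG_SIZE + 1) * SEG_SIZE;
    }

    /// Receive `n` bytes of '\0' padding.
    fn receive_padding(&mut self, n: u64) {
        self.write_lsn += n;
    }

    /// Step (3): the periodic flush publishes write_record_lsn unconditionally.
    fn flush_wal(&mut self) {
        self.flush_record_lsn = self.write_record_lsn;
    }

    /// A segment is complete on disk once write_lsn reaches its end.
    fn segment_complete(&self, segno: u64) -> bool {
        self.write_lsn >= (segno + 1) * SEG_SIZE
    }

    /// Step (4): backup wants every segment lying entirely below flush_record_lsn.
    /// Returns the first such segment that is not complete on disk, if any.
    fn backup_missing_segment(&self, backup_lsn: u64) -> Option<u64> {
        (backup_lsn / SEG_SIZE..self.flush_record_lsn / SEG_SIZE)
            .find(|&s| !self.segment_complete(s))
    }
}

fn main() {
    let seg: u64 = 0x12; // segment 0/12000000-0/13000000 from the log
    let start = seg * SEG_SIZE;
    let mut sk = SafekeeperModel { write_lsn: start, write_record_lsn: start, flush_record_lsn: start };

    sk.receive_switch(start + 0x100); // SWITCH record ends early in the segment
    sk.flush_wal(); // the ticker fires before the padding arrives
    assert_eq!(sk.backup_missing_segment(start), Some(seg)); // "No such file or directory"

    sk.receive_padding(SEG_SIZE - 0x100); // padding fills the segment
    assert_eq!(sk.backup_missing_segment(start), None); // backup eventually succeeds
    println!("race reproduced, then resolved");
}
```

The first check reports segment 0x12 as missing, matching the error for range 0/12000000-0/13000000; once the padding arrives the same check passes, which fits the observation that the backup eventually succeeds.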

Lzjing-1997 added the t/bug label Mar 5, 2025
github-actions bot added the external label Mar 5, 2025
arssher (Contributor) commented Mar 6, 2025

Yeah, I think not updating flush_record_lsn in flush_wal when self.write_lsn < self.write_record_lsn would be a fix.
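A minimal sketch of the guard proposed above, again as a hypothetical Rust model rather than neon's real code: `WalState` is invented for illustration, and only the field names and `flush_wal` come from this thread. It assumes `write_lsn` can fall behind `write_record_lsn` only in the SWITCH case, because normally the last complete record ends at or before the last written byte.

```rust
/// Hypothetical, simplified safekeeper WAL state (NOT neon's actual struct).
struct WalState {
    write_lsn: u64,        // end of the bytes written so far
    write_record_lsn: u64, // end of the last complete record (next segment start after SWITCH)
    flush_record_lsn: u64, // position published to commit/backup logic
}

impl WalState {
    /// Sketch of the proposed fix: skip the flush_record_lsn update while the
    /// SWITCH padding is still outstanding (write_lsn < write_record_lsn).
    fn flush_wal(&mut self) {
        // A real implementation would fsync the written data here (assumption).
        if self.write_lsn < self.write_record_lsn {
            return; // keep the previously published flush_record_lsn
        }
        self.flush_record_lsn = self.write_record_lsn;
    }
}

fn main() {
    // SWITCH record received; padding up to 0/13000000 not yet written.
    let mut s = WalState { write_lsn: 0x1200_0100, write_record_lsn: 0x1300_0000, flush_record_lsn: 0x1200_0000 };
    s.flush_wal();
    println!("flush_record_lsn = 0/{:X}", s.flush_record_lsn); // 0/12000000 (unchanged)

    s.write_lsn = 0x1300_0000; // padding fully received, segment complete
    s.flush_wal();
    println!("flush_record_lsn = 0/{:X}", s.flush_record_lsn); // 0/13000000
}
```

Holding flush_record_lsn back means the WAL backup process is not told about the next segment boundary until the zero padding has actually been written, so it never looks for a segment file that is still incomplete.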

Lzjing-1997 (Author) commented:


Yes, I think that's a solution.

Projects
None yet
Development

No branches or pull requests

2 participants