Skip to content

fix(credstore): use gofrs/flock for advisory file locking#20

Open
appleboy wants to merge 2 commits into
mainfrom
fix/credstore-flock
Open

fix(credstore): use gofrs/flock for advisory file locking#20
appleboy wants to merge 2 commits into
mainfrom
fix/credstore-flock

Conversation

@appleboy
Copy link
Copy Markdown
Member

Summary

  • The existing filelock.go uses an O_CREATE|O_EXCL retry loop with a 30-second stale-mtime heuristic. Two problems:
    1. Wrongful eviction: a slow-but-live holder whose mtime happens to be older than 30s (long interactive flow, heavy load, slow filesystem) can have its lock yanked out from under it by a peer, letting both processes enter the critical section.
    2. TOCTOU: between os.Stat and os.Remove, the original holder may release and a new process may re-create; the Remove then deletes a lock that is actively held.
  • This PR swaps in github.com/gofrs/flock, which uses kernel advisory locking (fcntl on POSIX, LockFileEx on Windows). The kernel releases the lock automatically if the holder crashes, so no stale-timeout heuristic is needed.
  • The lock file is now intentionally left on disk after release — advisory locking protects access, not the file's existence. An orphan lock file from a crashed process is harmless and a new acquirer gets the lock immediately.

Behaviour changes

  • release() no longer deletes the lock file. Tests that observed that side-effect have been updated.
  • Max wait for contention is 30s (was effectively 30s before via stale-reap; now enforced strictly without override).

Test plan

  • make test — existing concurrent-locks test still passes
  • New TestAcquireOrphanedLockFile verifies a leftover lock file does not block new acquirers
  • TestAcquireAndRelease now asserts re-acquisition works after release (instead of file deletion)
  • make lint clean

🤖 Generated with Claude Code

- Replace the custom O_CREATE|O_EXCL retry loop and 30s stale-mtime heuristic with gofrs/flock, which uses kernel fcntl on POSIX and LockFileEx on Windows and releases the lock automatically when the holder exits
- Remove the stale-lock reaper: it had a Stat-then-Remove TOCTOU and could wrongly evict a slow but still-alive holder whose mtime was older than the timeout
- Stop deleting the lock file on release; advisory locking protects access, so an orphan file is harmless and does not block new acquirers
- Replace the stale-lock test with one that verifies an orphaned lock file does not block a new acquirer
- Drop the concurrent-writes assertion that required the lock file to be removed after saves, since the file now intentionally persists
Copilot AI review requested due to automatic review settings April 23, 2026 15:04
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates credstore’s cross-process file locking to use kernel advisory locking via github.com/gofrs/flock, replacing the previous lock-file creation + stale-mtime eviction approach to avoid wrongful eviction and TOCTOU lock removal.

Changes:

  • Replace credstore’s lock acquisition/release logic with gofrs/flock advisory locking and a fixed 30s acquisition timeout.
  • Update locking tests to reflect that .lock files are intentionally left on disk and add coverage for orphaned lock-file re-acquisition.
  • Add the new module dependency to go.mod / go.sum.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
go.mod Adds github.com/gofrs/flock dependency.
go.sum Adds checksums for github.com/gofrs/flock.
credstore/filelock.go Reimplements lock acquisition/release using advisory locks with a timeout.
credstore/filelock_test.go Updates expectations around lock-file persistence; adds orphaned lock-file test.
credstore/file_store_test.go Removes assertion that lock files are deleted after concurrent writes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread credstore/filelock.go Outdated
Comment on lines +55 to +56
func (fl *fileLock) release() error {
var closeErr error
if fl.lockFile != nil {
closeErr = fl.lockFile.Close()
}
removeErr := os.Remove(fl.lockPath)
return errors.Join(closeErr, removeErr)
return fl.fl.Unlock()
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The receiver name fl combined with the field name fl leads to fl.fl, which is unnecessarily hard to read. Rename either the receiver (e.g., l/lock) or the struct field (e.g., flock) to make the unlock call clearer.

Copilot uses AI. Check for mistakes.
Comment thread credstore/filelock.go Outdated
Comment on lines +32 to +33
// lock file from a crashed process does not block new acquirers because the
// kernel does not hand out the advisory lock to anyone.
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment is misleading: an orphaned .lock file is harmless because no process holds the advisory lock, not because the kernel "does not hand out" the lock. Reword this to clarify that the lock becomes immediately available to new acquirers once no process holds it (including after a crash).

Suggested change
// lock file from a crashed process does not block new acquirers because the
// kernel does not hand out the advisory lock to anyone.
// lock file from a crashed process does not block new acquirers because once
// no process holds the advisory lock, including after a crash, it becomes
// immediately available to be acquired again.

Copilot uses AI. Check for mistakes.
- Rename release() receiver from fl to l so the unlock call reads as l.fl.Unlock() instead of the doubled fl.fl.Unlock()
- Clarify the orphaned-lock doc comment: explain that an orphan lock file is harmless because no process holds the advisory lock (including after a crash), not because the kernel withholds it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants