fix(sync): deadlock on incremental sync with > 10 modified files #132
Open
sunnnybala wants to merge 1 commit into garrytan:master
sync.ts wraps the add/modify loop in engine.transaction(), and each importFromContent inside opens another one. PGLite's _runExclusiveTransaction is a non-reentrant mutex — the second call queues on the mutex the first is holding, and the process hangs forever in ep_poll. Reproduced with a 15-file commit: unpatched hangs, patched runs in 3.4s. Fix drops the outer wrap; per-file atomicity is correct anyway (one file's failure should not roll back the others).
Summary
Symptom: gbrain sync (incremental) hangs indefinitely when the diff touches more than 10 syncable files. --full always works. Reproduced with a 15-file commit: unpatched sync hangs forever, patched sync completes in 3.4 seconds on the same diff.
Cause: src/commands/sync.ts wraps the add/modify loop in engine.transaction() when useTransaction > 10. Each file in the loop then calls importFromContent, which opens another engine.transaction() on the same PGLite instance. PGLite transactions are not reentrant: the inner call queues on the same _runExclusiveTransaction mutex the outer is holding, producing a classic recursive-mutex deadlock. The main thread parks in ep_poll, the workers park in futex_wait_queue, and CPU time stops advancing.
Fix: drop the outer transaction wrap. Each file's inner transaction is already atomic; per-file atomicity is also the right granularity (one file's failure should not roll back the others' successful imports).
Diff: 21 insertions, 21 deletions, no behavior change at or below the 10-file threshold.
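To illustrate why per-file atomicity is the right granularity, here is a minimal sketch using a hypothetical in-memory store. ToyStore, syncPerFile, and the "empty file" failure are all invented for illustration and are not gbrain or PGLite code. Catching errors per file is likewise an assumption of the sketch, not a claim about how sync.ts handles failures.

```typescript
// Hypothetical in-memory store: transaction() commits all writes or none.
class ToyStore {
  rows = new Map<string, string>();

  async transaction(fn: (tx: Map<string, string>) => Promise<void>): Promise<void> {
    const draft = new Map(this.rows); // work on a copy of the committed rows
    await fn(draft);                  // if fn throws, the draft is discarded
    this.rows = draft;                // commit: swap in the draft
  }
}

interface SyncResult {
  imported: string[];
  failed: string[];
}

// Per-file atomicity: each file gets its own transaction, with no outer wrap.
async function syncPerFile(
  store: ToyStore,
  files: Record<string, string>,
): Promise<SyncResult> {
  const result: SyncResult = { imported: [], failed: [] };
  for (const [path, content] of Object.entries(files)) {
    try {
      await store.transaction(async (tx) => {
        tx.set(`page:${path}`, content);
        // Simulated mid-import failure (an assumption for this sketch).
        if (content.length === 0) throw new Error(`empty file: ${path}`);
        tx.set(`chunks:${path}`, String(content.split(/\s+/).length));
      });
      result.imported.push(path);
    } catch {
      // Only this file's writes are rolled back; earlier commits stay.
      result.failed.push(path);
    }
  }
  return result;
}
```

In this toy model, a failing file leaves no partial rows behind, and the files before and after it stay committed.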
Details
The deadlock chain
1. The useTransaction > 10 branch at sync.ts:225 wraps the add/modify loop in engine.transaction().
2. engine.transaction in pglite-engine.ts delegates to PGLite.
3. The callback passed to it is async () => { await processAddsModifies(); }, which does not take txEngine as a parameter. processAddsModifies is defined in the outer scope of performSync and closes over the original engine variable, not the txEngine passed into the callback. So every inner importFile(engine, …) call uses the original PGLiteEngine, whose this.db is the raw PGlite instance (not the outer tx object).
4. importFromContent (import-file.ts:95) opens its own engine.transaction(), which calls this.db.transaction(...) on the original PGlite instance, the same instance whose exclusive-transaction mutex is currently held by the outer transaction.
5. In transaction() (from @electric-sql/pglite, chunk-HDIMFN25.js), _runExclusiveTransaction is a mutex. The inner call queues on the same mutex the outer is still holding. The outer can't release: it is awaiting the user callback, which is awaiting the inner transaction, which is queued forever.
Reproduction
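The same shape of deadlock can be reproduced without PGLite. The sketch below is a toy model under stated assumptions: ToyEngine, settlesWithin, nestedSync, and flatSync are hypothetical names, and the promise-chain mutex only imitates the non-reentrant behavior described for _runExclusiveTransaction. The 10-file threshold in sync.ts is not modeled; the toy hangs for any nested call.

```typescript
// Toy stand-in for an engine whose transaction() is guarded by a
// non-reentrant mutex. Hypothetical; this is not PGLite's implementation.
class ToyEngine {
  private tail: Promise<void> = Promise.resolve();

  // Runs `fn` only after every previously queued transaction has finished.
  transaction<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // The next caller waits for this one, whether it succeeds or fails.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// Resolves to "hung" if `p` has not settled within `ms` milliseconds.
function settlesWithin<T>(p: Promise<T>, ms: number): Promise<T | "hung"> {
  const timeout = new Promise<"hung">((resolve) =>
    setTimeout(() => resolve("hung"), ms),
  );
  return Promise.race([p, timeout]);
}

// Each import opens its own transaction on the engine it is given.
function importFile(engine: ToyEngine, name: string): Promise<string> {
  return engine.transaction(async () => `imported ${name}`);
}

// Unpatched shape: the outer callback closes over `engine`, so every inner
// transaction queues behind the outer one, which is waiting on it.
function nestedSync(engine: ToyEngine, files: string[]): Promise<string> {
  return engine.transaction(async () => {
    for (const f of files) await importFile(engine, f);
    return "synced";
  });
}

// Patched shape: no outer wrap, one transaction per file.
async function flatSync(engine: ToyEngine, files: string[]): Promise<string> {
  for (const f of files) await importFile(engine, f);
  return "synced";
}

async function demo(): Promise<void> {
  const files = Array.from({ length: 15 }, (_, i) => `page-${i}.md`);
  console.log(await settlesWithin(nestedSync(new ToyEngine(), files), 200)); // "hung"
  console.log(await settlesWithin(flatSync(new ToyEngine(), files), 200)); // "synced"
}
demo();
```

The timeout race is only there so the demo terminates; the nested promise itself never settles.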
Evidence captured live:
- Unpatched: /proc/<pid>/task/*/wchan → main ep_poll, two worker threads futex_wait_queue
- Unpatched: utime 205 stime 47 unchanged across three 2-second samples
- Patched: real 0m3.4s with status synced, all pages affected, chunks created
Why this has flown under the radar
- The deadlock only triggers when the git diff contains more than 10 files that pass isSyncable() (excludes README.md, index.md, schema.md, log.md, hidden dirs, non-.md/.mdx)
- Syncs of 10 or fewer files take the else branch, bypassing the wrap entirely
- gbrain import (bulk ingest) and first-sync go through performFullSync → runImport, which has no outer wrap
- sync --full works as an apparent workaround, leading users to blame "PGLite is slow" rather than report a bug
Where it bites in production
- [Source: ...] citations
- git commit + gbrain sync
Why removing the outer wrap is safe
- importFromContent is already atomic per file (its inner transaction commits the pages row, tags, chunks, and page_versions together)
- The performFullSync path has always done per-file inner transactions with no outer wrap; this fix simply makes the incremental path match
The diff