Fix taskq NULL pointer dereference on timer race #17942
Merged
+24
−7
Motivation and Context
This fixes a critical NULL pointer dereference that causes kernel panics when timer-based tasks are cancelled under high concurrency. The bug manifests during frequent task cancellations, particularly with snapshot automount expiry under memory pressure or high I/O load.
Description
The race condition occurs in taskq_cancel_id() when checking timer_pending() before calling timer_delete_sync(). The sequence is:
1. timer_pending() returns FALSE (the timer has already been dequeued for execution, but its callback has not run yet)
2. taskq_cancel_id() skips timer_delete_sync() due to the FALSE result
3. task_done() frees the task, setting tqent_func = NULL and clearing flags
4. The timer callback (task_expire) finally executes on another CPU
5. The callback dereferences the NULL tqent_func → kernel panic
The fix removes the unsafe conditional check and calls timer_delete_sync() unconditionally. This ensures the timer callback completes before the task is freed, preventing the use-after-free.
Kernel Panic:
How Has This Been Tested?
The race can be reproduced reliably by applying this debug patch, which widens the race window:
Reproduction script:
Results:
Types of changes
Checklist:
Signed-off-by.