[PATCH] sched: flush plug in schedule_preempt_disabled() to prevent deadlock

From: Ming Lei

Date: Tue May 12 2026 - 05:22:29 EST


On preemptible kernels, a deadlock can occur when a task with plugged IO
calls schedule_preempt_disabled():

  schedule_preempt_disabled()
    sched_preempt_enable_no_resched()  // preemption now enabled
    schedule()                         // <-- preemption can happen here
      sched_submit_work()
        blk_flush_plug()

After sched_preempt_enable_no_resched() re-enables preemption, the task
can be preempted (e.g., by a higher-priority RT task) before reaching
blk_flush_plug() in sched_submit_work(). The task's state is already
TASK_UNINTERRUPTIBLE (set by the mutex/rwsem slowpath caller), and the
preemption path enters __schedule() directly without calling
sched_submit_work(), so requests in current->plug remain unflushed until
the task is scheduled again, which can take an unbounded time.

If another task depends on those plugged requests to make progress (e.g.,
to release a lock the sleeping task needs), a deadlock results:

  - Task A (writeback worker): holds plugged IO, is preempted before
    flushing it, and is stuck on the run queue behind higher-priority
    work
  - Task B: waits for completion of the IO in Task A's plug while
    holding the lock that Task A is waiting for

Both reported deadlocks involve mutex/rwsem slowpaths, which are the
primary callers of schedule_preempt_disabled() with non-running task
state.

Fix by flushing the plug in schedule_preempt_disabled() while
preemption is still disabled. This ensures the plug is empty before the
preemption window opens.
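
The ordering change can be illustrated with a small userspace model. This
is only a sketch, not kernel code: every toy_* name below is hypothetical,
and the plug is reduced to a counter of pending requests. It records how
many requests are still plugged at the point where preemption becomes
possible, with and without the early flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are kernel APIs. */
struct toy_plug {
	int pending;			/* requests still held in the plug */
};

struct toy_task {
	bool running;			/* models task_is_running(current) */
	struct toy_plug *plug;		/* models current->plug */
	int seen_at_window;		/* pending count when preemption opens */
};

/* Models blk_flush_plug(): hand all plugged requests to the device. */
static void toy_flush_plug(struct toy_plug *plug)
{
	if (plug)
		plug->pending = 0;
}

/*
 * Models the instant right after sched_preempt_enable_no_resched():
 * whatever is still plugged here is stranded if the task is preempted.
 */
static void toy_preemption_window(struct toy_task *t)
{
	t->seen_at_window = t->plug ? t->plug->pending : 0;
}

/* Models schedule() -> sched_submit_work(): flush only when blocking. */
static void toy_schedule(struct toy_task *t)
{
	if (!t->running)
		toy_flush_plug(t->plug);
}

/* Ordering before the patch: the window opens with the plug still full. */
static void toy_spd_before(struct toy_task *t)
{
	toy_preemption_window(t);
	toy_schedule(t);
}

/* Ordering after the patch: flush first, then open the window. */
static void toy_spd_after(struct toy_task *t)
{
	if (!t->running)
		toy_flush_plug(t->plug);
	toy_preemption_window(t);
	toy_schedule(t);
}

/* Returns the number of requests stranded at the preemption window. */
static int toy_stranded(bool patched, bool running, int pending)
{
	struct toy_plug plug = { .pending = pending };
	struct toy_task t = {
		.running = running,
		.plug = &plug,
		.seen_at_window = -1,
	};

	assert(pending >= 0);
	if (patched)
		toy_spd_after(&t);
	else
		toy_spd_before(&t);
	return t.seen_at_window;
}
```

In this model, a non-running task with three plugged requests exposes all
three at the window under the old ordering and none under the new one. A
task that is still running keeps its plug in both cases, which mirrors the
task_is_running() check in the fix.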

Fixes: 73c101011926 ("block: initial patch for on-stack per-task plugging")
Reported-by: Michael Wu <michael@xxxxxxxxxxxxxxxxx>
Tested-by: Michael Wu <michael@xxxxxxxxxxxxxxxxx>
Reported-by: Xiaosen He <xiaosen.he@xxxxxxxxxxxxxxxx>
Link: https://lore.kernel.org/linux-block/20260417082744.30124-1-michael@xxxxxxxxxxxxxxxxx/
Signed-off-by: Ming Lei <tom.leiming@xxxxxxxxx>
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..c1efe110c54d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7336,6 +7336,8 @@ asmlinkage __visible void __sched schedule_user(void)
  */
 void __sched schedule_preempt_disabled(void)
 {
+	if (!task_is_running(current))
+		blk_flush_plug(current->plug, true);
 	sched_preempt_enable_no_resched();
 	schedule();
 	preempt_disable();
--
2.53.0