Re: [PATCH] sched: flush plug in schedule_preempt_disabled() to prevent deadlock
From: Peter Zijlstra
Date: Tue May 12 2026 - 08:05:07 EST

On Tue, May 12, 2026 at 04:59:39PM +0800, Ming Lei wrote:
> On preemptible kernels, a deadlock can occur when a task with plugged IO
> calls schedule_preempt_disabled():
>
>   schedule_preempt_disabled()
>     sched_preempt_enable_no_resched()   // preemption now enabled
>     schedule()                          // <-- preemption can happen here
>       sched_submit_work()
>         blk_flush_plug()
>
> After sched_preempt_enable_no_resched() re-enables preemption, the task
> can be preempted (e.g., by a higher-priority RT task) before reaching
> blk_flush_plug() in sched_submit_work(). Since the task's state is
> already TASK_UNINTERRUPTIBLE (set by the mutex/rwsem slowpath caller),
> requests in current->plug remain unflushed for an unbounded time.
>
> If another task depends on those plugged requests to make progress (e.g.,
> to release a lock the sleeping task needs), a deadlock results:
>
>   - Task A (writeback worker): holds plugged IO, preempted before
>     flushing, stuck on run queue behind higher-priority work
>   - Task B: waiting for IO completion from Task A's plug, holds a lock
>     that Task A needs to be woken up
>
> Both reported deadlocks involve mutex/rwsem slowpaths, which are the
> primary callers of schedule_preempt_disabled() with non-running task
> state.
>
> Fix by flushing the plug in schedule_preempt_disabled() while
> preemption is still disabled. This ensures the plug is empty before the
> preemption window opens.

How is this different from any path calling schedule()? That would be
subject to exactly the same issue.

The patch cannot be correct.
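
The objection can be illustrated with a toy user-space model. This is only
a sketch: every toy_* name below is hypothetical and none of it is kernel
code. It records how many requests are still plugged at the first point
where the task is preemptible, once when entering through schedule()
directly and once through schedule_preempt_disabled() as laid out in the
quoted call chain.

```c
#include <stdbool.h>

/* Toy model only: these types and helpers are hypothetical, not kernel APIs. */
struct toy_task {
	int plugged;        /* requests held in the task's plug */
	bool preemptible;   /* may the task be preempted right now? */
	int seen_at_window; /* plug size at first preemption window, -1 if none */
};

/* Record the plug size the first time the task is preemptible. */
static void toy_preempt_point(struct toy_task *t)
{
	if (t->preemptible && t->seen_at_window < 0)
		t->seen_at_window = t->plugged;
}

/* Stands in for sched_submit_work() -> blk_flush_plug(). */
static void toy_submit_work(struct toy_task *t)
{
	t->plugged = 0;
}

/* Stands in for schedule(): preemption is enabled, the flush comes later. */
static void toy_schedule(struct toy_task *t)
{
	toy_preempt_point(t);   /* window before the plug is flushed */
	toy_submit_work(t);
}

/* Stands in for schedule_preempt_disabled() per the quoted call chain. */
static void toy_schedule_preempt_disabled(struct toy_task *t)
{
	t->preemptible = true;  /* sched_preempt_enable_no_resched() */
	toy_schedule(t);
	t->preemptible = false; /* preemption disabled again on return */
}

/* Plug size seen at the first preemption window, entering via schedule(). */
static int toy_window_via_schedule(int plugged)
{
	struct toy_task t = { plugged, true, -1 };

	toy_schedule(&t);
	return t.seen_at_window;
}

/* Same measurement, entering via schedule_preempt_disabled(). */
static int toy_window_via_preempt_disabled(int plugged)
{
	struct toy_task t = { plugged, false, -1 };

	toy_schedule_preempt_disabled(&t);
	return t.seen_at_window;
}
```

In this model both helpers return the initial plug size, so the window
between "preemptible" and "plug flushed" is the same on both entry paths.
The model says nothing about whether the reported deadlocks are real; it
only shows why a flush placed solely in schedule_preempt_disabled() would
not cover plain schedule() callers.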