Re: [PATCH v3 2/3] workqueue: defer the worker wakeup outside pool->lock in __queue_work()

From: Breno Leitao

Date: Wed Jun 24 2026 - 07:19:51 EST


Hello Sebastian,

On Wed, Jun 24, 2026 at 10:47:56AM +0200, Sebastian Andrzej Siewior wrote:
> On 2026-06-16 06:33:32 [-0700], Breno Leitao wrote:
> > __queue_work() is the enqueue hot path: it inserts the work item and
> > calls kick_pool() while holding pool->lock. kick_pool() ends in a
> > wakeup, which takes the target task's rq->lock, so rq->lock nests under
> > pool->lock on every enqueue that wakes a worker on a contended unbound
> > pool.
> >
> > Use kick_pool_pick() to select and claim the worker under pool->lock,
> > queue it on an on-stack wake_q, and issue the wakeup with wake_up_q()
> > right after dropping the lock via raw_spin_unlock_wake(). Worker
> > selection, wake_cpu setup and claiming the worker off pool->idle_list
> > still happen under the lock; only the rq->lock acquisition moves out.
> >
> > Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
>
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
>
> > out:
> > - raw_spin_unlock(&pool->lock);
> > + raw_spin_unlock_wake(&pool->lock, &wakeq);
> > rcu_read_unlock();
> > }
>
> It is not wrong but I am not sure if this is really needed here. The
> pattern
> preempt_disable();
> raw_spin_unlock();
> wake_up_q()
> preempt_enable();
>
> is used to prevent task preemption after the unlock operation. The
> futex/ locking code needs to wake a task but before the unlock operation
> the task priority might have been lowered as result of dropping the
> lock. This means it might not be the task with the highest priority in
> the system and the task with highest is not yet active and we schedule a
> task in middle in the instead.
> To form this easier: say we have Task A prio=1, B prio=2 and C prio=3
> with higher number higher priority.
> A owns a lock and is preempted by B. C gets on the CPU preempts B. C
> wants A's lock so it passes its priority to A (PI-boost) and goes idle.
> A gets on the CPU and unlocks.
> Now: As part of the unlock operation (before the raw_spin_unlock()) A
> goes back to its initial priority and C is not yet woken up meaning B is
> the task with highest priority and the held raw_spinlock_t is the only
> thing preventing scheduling. So we disable preemption, unlock and then
> perform the wake-up so C becomes the next candidate (as it should be) to
> occupy the CPU.
> Otherwise it would be B which means per definition a task with lower
> priority runs before a task with higher priority.
>
> I don't think workqueue has this requirements here. Worst case something
> else gets on the CPU and worker wakeup is delayed until the task is
> scheduled again. It could be fine since the preemption could happen
> before queue-work().
> It probably does not lead to a huge performance
> regression but who knows.
>
> Your goal was to lower the contention on the pool lock during the wake
> up which you achieved. The __queue_work() remains still not preemptible
> until after the wake up which might be fine.

You're absolutely right — the preemption dance isn't needed here.

I used raw_spin_unlock_wake() because the helper seemed to match what I
was doing, but this preempt_disable/enable pattern (while cheap) is
unnecessary for this use case.

I'll update the series to use the standard approach:

raw_spin_unlock_irq(&pool->lock);
wake_up_q(&wakeq);

Thanks,
--breno