Re: [PATCH v2] locking/semaphore: Use wake_q to wake up processes outside lock critical section

From: Waiman Long
Date: Mon Sep 19 2022 - 15:49:30 EST


On 9/9/22 15:28, Waiman Long wrote:
It was found that a circular lock dependency can happen with the
following locking sequence:

+--> (console_sem).lock --> &p->pi_lock --> &rq->__lock --+
| |
+---------------------------------------------------------+

The &p->pi_lock --> &rq->__lock sequence is very common in all the
task_rq_lock() calls.

The &rq->__lock --> (console_sem).lock sequence happens when the
scheduler code calling printk() or more likely the various WARN*()
macros while holding the rq lock. The (console_sem).lock is actually
a raw spinlock guarding the semaphore. In the particular lockdep splat
that I saw, it was caused by SCHED_WARN_ON() call in update_rq_clock().
To work around this locking sequence, we may have to ban all WARN*()
calls when the rq lock is held, which may be too restrictive, or we
may have to add a WARN_DEFERRED() call and modify all the call sites
to use it.

Even then, a deferred printk or WARN function may still call
console_trylock() which may, in turn, calls up_console_sem() leading
to this locking sequence.

The other ((console_sem).lock --> &p->pi_lock) locking sequence
was caused by the fact that the semaphore up() function is calling
wake_up_process() while holding the semaphore raw spinlock. This lockiing
sequence can be easily eliminated by moving the wake_up_processs()
call out of the raw spinlock critical section using wake_q which is
what this patch implements. That is the easiest and the most certain
way to break this circular locking sequence.

v1: https://lore.kernel.org/lkml/20220118153254.358748-1-longman@xxxxxxxxxx/

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>

Ping!

Note that the current printk_deferred() code path may also hit this problem as an up() call of console_sem may be issued.

Cheers,
Longman