Re: [PATCH] RFC: console: hack up console_lock more v3

From: Petr Mladek
Date: Thu May 09 2019 - 10:57:35 EST


On Thu 2019-05-09 14:09:03, Daniel Vetter wrote:
> console_trylock, called from within printk, can be called from pretty
> much anywhere. Including try_to_wake_up. Note that this isn't common,
> usually the box is in pretty bad shape at that point already. But it
> really doesn't help when then lockdep jumps in and spams the logs,
> potentially obscuring the real backtrace we're really interested in.
> One case I've seen (slightly simplified backtrace):
>
> Call Trace:
> <IRQ>
> console_trylock+0xe/0x60
> vprintk_emit+0xf1/0x320
> printk+0x4d/0x69
> __warn_printk+0x46/0x90
> native_smp_send_reschedule+0x2f/0x40
> check_preempt_curr+0x81/0xa0
> ttwu_do_wakeup+0x14/0x220
> try_to_wake_up+0x218/0x5f0
> pollwake+0x6f/0x90
> credit_entropy_bits+0x204/0x310
> add_interrupt_randomness+0x18f/0x210
> handle_irq+0x67/0x160
> do_IRQ+0x5e/0x130
> common_interrupt+0xf/0xf
> </IRQ>
>
> This alone isn't a problem, but the spinlock in the semaphore is also
> still held while waking up waiters (up() -> __up() -> try_to_wake_up()
> callchain), which then closes the runqueue vs. semaphore.lock loop,
> and upsets lockdep, which issues a circular locking splat to dmesg.
> Worse it upsets developers, since we don't want to spam dmesg with
> clutter when the machine is dying already.
>
> Fix this by creating a prinkt_safe_up() which calls wake_up_process
> outside of the spinlock. This isn't correct in full generality, but
> good enough for console_lock:
>
> - console_lock doesn't use interruptible or killable or timeout down()
> calls, hence an up() is the only thing that can wake up a process.
> Hence the process can't get woken and killed and reaped while we try
> to wake it up too.
>
> - semaphore.c always updates the waiter list while under the spinlock,
> so there's no other races. Specifically another process that races
> with a quick console_lock/unlock while we've dropped the spinlock
> already won't see our own waiter.
>
> Note that we only have to break the recursion for the semaphore.lock
> spinlock of the console_lock. Recursion within various scheduler
> related locks is already prevented by the printk_safe_enter/exit pair
> in __up_console_sem().

This is not fully true. printk_safe() helps only when
the first try_to_wake_up() is called from printk_safe() context.

> --- a/kernel/locking/semaphore.c
> +++ b/kernel/locking/semaphore.c
> @@ -197,6 +197,37 @@ struct semaphore_waiter {
> bool up;
> };
>
> +/**
> + * printk_safe_up - release the semaphore in console_unlock
> + * @sem: the semaphore to release
> + *
> + * Release the semaphore. Unlike mutexes, up() may be called from any
> + * context and even by tasks which have never called down().
> + *
> + * NOTE: This is a special version of up() for console_unlock only. It is only
> + * safe if there are no killable, interruptible or timing out down() calls.
> + */
> +void printk_safe_up(struct semaphore *sem)
> +{
> + unsigned long flags;
> + struct semaphore_waiter *waiter = NULL;
> +
> + raw_spin_lock_irqsave(&sem->lock, flags);
> + if (likely(list_empty(&sem->wait_list))) {
> + sem->count++;
> + } else {
> + waiter = list_first_entry(&sem->wait_list,
> + struct semaphore_waiter, list);
> + list_del(&waiter->list);
> + waiter->up = true;
> + }
> + raw_spin_unlock_irqrestore(&sem->lock, flags);
> +
> + if (waiter) /* protected by being sole wake source */
> + wake_up_process(waiter->task);

I still do not see how this could help. Let's take the above
backtrace as example:

<IRQ>
console_trylock+0xe/0x60
vprintk_emit+0xf1/0x320
printk+0x4d/0x69
__warn_printk+0x46/0x90
native_smp_send_reschedule +0x2f/0x40
check_preempt_curr+0x81/0xa0
ttwu_do_wakeup+0x14/0x220
try_to_wake_up+0x218/0x5f0
pollwake+0x6f/0x90
credit_entropy_bits+0x204/0x310
add_interrupt_randomness+0x18f/0x210
handle_irq+0x67/0x160
do_IRQ+0x5e/0x130
common_interrupt+0xf/0xf
</IRQ>

We have the following chain of calls:

+ do_IRQ()
...
+ try_to_wake_up() # takes p->pi_lock
+ ttwu_remote() # takes rq lock
+ ttwu_do_wakeup()
+ check_preempt_curr()
+ native_smp_send_reschedule()
+ __warn_printk()
+ printk()
+ vprintk_emit()
+ console_trylock() # success
+ console_unlock()
+ up_console_sem()
+ up() # wait list in not empty
+ __up()
+ wake_up_process()
+ try_to_wake_up()

!BANG! Deadlock on p->pi_lock.

It does not matter if the nested try_to_wake_up() was called
under sem->lock or outside.

By other words. The patch removed one lockdep warning. But it just
just delayed the deadlock. It will not happen on sem->lock but
later on p->pi_lock.

I am repeating myself. But IMHO, the only solution is to introduce
printk deferred context and use it in WARN_DEFERRED().

Best Regards,
Petr