Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

From: Peter Hurley
Date: Thu Jan 28 2016 - 18:08:29 EST


On 01/28/2016 07:42 AM, Sergey Senozhatsky wrote:
> On (01/28/16 19:53), Sergey Senozhatsky wrote:
>>> ah... silly me... you mean the first CPU that triggers the spin_dump() will
>> ^^^ this, of course, is true for
>> console_sem->lock and logbuf_lock
>> only.
>>
>>> deadlock itself, so the rest of CPUs will see endless recursive
>>> spin_lock()->spin_dump()->spin_lock()->spin_dump() calls?
> [..]
>>> Can you please update your bug description in the commit message?
>>> It's the deadlock that is causing the recursion on other CPUs in the
>>> first place.
>
> no, don't update anything. I was completely wrong. it's not a deadlock
> that is the root cause here.
>
> even if at some level of recursion (nested printk calls)
> spin_dump()->__spin_lock_debug()->arch_spin_trylock() acquires the
> lock, it returns back with the spin lock unlocked anyway.
>
> vprintk_emit()
> console_trylock()
> spin_lock()
> spin_dump()
> vprintk_emit()
> console_trylock()
> spin_lock()
> spin_dump()
> vprintk_emit()
> console_trylock()
> spin_lock() << OK, got the lock finally

The problem is you have postulated a very shallow recursion.
This looks much worse if this happens 1000 times, and
probably won't recover to output anything.

Additionally, what if the console_sem is simply corrupted?
A livelock with no output ever is not very helpful.

As I wrote earlier, I don't think this is the way to fix
recursion problems with printk() [by eliding output].

Rather, a way to effectively determine a recursion is in progress,
and _at a minimum_ guaranteeing that the recursive output will
eventually be output should be the goal.

Including dumb recursion like a console driver printing
an error :/

Then, lockdep could remain enabled while calling console drivers.

Regards,
Peter Hurley

> sem->count--
> spin_unlock() << unlock, return
> arch_spin_lock() << got the lock, return
> sem->count--
> spin_unlock() << unlock, return
> arch_spin_lock() << got the lock, return
> sem->count--
> spin_unlock() << unlock, return
>
>
> ...um
>
>
>> But I found there's a possiblity in the debug code *itself* to cause a
>> lockup.
>
> please explain.
>
> -ss
>