Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

From: Sergey Senozhatsky
Date: Wed Jan 27 2016 - 21:36:44 EST


Hello,

On (01/28/16 10:42), Byungchul Park wrote:
> +cc peter@xxxxxxxxxxxxxxxxxx
> +cc sergey.senozhatsky.work@xxxxxxxxx

thanks for Cc-ing.

> On Wed, Jan 27, 2016 at 09:01:01PM +0900, Byungchul Park wrote:
> > changes form v3 to v4
> > - reuse a existing code as much as possible for preventing an infinite
> > recursive cycle.
> >
> > changes from v2 to v3
> > - avoid printk() only in case of lockup suspected, not real lockup in
> > which case it does not help at all.
> > - consider not only console_sem.lock but also logbuf_lock which is used
> > by printk().
> >
> > changes from v1 to v2
> > - only change comment and commit message esp. replacing "deadlock" with
> > "infinite recursive cycle", since it is not an actual deadlock.

[..]
> > It causes an infinite recursive cycle when using CONFIG_DEBUG_SPINLOCK,
> > in the spin_dump(). Backtrace prints printk() -> console_trylock() ->
> > do_raw_spin_lock() -> spin_dump() -> printk()... infinitely.
> >
> > When the debug spinlock code is called from printk(), we should prevent
> > calling spin_dump() and the like, calling printk() again in that context.


ok, I'll repeat the questions.

under what circumstances you hit this problem? ...memory corruption, cpu
core has been powered off, while it owned the spin_lock... your irqsave
didn't work?

the thing is, it's really-really hard to lockup in console_trylock()...

int down_trylock(struct semaphore *sem)
{
unsigned long flags;
int count;

raw_spin_lock_irqsave(&sem->lock, flags); <<<<<< um...
count = sem->count - 1;
if (likely(count >= 0))
sem->count = count;
raw_spin_unlock_irqrestore(&sem->lock, flags);

return (count < 0);
}

was not able to dereference sem->count? `*sem' was corrupted? or because
sem->lock was corrupted? in any of those cases you solve the problem from
the wrong side. if you have a memory corruption then it can corrupt anything,
including, for example, console driver spin_lock.


suppose, that you hit do_raw_spin_lock()->spin_dump(), which means that the
spin_lock was not corrupted, it passed debug_spin_lock_before() after all.
and that spin_lock was taken for longer than `loops_per_jiffy * HZ', while
other CPU was doing

count = sem->count - 1;
if (likely(count >= 0))
sem->count = count;

???

was the CPU that owned the lock alive? (h/w fault, perhaps?).


dunno... yes, this
printk()->console_trylock()->do_raw_spin_lock()->spin_dump()->printk()
is possible, but it's possible only when your system is screwed up badly, so
badly that this spin_dump() loop is not really a problem, IMHO.

if I'm missing something terribly obvious here, then please give more details.

-ss

> > Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
> > ---
> > include/linux/debug_locks.h | 4 ++++
> > kernel/locking/spinlock_debug.c | 14 +++++++++++---
> > 2 files changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
> > index 822c135..b0f977e 100644
> > --- a/include/linux/debug_locks.h
> > +++ b/include/linux/debug_locks.h
> > @@ -10,6 +10,10 @@ struct task_struct;
> > extern int debug_locks;
> > extern int debug_locks_silent;
> >
> > +static inline void __debug_locks_on(void)
> > +{
> > + debug_locks = 1;
> > +}
> >
> > static inline int __debug_locks_off(void)
> > {
> > diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
> > index 0374a59..65177ba 100644
> > --- a/kernel/locking/spinlock_debug.c
> > +++ b/kernel/locking/spinlock_debug.c
> > @@ -113,11 +113,19 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
> > return;
> > __delay(1);
> > }
> > - /* lockup suspected: */
> > - spin_dump(lock, "lockup suspected");
> > +
> > + /*
> > + * We should prevent calling printk() further, since it would cause
> > + * an infinite recursive cycle if it's called from printk()!
> > + */
> > + if (__debug_locks_off()) {
> > + /* lockup suspected: */
> > + spin_dump(lock, "lockup suspected");
> > #ifdef CONFIG_SMP
> > - trigger_all_cpu_backtrace();
> > + trigger_all_cpu_backtrace();
> > #endif
> > + __debug_locks_on();
> > + }
> >
> > /*
> > * The trylock above was causing a livelock. Give the lower level arch
> > --
> > 1.9.1
>