Re: [PATCH v5] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

From: Byungchul Park
Date: Fri Jan 29 2016 - 07:55:25 EST


On Fri, Jan 29, 2016 at 09:43:37PM +0900, Byungchul Park wrote:
> changes from v4 to v5
> - found out a clear scenario which make a system crazy. at least it
> should not be caused by the debug code.

Hello, Andrew

Please take this v5 patch instead of v2 patch, which you took. Or give your
opinion.

Thanks,
Byungchul

>
> changes from v3 to v4
> - reuse a existing code as much as possible for preventing an infinite
> recursive cycle.
>
> changes from v2 to v3
> - avoid printk() only in case of lockup suspected, not real lockup in
> which case it does not help at all.
> - consider not only console_sem.lock but also logbuf_lock which is used
> by printk().
>
> changes from v1 to v2
> - only change comment and commit message esp. replacing "deadlock" with
> "infinite recursive cycle", since it is not an actual deadlock.
>
> thanks,
> byungchul
>
> -----8<-----
> >From eed077240e0b0d9f14d91037ef1915feab85aa4d Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@xxxxxxx>
> Date: Fri, 29 Jan 2016 21:23:24 +0900
> Subject: [PATCH v5] lib/spinlock_debug.c: prevent a recursive cycle in the
> debug code
>
> It causes an infinite recursive cycle when using CONFIG_DEBUG_SPINLOCK,
> in the spin_dump(). Backtrace prints printk() -> console_trylock() ->
> do_raw_spin_lock() -> spin_dump() -> printk()... infinitely.
>
> When the spin_dump() is called from printk(), we should prevent the
> debug spinlock code from calling printk() again in that context. It's
> reasonable to avoid printing "lockup suspected" which is just a warning
> message but it would cause a real lockup definitely.
>
> The scenario is,
>
> cpu0
> ====
> printk
> console_trylock
> console_unlock
> up_console_sem
> up
> raw_spin_lock_irqsave(&sem->lock, flags)
> __up
> wake_up_process
> try_to_wake_up
> raw_spin_lock_irqsave(&p->pi_lock)
> __spin_lock_debug
> spin_dump <=== the problem point!
> printk
> console_trylock
> raw_spin_lock_irqsave(&sem->lock, flags)
>
> <=== DEADLOCK
>
> cpu1
> ====
> printk
> console_trylock
> raw_spin_lock_irqsave(&sem->lock, flags)
> __spin_lock_debug
> spin_dump
> printk
> ...
>
> <=== repeat the recursive cycle infinitely
>
> Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
> ---
> kernel/locking/spinlock_debug.c | 16 +++++++++++++---
> kernel/printk/printk.c | 5 +++++
> 2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
> index 0374a59..cf7bc96 100644
> --- a/kernel/locking/spinlock_debug.c
> +++ b/kernel/locking/spinlock_debug.c
> @@ -103,6 +103,8 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
> lock->owner_cpu = -1;
> }
>
> +extern int is_console_lock(raw_spinlock_t *lock);
> +
> static void __spin_lock_debug(raw_spinlock_t *lock)
> {
> u64 i;
> @@ -113,11 +115,19 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
> return;
> __delay(1);
> }
> - /* lockup suspected: */
> - spin_dump(lock, "lockup suspected");
> +
> + /*
> + * If this function is called from printk(), then we should
> + * not call printk() more. Or it will cause an infinite
> + * recursive cycle!
> + */
> + if (likely(!is_console_lock(lock))) {
> + /* lockup suspected: */
> + spin_dump(lock, "lockup suspected");
> #ifdef CONFIG_SMP
> - trigger_all_cpu_backtrace();
> + trigger_all_cpu_backtrace();
> #endif
> + }
>
> /*
> * The trylock above was causing a livelock. Give the lower level arch
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2ce8826..568ab11 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1981,6 +1981,11 @@ asmlinkage __visible void early_printk(const char *fmt, ...)
> }
> #endif
>
> +int is_console_lock(raw_spinlock_t *lock)
> +{
> + return lock == &console_sem.lock;
> +}
> +
> static int __add_preferred_console(char *name, int idx, char *options,
> char *brl_options)
> {
> --
> 1.9.1