Re: [Query] Preemption (hogging) of the work handler

From: Jan Kara
Date: Thu Jul 14 2016 - 10:12:27 EST


On Wed 13-07-16 14:45:07, Sergey Senozhatsky wrote:
> Cc Petr Mladek.
>
> On (07/12/16 16:19), Viresh Kumar wrote:
> [..]
> > Okay, we have tracked this BUG and its really interesting.
>
> good find!
>
> > I hacked the platform's serial driver to implement a putchar() routine
> > that simply writes to the FIFO in polling mode, that helped us in
> > tracing on where we are going wrong.
> >
> > The problem is that we are running asynchronous printks and we call
> > wake_up_process() from the last running CPU which has disabled
> > interrupts. That takes us to: try_to_wake_up().
> >
> > In our case the CPU gets deadlocked on this line in try_to_wake_up().
> >
> > raw_spin_lock_irqsave(&p->pi_lock, flags);
>
> yeah, printk() can't handle these types of recursion. it can prevent
> printk() calls issued from within the logbuf_lock spinlock section,
> with some limitations:
>
> if (unlikely(logbuf_cpu == smp_processor_id())) {
> recursion_bug = true;
> return;
> }
>
> raw_spin_lock(&logbuf_lock);
> logbuf_cpu = this_cpu;
> ...
> logbuf_cpu = UINT_MAX;
> raw_spin_unlock(&logbuf_lock);
>
> so should, for instance, raw_spin_unlock() generate spin_dump(), printk()
> will blow up (both sync and async), because logbuf_cpu is already reset.
> it may look that async printk added another source of recursion - wake_up().
> but, apparently, this is not exactly correct. because there is already a
> wake_up() call in console_unlock() - up().
>
> printk()
> if (logbuf_cpu == smp_processor_id())
> return;
>
> raw_spin_lock(&logbuf_lock);
> logbuf_cpu = this_cpu;
> ...
> logbuf_cpu = UINT_MAX;
> raw_spin_unlock(&logbuf_lock);
>
> console_trylock()
> raw_spin_lock_irqsave(&sem->lock) << ***
> raw_spin_unlock_irqsave(&sem->lock) << ***
>
> console_unlock()
> up()
> raw_spin_lock_irqsave(&sem->lock) << ***
> __up()
> wake_up_process()
> try_to_wake_up() << *** in may places
>
>
> *** a printk() call from here will kill the system. either it will
> recurse printk(), or spin forever in 'nested' printk() on one of
> the already taken spin locks.

Exactly. Calling printk() from certain parts of the kernel (like scheduler
code or timer code) has been always unsafe because printk itself uses these
parts and so it can lead to deadlocks. That's why printk_deffered() has
been introduced as you mention below.

And with sync printk the above deadlock doesn't trigger only by chance - if
there happened to be a waiter on console_sem while we suspend, the same
deadlock would trigger because up(&console_sem) will try to wake him up and
the warning in timekeeping code will cause recursive printk.

So I think your patch doesn't really address the real issue - it only
works around the particular WARN_ON(timekeeping_enabled) warning but if
there was a different warning in timekeeping code which would trigger, it
has a potential for causing recursive printk deadlock (and indeed we had
such issues previously - see e.g. 504d58745c9c "timer: Fix lock inversion
between hrtimer_bases.lock and scheduler locks").

So there are IMHO two issues here worth looking at:

1) I didn't find how a wakeup would would lead to calling to ktime_get() in
the current upstream kernel or even current RT kernel. Maybe this is a
problem specific to the 3.10 kernel you are using? If yes, we don't have to
do anything for current upstream AFAIU.

If I just missed how wakeup can call into ktime_get() in current upstream,
there is another question:

2) Is it OK that printk calls wakeup so late during suspend? I believe it
is but I'm neither scheduler nor suspend expert. If it is OK, and wakeup
can lead to ktime_get() in current upstream, then this contradicts the
check WARN_ON(timekeeping_suspended) in ktime_get() and something is wrong.

Adding Thomas to CC as timer / RT expert...

Honza

> so... I think we can switch to sync printk mode in suspend_console() and
> enable async printk from resume_console(). IOW, suspend/kexec are now
> executed under sync printk mode.
>
> we already call console_unlock() during suspend, which is synchronous,
> many times (e.g. console_cpu_notify()).
>
>
> something like below, perhaps. will this work for you?
>
> ---
> kernel/printk/printk.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index bbb4180..786690e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -288,6 +288,11 @@ static u32 log_buf_len = __LOG_BUF_LEN;
>
> /* Control whether printing to console must be synchronous. */
> static bool __read_mostly printk_sync = true;
> +/*
> + * Force sync printk mode during suspend/kexec, regardless whether
> + * console_suspend_enabled permits console suspend.
> + */
> +static bool __read_mostly force_printk_sync;
> /* Printing kthread for async printk */
> static struct task_struct *printk_kthread;
> /* When `true' printing thread has messages to print */
> @@ -295,7 +300,7 @@ static bool printk_kthread_need_flush_console;
>
> static inline bool can_printk_async(void)
> {
> - return !printk_sync && printk_kthread;
> + return !printk_sync && printk_kthread && !force_printk_sync;
> }
>
> /* Return log buffer address */
> @@ -2027,6 +2032,7 @@ static bool suppress_message_printing(int level) { return false; }
>
> /* Still needs to be defined for users */
> DEFINE_PER_CPU(printk_func_t, printk_func);
> +static bool __read_mostly force_printk_sync;
>
> #endif /* CONFIG_PRINTK */
>
> @@ -2163,6 +2169,8 @@ MODULE_PARM_DESC(console_suspend, "suspend console during suspend"
> */
> void suspend_console(void)
> {
> + force_printk_sync = true;
> +
> if (!console_suspend_enabled)
> return;
> printk("Suspending console(s) (use no_console_suspend to debug)\n");
> @@ -2173,6 +2181,8 @@ void suspend_console(void)
>
> void resume_console(void)
> {
> + force_printk_sync = false;
> +
> if (!console_suspend_enabled)
> return;
> down_console_sem();
> --
> 2.9.0.rc1
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR