Re: [PATCH v2] printk: Avoid softlockups in console_unlock()

From: Andrew Morton
Date: Tue Feb 05 2013 - 15:38:44 EST


On Mon, 4 Feb 2013 23:17:10 +0100
Jan Kara <jack@xxxxxxx> wrote:

> A CPU can be caught in console_unlock() for a long time (tens of seconds are
> reported by our customers) when other CPUs are using printk heavily and serial
> console makes printing slow. Although serial console drivers call
> touch_nmi_watchdog(), this triggers softlockup warnings because
> interrupts are disabled for the whole time console_unlock() runs (e.g.
> vprintk() calls console_unlock() with interrupts disabled). Thus IPIs
> cannot be processed and other CPUs get stuck spinning in calls like
> smp_call_function_many(). Also RCU eventually starts reporting lockups.
>
> In my artificial testing I also managed to trigger a situation where a disk
> disappeared from the system, apparently because commands to / from it
> could not be delivered for long enough. This is why just silencing
> watchdogs isn't a reliable solution to the problem and we simply have to
> avoid spending too long in console_unlock().
>
> We fix the issue by limiting the time we spend in console_unlock() to
> watchdog_thresh() / 4 (unless we are in an early boot stage or oops is
> happening). The rest of the buffer will be printed either by further
> callers to printk() or by a queued work.

I still hate the patch :(

> ...
>
> +void console_unlock(void)
> +{
> + if (__console_unlock()) {
> + /* Let worker do the rest of printing */
> + schedule_work(&printk_work);
> + }
> }

This creates another place from where we cannot call printk(): anywhere
where worker_pool.lock is held.

And as schedule_work() can do a wakeup, it creates a third reason why
the sched code cannot call printk (along with rq->lock taken by
wake_up(klogd) and rq->lock taken by up(&console_sem)). Hence
printk_sched(). See the lkml thread "[GIT PULL] printk: Support for
full dynticks mode".

We already have machinery for doing async tickling in printk: the
printk_pending stuff. Did you consider adding another
PRINTK_PENDING_foo in some fashion?
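
To make the suggestion concrete, here is a minimal userspace sketch of
the pending-flag pattern, assuming C11 atomics. It is not kernel code
and not taken from the patch: the names PENDING_WAKEUP, PENDING_FLUSH,
request_deferred() and run_deferred() are hypothetical stand-ins. The
idea is only that the caller sets a bit without taking any lock, and a
later, safe context picks the bits up and does the real work.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; the names are illustrative only. */
#define PENDING_WAKEUP 0x1u   /* "wake the log reader later" */
#define PENDING_FLUSH  0x2u   /* stands in for a new PRINTK_PENDING_foo */

static atomic_uint pending;   /* bitmask of deferred requests */
static int wakeups_done;      /* counters so the effect is observable */
static int flushes_done;

/* Producer side: one atomic OR, no locks taken and no wakeup issued. */
static void request_deferred(unsigned int bits)
{
	atomic_fetch_or(&pending, bits);
}

/*
 * Consumer side: meant to run later from a context where the real
 * work is allowed. Fetches and clears all bits in one atomic step,
 * so repeated requests before a run collapse into a single action.
 */
static void run_deferred(void)
{
	unsigned int bits = atomic_exchange(&pending, 0);

	if (bits & PENDING_WAKEUP)
		wakeups_done++;
	if (bits & PENDING_FLUSH)
		flushes_done++;   /* real code would print the rest of the buffer */
}
```

In this model, calling request_deferred(PENDING_FLUSH) several times
before run_deferred() results in a single flush, and a run with no bits
set does nothing.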

