Re: [Bug 199003] console stalled, cause Hard LOCKUP.
From: Sergey Senozhatsky
Date: Wed Mar 21 2018 - 22:14:50 EST
Hello Steven,
On (03/21/18 09:44), Steven Rostedt wrote:
[..]
> > #4 [ffff881ffb0b7540] delay_tsc at ffffffff81326724
> > #5 [ffff881ffb0b7548] __const_udelay at ffffffff81326678
> > #6 [ffff881ffb0b7558] wait_for_xmitr at ffffffff814056e0
> > #7 [ffff881ffb0b7580] serial8250_console_putchar at ffffffff814058ac
> > #8 [ffff881ffb0b75a0] uart_console_write at ffffffff8140035a
> > #9 [ffff881ffb0b75d0] serial8250_console_write at ffffffff814057fe
> > #10 [ffff881ffb0b7618] call_console_drivers.constprop.17 at ffffffff81087011
> > #11 [ffff881ffb0b7640] console_unlock at ffffffff810889e9
> > #12 [ffff881ffb0b7680] vprintk_emit at ffffffff81088df4
> > #13 [ffff881ffb0b76f0] dev_vprintk_emit at ffffffff81428e72
> > #14 [ffff881ffb0b77a8] dev_printk_emit at ffffffff81428eee
> > #15 [ffff881ffb0b7808] __dev_printk at ffffffff8142937e
> > #16 [ffff881ffb0b7818] dev_printk at ffffffff8142942d
> > #17 [ffff881ffb0b7888] sdev_prefix_printk at ffffffff81463771
> > #18 [ffff881ffb0b7918] scsi_prep_state_check at ffffffff814598e4
> > #19 [ffff881ffb0b7928] scsi_prep_fn at ffffffff8145992d
> > #20 [ffff881ffb0b7960] blk_peek_request at ffffffff812f0826
> > #21 [ffff881ffb0b7988] scsi_request_fn at ffffffff8145b588
> > #22 [ffff881ffb0b79f0] __blk_run_queue at ffffffff812ebd63
> > #23 [ffff881ffb0b7a08] blk_queue_bio at ffffffff812f1013 -----ãacquired q->queue_lock and wait for console_write to finish
>
> What do you mean by "wait for console_write to finish" here?
*My guess*
I suppose a typical "printk->console_unlock() under spin_lock" thing
CPU0 CPU1
spin_lock(&lock)
printk spin_lock(&lock)
console_unlock()
for (;;)
call_console_drivers()
While other CPUs printk->log_store() and some (or may be just one) is locked
up on spin_lock(&lock). So we have several lockup vectors here - one is
console_unlock() under spin_lock and the other one is CPUs spinning on the
very same spin_lock. So handoff should help here, no doubt. But it cannot help
if other CPUs will start to printk_deferred() instead of printk().
Looking at
printk()->call_console_drivers()->serial8250_console_putchar()->wait_for_xmitr()
... wait_for_xmitr() can spin for over 1 second waiting for the UART_MSR_CTS
bit.
static void wait_for_xmitr(struct uart_8250_port *up, int bits)
{
unsigned int status, tmout = 10000;
/* Wait up to 10ms for the character(s) to be sent. */
for (;;) {
status = serial_in(up, UART_LSR);
up->lsr_saved_flags |= status & LSR_SAVE_FLAGS;
if ((status & bits) == bits)
break;
if (--tmout == 0)
break;
udelay(1);
touch_nmi_watchdog();
}
/* Wait up to 1s for flow control if necessary */
if (up->port.flags & UPF_CONS_FLOW) {
for (tmout = 1000000; tmout; tmout--) {
unsigned int msr = serial_in(up, UART_MSR);
up->msr_saved_flags |= msr & MSR_SAVE_FLAGS;
if (msr & UART_MSR_CTS)
break;
udelay(1);
touch_nmi_watchdog();
}
}
...
}
a 1+ second long busy loop in the console driver is quite close to
"problems guaranteed". But, wait, there is even more. This wait_for_xmitr()
busy wait is happening after every character we print on the console. So
printk("foo") will generate 5 * wait_for_xmitr() busy loops [foo + \r + \n].
They punch&touch watchdog a lot, so at the least the system won't get killed
by the hardlockup detector. But at the same time, it's still potentially a
1+ second busy loop in the console driver * strlen(message).
Sometimes I really wish we had detached consoles. Direct printk()->console
is nice and cool, but... we can't have it. I don't want to pressure for
printk_kthread, but look at all those consoles. There are not so many ways
out. What do you think?
-ss