Re: Nondeterministic hang during bootconsole/console handover on ath79

From: Matthias Schiffer
Date: Mon Mar 21 2016 - 20:52:50 EST


On 03/22/2016 12:08 AM, Greg KH wrote:
> On Tue, Mar 22, 2016 at 12:02:57AM +0100, Matthias Schiffer wrote:
>> Hi,
>> we're experiencing weird nondeterministic hangs during bootconsole/console
>> handover on some ath79 systems on OpenWrt. I've seen this issue myself on
>> kernel 3.18.23~3.18.27 on a AR7241-based system, but according to other
>> reports ([1], [2]) kernel 4.1.x is affected as well, and other SoCs like
>> QCA953x likewise.
>
> Can you try 4.4 or ideally, 4.5? There's been a lot of console/tty
> fixes/changes since the obsolete 3.18 kernel you are using...
>
> thanks,
>
> greg k-h
>

With 4.4, I was not able to reproduce this hang, but I can't tell whether
this is due to an actual bugfix or just to timing changes hiding the bug.
I suspect the latter (as I wrote in my first mail, even minor differences
between kernel images of the same version and the same config make the
hang more or less likely). I have not yet been able to test 4.5, as
OpenWrt carries a lot of kernel patches...

On 3.18, I also tried other things like disabling the early console
altogether, which also made the hang go away, but as even much smaller
changes hid the bug, this doesn't really say much.

The basic code path during the console handover seems to be the same in
3.18 and 4.4, even though a few functions have been moved; the relevant
part of the log looks the same:

> [ 0.756298] Serial: 8250/16550 driver, 16 ports, IRQ sharing enabled
> [ 0.766754] console [ttyS0] disabled
> [ 0.790293] serial8250.0: ttyS0 at MMIO 0x18020000 (irq = 11, base_baud = 12500000) is a 16550A
> [ 0.798909] console [ttyS0] enabled
> [ 0.798909] console [ttyS0] enabled
> [ 0.805854] bootconsole [early0] disabled
> [ 0.805854] bootconsole [early0] disabled

So, with a view to an actual bugfix or backport, this boils down to two
questions, which I hope the serial or MIPS maintainers can answer:

* Is it sane to have two console drivers using the same serial port? In
particular, is it sane for the early console to use the serial port after
serial8250_config_port has reset/configured it, but before the rest of the
setup of uart_configure_port has run? (this would be the case for the
message "serial8250.0: ttyS0 at MMIO...")
* Is it possible to get the serial controller into a state in which
early_printk might wait for THRE forever?
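
To make the second question concrete, here is a minimal userspace sketch of
the kind of poll loop I have in mind. It is not the ath79 early_printk code:
struct fake_uart, fake_read_lsr() and putc_bounded() are hypothetical names
made up for illustration, and the UART is simulated in memory. Only the THRE
bit value (0x20 in the 16550 line status register) is taken from the real
hardware. The point is that an unbounded "spin until THRE" loop never
returns if the bit never comes up, while a bounded one reports failure.

```c
#include <stdbool.h>
#include <stdint.h>

#define UART_LSR_THRE 0x20u      /* 16550 LSR: transmit holding register empty */
#define NEVER_READY   0xffffffffu /* hypothetical marker: THRE never gets set */

/* Hypothetical in-memory stand-in for an MMIO UART, for illustration only. */
struct fake_uart {
	uint8_t  lsr;              /* simulated line status register */
	uint32_t polls_until_thre; /* LSR reads left before THRE comes up */
	uint8_t  last_tx;          /* last byte accepted by the "hardware" */
};

/* Simulated LSR read: THRE appears after a configurable number of polls. */
static uint8_t fake_read_lsr(struct fake_uart *u)
{
	if (u->polls_until_thre == 0)
		u->lsr |= UART_LSR_THRE;
	else if (u->polls_until_thre != NEVER_READY)
		u->polls_until_thre--;
	return u->lsr;
}

/*
 * Bounded variant of the usual "wait for THRE, then write" sequence.
 * Returns true if the character was written, false if THRE did not
 * come up within max_polls reads of the LSR.
 */
static bool putc_bounded(struct fake_uart *u, char c, unsigned int max_polls)
{
	while (max_polls--) {
		if (fake_read_lsr(u) & UART_LSR_THRE) {
			u->last_tx = (uint8_t)c;
			u->lsr &= (uint8_t)~UART_LSR_THRE; /* holding register is full again */
			return true;
		}
	}
	return false; /* an unbounded loop would hang here */
}
```

With polls_until_thre set to a small number, putc_bounded() succeeds; with
NEVER_READY it gives up after max_polls reads, which is where a loop without
a bound would spin forever.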

Thanks,
Matthias
