Re: Nondeterministic hang during bootconsole/console handover on ath79
From: Matthias Schiffer
Date: Tue Mar 22 2016 - 09:08:09 EST
>> My theory is the following:
>>
>> As soon as ttyS0 is detected and installed as the console, there are two
>> console drivers active on the serial port at the same time: early0 and
>> ttyS0. I suspect that the hang occurs when the primitive early0
>> implementation prom_putchar_ar71xx waits indefinitely on THRE, but the real
>> driver has just reset the serial controller in a way that makes THRE never
>> come.
>
> Doubtful.
>
> console writes are performed with ints disabled, as is the 8250 driver's
> autoconfig probing. Since this is a UP platform, as long as you're not
> using the DEBUG_AUTOCONF switch in the 8250 driver, I don't think there's
> a way for the boot console to be outputting while the 8250 driver is
> configuring.
I see.
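
For reference, the failure mode I had in mind can be sketched in userspace as follows. This is not the actual prom_putchar_ar71xx code; wait_thre_bounded, the lsr_read_fn accessor and struct fake_uart are hypothetical names made up for illustration. The only hardware fact assumed is that THRE is bit 5 (0x20) of the 16550 line status register. An unbounded poll on that bit never returns if the bit never comes up, while a bounded poll gives up after a fixed number of reads:

```c
#include <stdbool.h>
#include <stdint.h>

/* 16550 Line Status Register bit: transmit holding register empty. */
#define LSR_THRE 0x20u

/* Hypothetical register accessor: returns the current LSR value. */
typedef uint32_t (*lsr_read_fn)(void *ctx);

/*
 * Poll LSR until THRE is set, giving up after max_polls reads.
 * Returns true if THRE was observed, false on timeout.
 */
static bool wait_thre_bounded(lsr_read_fn read_lsr, void *ctx,
			      unsigned long max_polls)
{
	while (max_polls--) {
		if (read_lsr(ctx) & LSR_THRE)
			return true;
	}
	return false;
}

/*
 * Fake UART for illustration only: THRE shows up after 'ready_after'
 * reads, or never if 'stuck' is set.
 */
struct fake_uart {
	unsigned long reads;
	unsigned long ready_after;
	bool stuck;
};

static uint32_t fake_read_lsr(void *ctx)
{
	struct fake_uart *u = ctx;

	u->reads++;
	if (u->stuck)
		return 0;
	return u->reads > u->ready_after ? LSR_THRE : 0;
}
```

With a fake UART that is stuck, wait_thre_bounded() returns false once max_polls reads are used up; a loop without the bound would spin forever in the same situation.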
>
>> When the boot is successful, I also sometimes see just garbage
>> instead of the message "serial8250.0: ttyS0 at MMIO 0x18020000...", which
>> supports my idea that the kernel is trying to use the serial console while
>> it is not correctly setup.
>>
>
> I wonder if autoconfig probing (that's what discovers the uart port type)
> is broken.
>
> You could test this hypothesis by setting the port type directly and
> set UPF_FIXED_TYPE; ie., in arch/mips/ath79/dev-common.c
>
> diff --git a/arch/mips/ath79/dev-common.c b/arch/mips/ath79/dev-common.c
> index 516225d..3814a42 100644
> --- a/arch/mips/ath79/dev-common.c
> +++ b/arch/mips/ath79/dev-common.c
> @@ -36,7 +36,8 @@ static struct plat_serial8250_port ath79_uart_data[] = {
> {
> .mapbase = AR71XX_UART_BASE,
> .irq = ATH79_MISC_IRQ(3),
> - .flags = AR71XX_UART_FLAGS,
> + .flags = AR71XX_UART_FLAGS | UPF_FIXED_TYPE,
> + .type = PORT_16550A,
> .iotype = UPIO_MEM32,
> .regshift = 2,
> }, {
>
>
> Regards,
> Peter Hurley
I've tried your patch and I can't reproduce the issue anymore with it. I
can't tell whether it actually addresses the cause, or whether the changed
code path just hid the bug again.
Regarding your other mail: with "small change", I was not talking about
adding an additional printk; as mentioned, even changing the numbers in
UTS_VERSION can hide the issue. I diffed a working and a broken kernel
image, and the UTS_VERSION is really the only difference. I have no idea
how to explain this. (OpenWrt uses an LZMA-compressed kernel image, so
after compression, the differences are much greater; but how these
differences would affect the kernel after decompression eludes me.)
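
The kind of comparison described above can be sketched like this. It is a minimal illustration, not the tool actually used; report_diff_runs is a hypothetical helper, and it assumes both uncompressed images have already been read into buffers of equal length. It prints the offset and length of every contiguous run of differing bytes and returns the number of runs:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Walk two buffers of equal length and report each contiguous run of
 * differing bytes as "offset length" on 'out' (pass NULL to suppress
 * printing). Returns the number of runs found.
 */
static size_t report_diff_runs(const unsigned char *a, const unsigned char *b,
			       size_t len, FILE *out)
{
	size_t runs = 0;
	size_t i = 0;

	while (i < len) {
		size_t start;

		if (a[i] == b[i]) {
			i++;
			continue;
		}

		/* Found the first differing byte; extend to the end of the run. */
		start = i;
		while (i < len && a[i] != b[i])
			i++;

		if (out)
			fprintf(out, "0x%08zx %zu\n", start, i - start);
		runs++;
	}

	return runs;
}
```

If the only difference between two images really is a version string, the function should report a small number of short runs that all fall inside that string.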
I'll continue searching for a board with accessible JTAG which exhibits
this issue. Given the heisenbuggy nature of the issue, getting to the root
cause is probably impossible without JTAG unless someone has an obvious
explanation...
Thanks,
Matthias