Re: Nondeterministic hang during bootconsole/console handover on ath79

From: Matthias Schiffer
Date: Wed Mar 23 2016 - 13:41:02 EST

On 03/22/2016 04:38 PM, Peter Hurley wrote:
> On 03/22/2016 06:07 AM, Matthias Schiffer wrote:
>> I've tried your patch and I can't reproduce the issue anymore with it; I
>> have no idea if this actually has to do something with the issue, or the
>> change of the code path just hid the bug again.
>> Regarding your other mail: with "small change", I was not talking about
>> adding an additional printk; as mentioned, even changing the numbers in
>> UTS_VERSION can hide the issue. I diffed a working and a broken kernel
>> image, and the UTS_VERSION is really the only difference. I have no idea
>> how to explain this.
> If _any_ change may hide the problem, that will make it impossible
> to determine if any attempted fix actually works, regardless of what
> debugging method you use.
> FWIW, you could still use the boot console to debug the problem by
> disabling the regular command-line console.
> Regards,
> Peter Hurley

It seems Peter was on the right track. With some help from Ralf, I was able
to narrow down the issue a bit, and I'm fairly sure the hang happens
somewhere in autoconfig().

autoconfig_16550a() is doing all kinds of weird checks to detect different
hardware by writing a lot of register values which are documented as
reserved in the AR7242 datasheet (there's a leaked version going around
that can be easily googled...), no idea if any of those are problematic.
Just setting UPF_FIXED_TYPE as suggested by Peter would avoid that code.

That being said, I found another minimal change that seems to fix the
issue: prom_putchar_ar71xx() in arch/mips/ath79/early_printk.c only waits
for UART_LSR_THRE, while serial_putc() in
drivers/tty/serial/8250/8250_early.c waits for (UART_LSR_TEMT |
UART_LSR_THRE). Adjusting arch/mips/ath79/early_printk.c in the same way
makes the hangs go away. Maybe the AR7242 doesn't like its serial config
registers being poked while there's still something in the FIFO? Waiting
for UART_LSR_TEMT seems like a good idea anyways to ensure that all
characters have been printed before autoconfig() starts taking things
apart. (Why do these two versions of essentially the same code exist anyways?)
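
To illustrate the difference between the two wait conditions, here is a
minimal user-space sketch. It is not the kernel code: struct fake_uart,
fake_read_lsr(), wait_lsr() and fake_putchar() are hypothetical stand-ins
that replay a scripted sequence of LSR values instead of touching hardware.
Only the UART_LSR_THRE and UART_LSR_TEMT bit values are taken from
include/uapi/linux/serial_reg.h.

```c
#include <stdint.h>

/* LSR bits, values as in include/uapi/linux/serial_reg.h */
#define UART_LSR_THRE 0x20 /* transmit holding register empty */
#define UART_LSR_TEMT 0x40 /* transmitter (shift register) empty */
#define BOTH_EMPTY    (UART_LSR_TEMT | UART_LSR_THRE)

/* Hypothetical fake UART: replays a scripted sequence of LSR reads. */
struct fake_uart {
	const uint8_t *lsr_seq; /* values returned by successive LSR reads */
	unsigned int len;       /* number of entries in lsr_seq */
	unsigned int pos;       /* index of the next read */
	int last_tx;            /* last character "written", -1 if none */
};

/* Return the next scripted LSR value; the last entry repeats forever. */
static uint8_t fake_read_lsr(struct fake_uart *u)
{
	uint8_t v = u->lsr_seq[u->pos];

	if (u->pos + 1 < u->len)
		u->pos++;
	return v;
}

/* Poll until every bit in mask is set; return the number of LSR reads. */
static unsigned int wait_lsr(struct fake_uart *u, uint8_t mask)
{
	unsigned int polls = 0;

	do {
		polls++;
	} while ((fake_read_lsr(u) & mask) != mask);

	return polls;
}

/* Wait for the given LSR condition, then "transmit" one character. */
static unsigned int fake_putchar(struct fake_uart *u, uint8_t mask, int ch)
{
	unsigned int polls = wait_lsr(u, mask);

	u->last_tx = ch;
	return polls;
}
```

With a scripted sequence of 0x00, THRE, THRE, THRE|TEMT, waiting for
UART_LSR_THRE alone returns on the second read, while the holding register
is empty but the shift register is still draining. Waiting for BOTH_EMPTY
returns only on the fourth read, once the transmitter is completely idle.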

