Re: [PATCH 03/16] tty: serial: 8250_core: read only RX if there is something in the FIFO

From: Peter Hurley
Date: Wed Feb 11 2015 - 15:42:26 EST


On 02/11/2015 03:03 PM, Tony Lindgren wrote:
> * Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> [150211 12:05]:
>> On 02/10/2015 12:46 PM, Peter Hurley wrote:
>>> On 02/10/2015 07:04 AM, Nicolas Schichan wrote:
>>>> On 02/10/2015 12:34 AM, Peter Hurley wrote:
>>>>> Hi Nicolas,
>>>>>
>>>>> Thanks for the report.
>>>>>
>>>> [...]
>>>>>> When a caracter is received on the UART while the kernel is printing
>>>>>> the boot messages, as soon as the kernel configures the UART for
>>>>>> receiving (after root filesystem mount), it gets stuck printing the
>>>>>> following message repeatedly:
>>>>>>
>>>>>> serial8250: too much work for irq29
>>>>>>
>>>>>> Once stuck, the reception of another character allows the boot process
>>>>>> to finish.
>>>>>>
>>>>>> From what I can gather, when we hit that, the UART_IIR_NO_INT is 0 (so the
>>>>>> interrupt is raised), but the UART_LSR_DR bit is 0 as well so the UART_RX
>>>>>> register is never read to clear the interrupt.
>>>>>
>>>>> The "too much work" message means serial8250_handle_irq() is returning 0,
>>>>> ie., not handled. Which in turn means IIR indicates no interrupt is pending
>>>>> (UART_IIR_NO_INT == 1).
>>>>>
>>>>> Can you log the register values for LSR and IIR at both patch locations
>>>>> in serial8250_do_startup()?
>>>>>
>>>>> (I can get you a debug patch, if necessary. Let me know)
>>>>
>>>> Hi Peter,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> Here is what I have when the issue is triggered:
>>>>
>>>> [ 12.154877] lsr 0x60 / iir 0x01
>>>> [ 12.158071] lsr 0x60 / iir 0x01
>>>> [ 12.161438] serial8250: too much work for irq29
>>>> [ 12.165982] lsr 0x60 / iir 0x0c
>>>> [ 12.169354] serial8250: too much work for irq29
>>>> [ 12.173900] lsr 0x60 / iir 0x0c
>>>> (previous two messages are repeated and printk_ratelimited())
>>>
>>> Thanks for this information; I see I was wrong about the cause of message.
>>>
>>> I think what happens during startup is that on this silicon clearing
>>> the rx fifo (by serial8250_clear_fifos()) clears data ready but not
>>> the rx timeout condition which causes a spurious rx interrupt when
>>> interrupts are enabled.
>>>
>>> So caught between two broken UARTs: one that underflows its rx fifo because
>>> of unsolicited rx reads and the other that generates spurious interrupt
>>> without unsolicited rx reads.
>>>
>>>
>>>> When the issue is not triggered:
>>>>
>>>> [ 10.784871] lsr 0x60 / iir 0x01
>>>> [ 10.788066] lsr 0x60 / iir 0x01
>>>> [ 10.794734] VFS: Mounted root (nfs filesystem) readonly on device 0:13.
>>>> [ 10.801654] devtmpfs: mounted
>>>> [ 10.805169] Freeing unused kernel memory: 184K (807be000 - 807ec000)
>>>> (userland takes over after that)
>>>>
>>>> I have also displayed the IIR and LSR registers when the "too much fork for
>>>> IRQ" condition is triggered.
>>>>
>>>> In the serial8250_do_startup(), before the interrupt are unmasked at the end,
>>>> the IIR looks sane and UART_IIR_NO_INT bit is set. When stuck
>>>> serial8250_interrupt(), UART_IIR_NO_INT is cleared and the interrupt ID is set
>>>> to 0xc which is not handled by the kernel at this time (the Kirkwood datasheet
>>>> indicates that it is some kind of timeout condition from what I can gather).
>>>
>>> Yes, IIR == UART_IIR_RX_TIMEOUT is to used indicate that data is in the rx fifo
>>> but has not reached the rx trigger level yet.
>>>
>>> ATM, I'm not exactly sure if there is a safe way to clear the spurious interrupt
>>> from the interrupt handler.
>>>
>>> I'm fairly certain the only way to clear the rx timeout interrupt is to read
>>> the rx fifo, but I think this would race with actual data arrival. IOW, there
>>> might not be a way to determine if the data read is spurious or not.
>>
>> Yep, I see no safe way to clear the spurious interrupt [1] and no idea how to
>> keep it from happening (other than via the unsolicited RX reads in
>> serial8250_do_startup).
>>
>> Unfortunately, I think this means we'll have to revert Sebastian's commit:
>>
>> commit 0aa525d11859c1a4d5b78fdc704148e2ae03ae13
>> Author: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>> Date: Wed Sep 10 21:29:58 2014 +0200
>>
>> tty: serial: 8250_core: read only RX if there is something in the FIFO
>>
>> which just means OMAP3630 will be limited to using the omap_serial driver.
>
> Reverting makes sense to me if it has caused a regression. Maybe Sebastian
> can update his patch to do this based on some quirk flag instead?

That's fine with me. There's a 'bugs' field in struct 8250_uart_port and
UART_BUG_* defines in 8250/8250.h for that purpose.

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/