Re: [PATCH v2] serial: core: only stop transmit when HW fifo is empty

From: Doug Brown
Date: Fri May 17 2024 - 00:22:30 EST


Hello,

On 3/3/2024 7:08 AM, Jonas Gorski wrote:
> If the circular buffer is empty, it just means we fit all characters to
> send into the HW fifo, but not that the hardware finished transmitting
> them.
>
> So if we immediately call stop_tx() after that, this may abort any
> pending characters in the HW fifo, and cause dropped characters on the
> console.
>
> Fix this by only stopping tx when the tx HW fifo is actually empty.
>
> Fixes: 8275b48b2780 ("tty: serial: introduce transmit helpers")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Jonas Gorski <jonas.gorski@xxxxxxxxx>
> ---
> (this is v2 of the bcm63xx-uart fix attempt)
>
> v1 -> v2
> * replace workaround with fix for core issue
> * add Cc: for stable
>
> I'm somewhat confident this is the core issue causing the broken output
> with bcm63xx-uart, and there is no actual need for the UART_TX_NOSTOP.
>
> I wouldn't be surprised if this also fixes mxs-uart for which
> UART_TX_NOSTOP was introduced.
>
> If it does, there is no need for the flag anymore.
>
>  include/linux/serial_core.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> index 55b1f3ba48ac..bb0f2d4ac62f 100644
> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -786,7 +786,8 @@ enum UART_TX_FLAGS {
>  	if (pending < WAKEUP_CHARS) { \
>  		uart_write_wakeup(__port); \
>  		\
> -		if (!((flags) & UART_TX_NOSTOP) && pending == 0) \
> +		if (!((flags) & UART_TX_NOSTOP) && pending == 0 && \
> +		    __port->ops->tx_empty(__port)) \
>  			__port->ops->stop_tx(__port); \
>  	} \
>  	\

I just upgraded to kernel 6.9 and discovered through a git bisect that
this patch (7bfb915a597a301abb892f620fe5c283a9fdbd77) causes a problem
with the legacy pxa.c serial driver (CONFIG_SERIAL_PXA_NON8250). I'm
using it with a PXA168-based ARM device for a serial console as well as
getty. With this patch applied, transmissions get hung up before they
finish. The data isn't lost, because the next time a transmit occurs,
the delayed data finally goes out -- but something seems to be causing
it to get stuck right at the end of many, but not all, transmissions.
For example, if I type "ps" and hit enter, nothing shows up until I hit
enter again, which finally kickstarts the whole TX process and then I
get all of the queued ps output.

I'm really confused about this symptom because it seems at face value
like this patch would only ever improve the situation by preventing
stop_tx() from being called too early. There's something about the pxa
driver that is happier when stop_tx() is called with an empty buffer
even if the UART is reporting that it's not empty yet. I tested some
other random systems in qemu and couldn't reproduce this issue, so the
problem may very well be limited just to this driver/hardware...
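
To make the behavior change concrete, here is a minimal userspace sketch of
the stop condition before and after the patch. It is only a model of the
condition in the quoted hunk, not kernel code: SKETCH_TX_NOSTOP, the function
names, and the hw_tx_empty parameter (standing in for the result of
__port->ops->tx_empty(__port)) are names I made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for UART_TX_NOSTOP; the real value is defined in
 * enum UART_TX_FLAGS in include/linux/serial_core.h. */
#define SKETCH_TX_NOSTOP 0x1u

/* Before the patch: stop_tx() is called as soon as the circular buffer
 * has been drained (pending == 0), regardless of the hardware FIFO. */
static inline bool should_stop_tx_old(unsigned int flags, unsigned int pending)
{
	return !(flags & SKETCH_TX_NOSTOP) && pending == 0;
}

/* After the patch: stop_tx() is additionally gated on the hardware
 * reporting that its transmitter is empty. */
static inline bool should_stop_tx_new(unsigned int flags, unsigned int pending,
				      bool hw_tx_empty)
{
	return !(flags & SKETCH_TX_NOSTOP) && pending == 0 && hw_tx_empty;
}
```

The only input where the two differ is flags == 0, pending == 0 and
hw_tx_empty == false: the old check calls stop_tx() there and the new one
does not. So that is presumably the case the pxa driver is sensitive to.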

I realize this driver is old and deprecated (I'm likely one of the few
users left of it) so I'm hesitant to call it a regression. Maybe it's
really a bug in this driver that the new patch exposes? I even thought,
"heck, I should probably be using the newer 8250_pxa driver instead",
but that one is even worse -- it drops TX characters like crazy,
regardless of whether this patch is applied. I want to look into that
problem eventually.

I'm hoping there is some kind of simple fix that can be made to the pxa
driver to work around it with this new behavior. Can anyone think of a
reason that this driver would not like this change? It seems
counterintuitive to me -- the patch makes perfect sense.

Thanks,
Doug