Re: [PATCH v2 3/3] serial: qcom-geni: do not kill the machine on fifo underrun
From: Johan Hovold
Date: Tue Jul 09 2024 - 05:44:21 EST
On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@xxxxxxxxxx> wrote:
> >
> > The Qualcomm GENI serial driver did not handle buffer flushing and used
> > to print discarded characters when the circular buffer was cleared.
> > Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> > this instead resulted in a hard lockup due to
> > qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
> > interrupt handler.
> >
> > The underlying bugs have now been fixed, but make sure to output NUL
> > characters instead of killing the machine if a similar driver bug is
> > ever reintroduced.
> >
> > Signed-off-by: Johan Hovold <johan+linaro@xxxxxxxxxx>
> > ---
> > drivers/tty/serial/qcom_geni_serial.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> > index b2bbd2d79dbb..69a632fefc41 100644
> > --- a/drivers/tty/serial/qcom_geni_serial.c
> > +++ b/drivers/tty/serial/qcom_geni_serial.c
> > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > memset(buf, 0, sizeof(buf));
> > tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> >
> > - tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > + uart_fifo_out(uport, buf, tx_bytes);
>
> FWIW I would have rather we output something much more obviously wrong
> in this case instead of a NUL byte. Maybe we should fill it with "@"
> characters or something? As you said: the driver shouldn't get into
> this error condition so it shouldn't matter, but if we have a bug in
> the future I'd rather it be an obvious bug instead of a subtle bug.

Yeah, I've been running with a patch like that locally in my tests, and
went back and forth a bit on whether I should post it. My reasoning for
not doing so was that the bugs have been fixed so we don't need to spend
cycles on memsetting the buffer to anything but NUL (I used 'X' in my
testing).

I guess that can be avoided by only padding the buffer if we ever hit an
underrun, but I still think it's questionable to spend the effort as
this is not something that should be needed. In any case, I didn't want
to spend time on it while fixing the 6.10 regressions.
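
Roughly, padding only on underrun could look like the userspace sketch
below. This is not driver code: struct fake_fifo, fake_fifo_out(),
fill_word() and UNDERRUN_FILL are hypothetical names made up for the
illustration, and '@' is just the fill character suggested above. The
buffer is still zeroed up front, and the extra memset only runs if the
fifo returns fewer bytes than requested.

```c
#include <string.h>

#define BYTES_PER_FIFO_WORD 4U
#define UNDERRUN_FILL '@'	/* hypothetical fill character */

/* Hypothetical stand-in for the software tx fifo; not the kernel kfifo. */
struct fake_fifo {
	const unsigned char *data;
	unsigned int len;
	unsigned int pos;
};

/* Copy up to n bytes out of the fifo and return how many were copied. */
static unsigned int fake_fifo_out(struct fake_fifo *f, unsigned char *buf,
				  unsigned int n)
{
	unsigned int avail = f->len - f->pos;

	if (n > avail)
		n = avail;
	memcpy(buf, f->data + f->pos, n);
	f->pos += n;

	return n;
}

/*
 * Fill one fifo word (buf must hold BYTES_PER_FIFO_WORD bytes). The fill
 * character is written only if the fifo returned fewer bytes than were
 * requested. Returns the number of bytes requested, so that the caller
 * always makes progress.
 */
static unsigned int fill_word(struct fake_fifo *f, unsigned char *buf,
			      unsigned int remaining)
{
	unsigned int want, got;

	memset(buf, 0, BYTES_PER_FIFO_WORD);
	want = remaining < BYTES_PER_FIFO_WORD ? remaining : BYTES_PER_FIFO_WORD;

	got = fake_fifo_out(f, buf, want);
	if (got < want)
		memset(buf + got, UNDERRUN_FILL, want - got);

	return want;
}
```

With a fifo holding only "ab" and a request for four bytes, the word
comes out as "ab@@" and fill_word() still reports four bytes consumed.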

Killing the machine is perhaps an effective way to draw attention to an
issue, but I'd much rather have an occasional NUL character in the log
*if* this ever becomes an issue at all again.

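
For reference, here is a toy model of the difference between the two
behaviours. It assumes that the send loop subtracts the per-word byte
count from the remaining count on every iteration, which is not visible
in the quoted hunk. model_fifo_out() and send_chunk_iters() are
hypothetical names, and the iteration limit exists only so that the
model itself cannot hang.

```c
#define BYTES_PER_FIFO_WORD 4U

/* Hypothetical model of the tx fifo: only the byte count matters here. */
static unsigned int model_fifo_out(unsigned int *avail, unsigned int n)
{
	if (n > *avail)
		n = *avail;
	*avail -= n;

	return n;
}

/*
 * Count the loop iterations needed to send 'chunk' bytes, giving up
 * after 'limit' iterations.
 *
 * With trust_fifo set, progress depends on what the fifo returned (the
 * assumed behaviour before the patch); otherwise it depends on what was
 * requested, so an empty fifo results in padding rather than a stall.
 */
static unsigned int send_chunk_iters(unsigned int avail, unsigned int chunk,
				     int trust_fifo, unsigned int limit)
{
	unsigned int remaining = chunk;
	unsigned int iters = 0;

	while (remaining && iters < limit) {
		unsigned int want = remaining < BYTES_PER_FIFO_WORD ?
				    remaining : BYTES_PER_FIFO_WORD;
		unsigned int got = model_fifo_out(&avail, want);

		remaining -= trust_fifo ? got : want;
		iters++;
	}

	return iters;
}
```

In this model an empty fifo and a six-byte chunk run into the iteration
limit when the returned count is trusted, but finish in two iterations
when the requested count is used instead.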
> I'm happy to post a patch or provide a Reviewed-by if you want to post
> a patch. Let me know.

If you feel strongly about this, I can either fill the buffer with
something other than NUL or add error handling for any such future
hypothetical bugs. What do you prefer?

Johan