Re: [PATCH] serial: qcom_geni_serial: Fix recent kdb hang

From: Akash Asthana
Date: Mon Aug 10 2020 - 08:32:33 EST


Hi Doug,

On 8/7/2020 10:49 AM, Douglas Anderson wrote:
The commit e42d6c3ec0c7 ("serial: qcom_geni_serial: Make kgdb work
even if UART isn't console") worked pretty well and I've been doing a
lot of debugging with it. However, recently I typed "dmesg" in kdb
and then held the space key down to scroll through the pagination. My
device hung. This was repeatable and I found that it was introduced
with the aforementioned commit.

It turns out that there are some strange boundary cases in geni where
in some weird situations it will signal RX_LAST but then will put 0 in
RX_LAST_BYTE. This means that the entire last FIFO entry is valid.

IMO that means we received a word in RX_FIFO and it is the last word hence RX_LAST bit is set.

RX_LAST_BYTE is 0 means none of the bytes are valid in the last word.

In such scenario we should just read RX_FIFO buffer (to empty it), discard the word and return NO_POLL_CHAR. Something like below.

---------------------------------------------------------------------------------------------------------------------------------------------------------

                else
                        private_data->poll_cached_bytes_cnt = 4;

                private_data->poll_cached_bytes =
                        readl(uport->membase + SE_GENI_RX_FIFOn);
        }

+        if (!private_data->poll_cached_bytes_cnt)
+              return NO_POLL_CHAR;
        private_data->poll_cached_bytes_cnt--;
        ret = private_data->poll_cached_bytes & 0xff;
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Please let me know whether above code helps.

I am not sure about what all scenario can leads to this behavior from hardware, I will try to get an answer from hardware team.

Any error bit was set for SE_GENI_S_IRQ_STATUS & SE_GENI_M_IRQ_STATUS registers?


I guess the hang was seen because *poll_cached_bytes_cnt* is unsigned int and it's value was 0, when it's decremented by 1 it's value become '4294967295' (very large) and dummy RX (0x00) would happen that

many times before reading any actual RX transfers/bytes.

Regards,

Akash


This weird corner case is handled in qcom_geni_serial_handle_rx()
where you can see that we only honor RX_LAST_BYTE if RX_LAST is set
_and_ RX_LAST_BYTE is non-zero. If either of these is not true we use
BYTES_PER_FIFO_WORD (4) for the size of the last FIFO word.

Let's fix kgdb. While at it, also use the proper #define for 4.

Fixes: e42d6c3ec0c7 ("serial: qcom_geni_serial: Make kgdb work even if UART isn't console")
Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
---

drivers/tty/serial/qcom_geni_serial.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 07b7b6b05b8b..e27077656939 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -361,11 +361,16 @@ static int qcom_geni_serial_get_char(struct uart_port *uport)
return NO_POLL_CHAR;
if (word_cnt == 1 && (status & RX_LAST))
+ /*
+ * NOTE: If RX_LAST_BYTE_VALID is 0 it needs to be
+ * treated as if it was BYTES_PER_FIFO_WORD.
+ */
private_data->poll_cached_bytes_cnt =
(status & RX_LAST_BYTE_VALID_MSK) >>
RX_LAST_BYTE_VALID_SHFT;
- else
- private_data->poll_cached_bytes_cnt = 4;
+
+ if (private_data->poll_cached_bytes_cnt == 0)
+ private_data->poll_cached_bytes_cnt = BYTES_PER_FIFO_WORD;
private_data->poll_cached_bytes =
readl(uport->membase + SE_GENI_RX_FIFOn);

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,\na Linux Foundation Collaborative Project