Re: [PATCHv2] musb_host: fix lockup on rxcsr_h_error

From: Maxim Uvarov
Date: Thu Apr 28 2016 - 02:51:43 EST


2016-04-28 0:28 GMT+03:00 Bin Liu <b-liu@xxxxxx>:
> Hi,
>
> On Wed, Apr 27, 2016 at 02:13:56PM -0500, Bin Liu wrote:
>> Hi,
>>
>> On Wed, Apr 27, 2016 at 09:26:10PM +0300, Maxim Uvarov wrote:
>> > 2016-04-27 18:46 GMT+03:00 Bin Liu <b-liu@xxxxxx>:
>> > > Hi,
>> > >
>> > > On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
>> > >> Fix soft lockup when resetting remote device attached
>> > >> to usb host. Configuration:
>> > >> pppd -> musb hub -> usb-serial -> gsm modem
>> > >
>> > > I have heard a few reports similar to this symptom, but never been able
>> > > to reproduce it on my side.
>> > >
>> >
>> > Ok, I can reproduce it almost very easy.
>> >
>> > >> When gsm modem resets, musb rolls in incoming rx interrupts
>> > >> which does not give any time to other application as result
>> > >> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
>> > >
>> > > Have you looked where exact place in the interrupt routine the execution
>> > > has stuck in?
>> > >
>> >
>> > It does not stuck. It goes to that line which print proto error over
>> > and over again and
>> > nothing stops that. After some time kernel reports lockup. But
>> > actually it's not stuck,
>> > all cpu time was eaten by executing that handlers.
>> >
>> >
>> > >> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
>> > >> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
>> > >
>> > > MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures
>> > > MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in
>> > > musb_regs.h.
>> > >
>> > >>
>> > >> Signed-off-by: Max Uvarov <muvarov@xxxxxxxxx>
>> > >> ---
>> > >> v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
>> > >>
>> > >> drivers/usb/musb/musb_host.c | 12 +++++-------
>> > >> 1 file changed, 5 insertions(+), 7 deletions(-)
>> > >>
>> > >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>> > >> index c3d5fc9..2d9aa78 100644
>> > >> --- a/drivers/usb/musb/musb_host.c
>> > >> +++ b/drivers/usb/musb/musb_host.c
>> > >> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>> > >
>> > > What kernel do you use? This line # is away off from upstream kernel.
>> > >
>> >
>> > I did this patch for 4.1 but 4.6 has the same problem and patch
>> > cleanly applies to the latest torvalds/linux.git v4.6-rc5. This
>> > interrupt handler has the same code. And looks like on 3.14
>>
>> Yeah, this code hasn't been chaned for year. But in general, it is
>> prepfered to create patches on latest kernel to avoid other headache.
>>
>> > everything worked. I don't have a time to diff 2 versions. Might be
>> > regression.
>> >
>> >
>> > >>
>> > >> /* stall; record URB status */
>> > >> status = -EPIPE;
>> > >> + } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
>> > >>
>> > >> - } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> > >> - dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
>> > >> -
>> > >> - status = -EPROTO;
>> > >> - musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> > >> -
>> > >> - } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>> > >> + if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> > >> + status = -EPROTO;
>> > >> + musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> > >> + }
>> > >
>> > > Please help me to understand how this change fixes the issue. I see the
>> > > most effect of the change here is directly 'goto finish' so that 'done'
>> > > flag is not set, then musb_advance_schedule() is not called. Is this the
>> > > case or I missed other important pieces?
>> > >
>> >
>> > Right that is the goal. On this rxcsr_h_error kernel reschedules
>> > current interrupt. And that continues forever. For example adding
>>
>> The MUSB Programming Guide says CPU should clear this MUSB_RXCSR_H_ERROR
>> bit, but the current driver doesn't. I am wondering if this causes the
>> controller keeps generating the same interrupt. Can you please try the
>> following change instead to see if the lockup goes away?
>>
>> @@ -1870,6 +1870,9 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>> status = -EPROTO;
>> musb_writeb(epio, MUSB_RXINTERVAL, 0);
>>
>> + rx_csr &= ~MUSB_RXCSR_H_ERROR;
>> + musb_writew(epio, MUSB_RXCSR, rx_csr);
>
> + goto finish;
>
> Please also add the line above. I will spend more time to understand
> what is happening...
>

Hello Bin,

yes, it also works with that reset and go to finish:

diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
index c3d5fc9..8cd98e7 100644
--- a/drivers/usb/musb/musb_host.c
+++ b/drivers/usb/musb/musb_host.c
@@ -1599,6 +1599,10 @@ void musb_host_rx(struct musb *musb, u8 epnum)
status = -EPROTO;
musb_writeb(epio, MUSB_RXINTERVAL, 0);

+ rx_csr &= ~MUSB_RXCSR_H_ERROR;
+ musb_writew(epio, MUSB_RXCSR, rx_csr);
+
+ goto finish;
} else if (rx_csr & MUSB_RXCSR_DATAERROR) {

if (USB_ENDPOINT_XFER_ISOC != qh->type) {


That I think a key thing, which is done in other error. If that change
is good for you than I'm also happy with it.

I also not sure if musb_writeb(epio, MUSB_RXINTERVAL, 0); is needed.
In my case it's the same result with it and without it.
In other scenarios might be reasonable...


> First of all, I don't like the idea of merging the two branches, it
> makes the code ugly.

Yes, I don't like that function at all, it's too long and difficult to
read if you first look on it first time. It will be good to split it
on 3 small functions for each big if.

Maxim.

>
> Regards,
> -Bin.
>
>> +
>> } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>>
>> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>>
>> Regards,
>> -Bin.
>>
>> > msleep() can give some time for other processes. I'm not an expert in
>> > this chip but I think that right solution in that case is not try to
>> > reschedule and quick and allow hub to make reset and once again init
>> > all devices (in my case ppp/pppd also shutdowns and then I bring
>> > everything up with script.). The same behavior with dma and pio mode.
>> >
>> > Regards,
>> > Max.
>> >
>> > > Thanks,
>> > > -Bin.
>> > >
>> > >>
>> > >> if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>> > >> dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
>> > >> --
>> > >> 1.9.1
>> > >>
>> > >> --
>> > >> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
>> > >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> > >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Maxim Uvarov



--
Best regards,
Maxim Uvarov