RE: [PATCH] usb: dwc3: Add dwc3 lock for blocking interrupt storming

From: 정재훈
Date: Thu Mar 10 2022 - 21:43:26 EST


> -----Original Message-----
> From: Thinh Nguyen [mailto:Thinh.Nguyen@xxxxxxxxxxxx]
> Sent: Friday, March 11, 2022 10:57 AM
> To: 정재훈; Thinh Nguyen; 'Felipe Balbi'; 'Greg Kroah-Hartman'
> Cc: 'open list:USB XHCI DRIVER'; 'open list'; 'Seungchull Suh'; 'Daehwan
> Jung'; cpgs@xxxxxxxxxxx; cpgsproxy5@xxxxxxxxxxx
> Subject: Re: [PATCH] usb: dwc3: Add dwc3 lock for blocking interrupt
> storming
>
> 정재훈 wrote:
> > Hi.
> >
> >> -----Original Message-----
> >> From: Thinh Nguyen [mailto:Thinh.Nguyen@xxxxxxxxxxxx]
> >> Sent: Thursday, March 10, 2022 11:14 AM
> >> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman
> >> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan
> >> Jung
> >> Subject: Re: [PATCH] usb: dwc3: Add dwc3 lock for blocking interrupt
> >> storming
> >>
> >> Hi,
> >>
> >> JaeHun Jung wrote:
> >>> Interrupt storming occurred, with a very low probability.
> >>> The problem is believed to be caused by a race condition between the
> >>> top half and the bottom half of the interrupt service routine.
> >>> We confirmed that the variables held values that should be impossible
> >>> when the ISR is entered through a normal hardware IRQ.
> >>> =====================================================================
> >>> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF88DE6A0380 (
> >>> (void *) buf = 0xFFFFFFC01594E000,
> >>> (void *) cache = 0xFFFFFF88DDC14080,
> >>> (unsigned int) length = 4096,
> >>> (unsigned int) lpos = 0,
> >>> (unsigned int) count = 0, <<
> >>> (unsigned int) flags = 1, <<
> >>> =====================================================================
> >>> "evt->count=0" and "evt->flags=DWC3_EVENT_PENDING" should never be
> >>> set at the same time.
> >>>
> >>> We suspect a race condition between dwc3_interrupt() and
> >>> dwc3_process_event_buf() when the latter is called from
> >>> dwc3_gadget_process_pending_events().
> >>> So I tried to prevent the race with a spin_lock.
> >>
> >> This looks like it needs a memory barrier. Would this work for you?
> > Maybe it could. But "evt->count = 0;" is also updated in
> > dwc3_process_event_buf().
> > So I think a spin_lock is a clearer fix for this issue.
> >
>
> Not really. If the problem is that evt->flags is not updated in time, then
> the solution should be a memory barrier. A spin_lock would obfuscate the
> issue. And we should avoid using a spin_lock in the top half.

This issue was detected by the watchdog. The interrupt fired every 4 to 5 us and could not be cleared until the bottom half ran.
If it were only a memory-barrier problem, the new value would become visible after a few clock cycles and the top half would then run normally, wouldn't it?
Also, could you explain why we should avoid using a spin_lock in the top half?
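To make the storm scenario concrete, here is a minimal userspace model of the top-half/bottom-half handshake. It is a sketch, not dwc3 code: struct model_evbuf, model_top_half(), model_bottom_half() and model_would_storm() are hypothetical names. It assumes the driver behaves as discussed in this thread (the top half returns early while DWC3_EVENT_PENDING is set, and the bottom half sets evt->count to 0 before clearing the flag); the mask/unmask step is an additional assumption of the model. It shows why the dumped state (count = 0, PENDING set) would storm if the interrupt is unmasked and the controller still has events: every top-half call returns "handled" without waking the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: names loosely follow dwc3, but this is not driver code. */
#define EVENT_PENDING 0x1u

enum irq_result { IRQ_RES_NONE, IRQ_RES_HANDLED, IRQ_RES_WAKE_THREAD };

struct model_evbuf {
	unsigned int count;    /* events cached for the bottom half */
	unsigned int flags;    /* EVENT_PENDING while the bottom half owns the buffer */
	bool irq_masked;       /* assumption: models the event-interrupt mask bit */
	unsigned int hw_count; /* models the controller's pending-event counter */
};

/* Top half: claim pending events, mask the IRQ, and wake the thread. */
static enum irq_result model_top_half(struct model_evbuf *evt)
{
	if (evt->flags & EVENT_PENDING)
		return IRQ_RES_HANDLED; /* thread still busy: return without doing anything */
	if (evt->hw_count == 0)
		return IRQ_RES_NONE;
	evt->count = evt->hw_count;
	evt->flags |= EVENT_PENDING;
	evt->irq_masked = true;
	return IRQ_RES_WAKE_THREAD;
}

/* Bottom half: consume the cached events, unmask, and clear PENDING last. */
static void model_bottom_half(struct model_evbuf *evt)
{
	assert(evt->flags & EVENT_PENDING); /* only runs after a successful claim */
	evt->hw_count -= evt->count;        /* acknowledge the consumed events */
	evt->count = 0;
	evt->irq_masked = false;
	evt->flags &= ~EVENT_PENDING;
}

/* True if the IRQ is unmasked with events pending, yet the top half would return early. */
static bool model_would_storm(const struct model_evbuf *evt)
{
	return !evt->irq_masked && evt->count == 0 &&
	       (evt->flags & EVENT_PENDING) && evt->hw_count > 0;
}
```

In this model the early-return path never touches the count, the mask, or the flag, so once the dumped state is reached the top half alone cannot leave it; only the bottom half clearing PENDING (or a fix that prevents reaching the state) ends the storm.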

>
> BR,
> Thinh
>
> >>
> >> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> >> index c02e239978e0..a96c344b9f17 100644
> >> --- a/drivers/usb/dwc3/gadget.c
> >> +++ b/drivers/usb/dwc3/gadget.c
> >> @@ -5340,6 +5340,9 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
> >>  		return IRQ_HANDLED;
> >>  	}
> >>
> >> +	/* Make sure the event flags is updated */
> >> +	wmb();
> >> +
> >>  	/*
> >>  	 * With PCIe legacy interrupt, test shows that top-half irq handler can
> >>  	 * be called again after HW interrupt deassertion. Check if bottom-half
> >>
> >>
> >> Thanks,
> >> Thinh
> >
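For comparison, the ordering-based approach suggested in the diff above can be sketched in userspace with C11 atomics. This is an analogy rather than kernel code: C11 release/acquire operations stand in for the kernel barrier, and struct model_evt, model_claim() and model_release() are hypothetical names. The bottom half stores count = 0 and then clears PENDING with a release operation; a top half that reads the flags with acquire ordering and sees PENDING clear is then guaranteed to also see count = 0. Note that in the kernel a write barrier such as wmb() on one side is normally paired with a read-side barrier on the other (see the "SMP barrier pairing" section of the kernel's memory-barriers documentation).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the flag handshake; not dwc3 code. */
#define EVT_PENDING 0x1u

struct model_evt {
	unsigned int count; /* plain data, published through flags */
	atomic_uint flags;  /* EVT_PENDING while the bottom half owns the buffer */
};

/* Top half: fill the buffer only if it is free, then publish it. */
static bool model_claim(struct model_evt *evt, unsigned int hw_count)
{
	/* Acquire pairs with the release in model_release(): seeing PENDING
	 * clear guarantees we also see the bottom half's count = 0 store. */
	if (atomic_load_explicit(&evt->flags, memory_order_acquire) & EVT_PENDING)
		return false;
	if (hw_count == 0)
		return false;
	evt->count = hw_count;
	/* Release: a reader that sees PENDING set also sees the new count. */
	atomic_fetch_or_explicit(&evt->flags, EVT_PENDING, memory_order_release);
	return true;
}

/* Bottom half: consume the events, then clear PENDING as the last step. */
static unsigned int model_release(struct model_evt *evt)
{
	unsigned int f = atomic_load_explicit(&evt->flags, memory_order_acquire);
	unsigned int handled;

	assert(f & EVT_PENDING); /* must only run after a successful claim */
	(void)f;                 /* avoid an unused-variable warning under NDEBUG */
	handled = evt->count;
	evt->count = 0;
	/* Release: the count = 0 store is ordered before the flag clear. */
	atomic_fetch_and_explicit(&evt->flags, ~EVT_PENDING, memory_order_release);
	return handled;
}
```

Unlike a spin_lock, this adds no mutual exclusion; it only constrains the order in which the two stores become visible, which is the property the dump suggests was violated, if the root cause is ordering at all.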