Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt storm

From: Thinh Nguyen
Date: Fri Sep 13 2024 - 14:00:42 EST


On Fri, Sep 13, 2024, Thinh Nguyen wrote:
> On Fri, Sep 13, 2024, Selvarasu Ganesan wrote:
> > Hi Thinh,
> >
> > So far, there have been no reported error instances. However, we suspect
> > that the issue may be related to our glue driver. In our glue driver, we
> > access the reference of evt->flags when starting or stopping the gadget
> > based on a VBUS notification. We apologize for sharing this information
> > so late, as we only became aware of it recently.
> >
> > The following sequence outlines the possible scenarios of race conditions:
> >
> > Thread#1 (Our glue Driver Sequence)
> > ===================================
> > ->USB VBUS notification
> > ->Start/Stop gadget
> > ->dwc->ev_buf->flags |= BIT(20); (It's for our reference)
> > ->Call dwc3 exynos runtime suspend/resume
> > ->dwc->ev_buf->flags &= ~BIT(20);
> > ->Call dwc3 core runtime suspend/resume
> >
> > Thread#2
> > ========
> > ->dwc3_interrupt()
> > ->evt->flags |= DWC3_EVENT_PENDING;
> > ->dwc3_thread_interrupt()
> > ->evt->flags &= ~DWC3_EVENT_PENDING;
> >
>
> This is great! That's likely the problem. Glad you found it.
>
> >
> >
> > After our internal discussions, we have decided to remove the
> > unnecessary access to evt->flags in our glue driver. We have made these
> > changes and initiated testing.
> >
> > Thank you for your help so far, which led us to look more closely at our glue driver code.
> >
> > We also think that it would be fine to reset evt->flags when the
> > USB controller is started, along with the changes you suggested earlier.
> > This additional measure will help prevent similar issues from occurring
> > in the future.
> >
> > Please let us know your thoughts on this proposal. If it is not
> > necessary, we understand and will proceed accordingly.
> >
>
> You can submit the change I suggested. That's a valid change. However,
> we should not include the reset of the DWC3_EVENT_PENDING flag. Had we
> done this, you might not have found the issue above. It serves no purpose for
> the core driver logic and will be an extra burden for us to maintain. (I
> don't want to scratch my head in the future to figure out why that
> change was needed or worry about whether it can be removed without causing
> a regression).
>
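
To make the race in the Thread#1/Thread#2 sequence above concrete, here is
a minimal userspace sketch that replays one possible interleaving by hand.
It is not driver code: EVENT_PENDING, GLUE_MARKER, and replay_lost_update()
are hypothetical stand-ins for DWC3_EVENT_PENDING, the glue driver's
BIT(20), and the two contexts. It assumes each "flags |= X" compiles to a
separate load, modify, and store with no lock held.

```c
#include <stdint.h>

/* Illustrative stand-ins, not the kernel definitions. */
#define EVENT_PENDING (UINT32_C(1) << 0)   /* role of DWC3_EVENT_PENDING */
#define GLUE_MARKER   (UINT32_C(1) << 20)  /* role of the glue driver's BIT(20) */

/*
 * Replays one bad interleaving of two unlocked read-modify-write
 * sequences on a shared flags word and returns the final value.
 */
static uint32_t replay_lost_update(void)
{
	uint32_t flags = 0;
	uint32_t glue_tmp;

	/* Thread#2, hard IRQ: flags |= EVENT_PENDING */
	flags |= EVENT_PENDING;

	/* Thread#1 starts "flags |= GLUE_MARKER": load only */
	glue_tmp = flags;

	/* Thread#2, threaded handler: flags &= ~EVENT_PENDING */
	flags &= ~EVENT_PENDING;

	/* Thread#1 finishes: the store writes back the stale pending bit */
	flags = glue_tmp | GLUE_MARKER;

	/* Thread#1 later: flags &= ~GLUE_MARKER */
	flags &= ~GLUE_MARKER;

	return flags;
}
```

In this replay the word ends up with only EVENT_PENDING set, although the
threaded handler already cleared it. The glue marker itself is gone, so
nothing in the final value points back at the glue driver.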

Also, you may want to revisit the change below to see whether the glue
driver may be the culprit:

14e497183df2 ("usb: dwc3: core: Prevent USB core invalid event buffer address access")

Thanks,
Thinh