Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation

From: linyyuan
Date: Sat Jun 26 2021 - 22:49:07 EST


On 2021-06-26 23:03, Alan Stern wrote:
On Sat, Jun 26, 2021 at 09:16:25AM +0800, linyyuan@xxxxxxxxxxxxxx wrote:
On 2021-06-26 00:37, Alan Stern wrote:
> On Fri, Jun 25, 2021 at 06:44:15PM +0800, Linyu Yuan wrote:

> > --- a/drivers/usb/dwc3/ep0.c
> > +++ b/drivers/usb/dwc3/ep0.c
> > @@ -597,10 +597,11 @@ static int dwc3_ep0_set_address(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
> >
> >  static int dwc3_ep0_delegate_req(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
> >  {
> > -	int ret;
> > +	int ret = 0;
> >
> >  	spin_unlock(&dwc->lock);
> > -	ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
> > +	if (dwc->async_callbacks)
> > +		ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
> >  	spin_lock(&dwc->lock);
>
> Here and in the other places, you should test dwc->async_callbacks
> _before_ dropping the spinlock. Otherwise there is a race (the flag
> could be written at about the same time it is checked).
Thanks for your comments.

If you think there is a race here, how can we make sure the gadget_driver
pointer is safe? This is the closest place where we can confirm it is
non-NULL, by checking async_callbacks.

I explained this twice already: We know that gadget_driver is not
NULL because usb_gadget_remove_driver calls synchronize_irq before
doing usb_gadget_udc_stop.

Look at this timing diagram:

	CPU0				CPU1
	----				----
	IRQ happens for setup packet
	  Handler sees async_callbacks
	    is enabled
	  Handler unlocks dwc->lock
					usb_gadget_remove_driver runs
					  Disables async callbacks
					  Calls synchronize_irq
	  Handler calls dwc->		  . waits for IRQ handler to
	    gadget_driver->setup	  .   return
	  Handler locks dwc->lock	  .
	  ...				  .
	  Handler returns		  .
					  . synchronize_irq returns
					  Calls usb_gadget_udc_stop
					    dwc->gadget_driver is
					      set to NULL

As you can see, dwc->gadget_driver is non-NULL when CPU0 uses it,
even though async_callbacks gets cleared during the time when the
lock is released.
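
The ordering in the diagram, together with the earlier advice to test
the flag before dropping the lock, can be modeled in user space. What
follows is only a minimal sketch and not driver code: struct model_udc,
struct model_driver and every model_* function are hypothetical
stand-ins, a pthread mutex plays the role of dwc->lock, and the
synchronize_irq step is only marked by a comment, since this model does
not include a concurrently running handler to wait for.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct usb_gadget_driver. */
struct model_driver {
	int (*setup)(void *ctrl);
};

/* Hypothetical stand-in for struct dwc3. */
struct model_udc {
	pthread_mutex_t lock;		/* plays the role of dwc->lock */
	bool async_callbacks;		/* only written with lock held */
	struct model_driver *driver;	/* plays the role of dwc->gadget_driver */
};

/* Example setup callback that accepts every request. */
static int model_setup_ok(void *ctrl)
{
	(void)ctrl;
	return 0;
}

static void model_udc_init(struct model_udc *udc, struct model_driver *drv)
{
	pthread_mutex_init(&udc->lock, NULL);
	udc->async_callbacks = true;
	udc->driver = drv;
}

/* Called with udc->lock held; returns with it held again. */
static int model_delegate_req(struct model_udc *udc, void *ctrl)
{
	int ret = -1;

	/* Test the flag while the lock is still held, then drop the lock. */
	if (udc->async_callbacks) {
		pthread_mutex_unlock(&udc->lock);
		ret = udc->driver->setup(ctrl);
		pthread_mutex_lock(&udc->lock);
	}
	return ret;
}

/* Models the IRQ handler for a setup packet. */
static int model_irq_handler(struct model_udc *udc, void *ctrl)
{
	int ret;

	pthread_mutex_lock(&udc->lock);
	ret = model_delegate_req(udc, ctrl);
	pthread_mutex_unlock(&udc->lock);
	return ret;
}

/* First step of driver removal: clear the flag under the lock. */
static void model_disable_async_callbacks(struct model_udc *udc)
{
	pthread_mutex_lock(&udc->lock);
	udc->async_callbacks = false;
	pthread_mutex_unlock(&udc->lock);
}

/*
 * Last step of driver removal. In the diagram, synchronize_irq runs
 * between model_disable_async_callbacks() and this function, so any
 * handler that already saw the flag set has returned by now.
 */
static void model_udc_stop(struct model_udc *udc)
{
	pthread_mutex_lock(&udc->lock);
	udc->driver = NULL;
	pthread_mutex_unlock(&udc->lock);
}
```

In this model, model_irq_handler() returns the callback's result while
the flag is set, and returns -1 without dereferencing the driver
pointer once model_disable_async_callbacks() has run, including after
model_udc_stop() has set the pointer to NULL.
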
Thanks for your patient explanation.
But from this diagram, it seems that it is synchronize_irq() that prevents
the NULL pointer crash.

Can you also explain how the async_callbacks flag helps here?

Alan Stern