Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation

From: Alan Stern
Date: Mon Jun 28 2021 - 10:10:34 EST


On Mon, Jun 28, 2021 at 05:36:22PM +0800, linyyuan@xxxxxxxxxxxxxx wrote:
> On 2021-06-27 22:09, Alan Stern wrote:
> >
> > CPU0 CPU1
> > ---- ----
> > usb_gadget_remove_driver runs
> > Calls synchronize_irq
> > synchronize_irq returns
> > Calls udc_driver_unbind
> > IRQ happens for disconnect
> > Handler unlocks dwc->lock
> > Calls dwc->gadget_driver->disconnect
> > Gadget driver has already been unbound
> > and is not prepared to handle a
> > callback, so it crashes
> > Calls usb_gadget_udc_stop
> > dwc->gadget_driver is
> > set to NULL
> >
> > Without the async_callbacks mechanism, the gadget driver can get a
> > callback at the wrong time (after it has been unbound), which might
> > cause it to crash.
> 1. do you think we need to back to my original patch,
> https://lore.kernel.org/linux-usb/20210619154309.52127-1-linyyuan@xxxxxxxxxxxxxx/T/#t

No, I think the async_callbacks approach is slightly better.

> i think we can add the spin lock or mutex lock to protect this kind of race
> from UDC layer, it will be easy understanding.

We don't actually need a lock or mutex to fix this problem. We just
need to make the remove_driver sequence issue two calls to the UDC
driver rather than just one: async_callbacks and udc_stop.

> 2. if you insist this kind of change, how to change following code in dwc3 ?
> if (dwc->gadget_driver && dwc->gadget_driver->disconnect) {
>
> 2.1 if (dwc->async_callbacks && dwc->gadget_driver->disconnect) {
> or
> 2.2 if (dwc->async_callbacks && vdwc->gadget_driver &&
> dwc->gadget_driver->disconnect) {

Either one would be okay. If async_callbacks is enabled then
dwc->gadget_driver should never be NULL, but it won't hurt to check.
After all, disconnect does not occur often and it doesn't need to run
extremely quickly.

Alan Stern