Re: [PATCH v4 1/1] usb: gadget: add raw-gadget interface

From: Alan Stern
Date: Mon Jan 13 2020 - 13:45:40 EST


On Mon, 13 Jan 2020, Andrey Konovalov wrote:

> On Mon, Jan 13, 2020 at 6:34 PM Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:
> >
> > On Mon, Jan 13, 2020 at 5:50 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, 13 Jan 2020, Andrey Konovalov wrote:
> > >
> > > > I've also found an issue, but I'm not sure if that is the bug in Raw
> > > > Gadget, or in the gadget layer (in the former case I'll add this fix
> > > > to v5 as well). What I believe I'm seeing is
> > > > __fput()->usb_gadget_unregister_driver()->usb_gadget_remove_driver()->gadget_unbind()
> > > > racing with dummy_timer()->gadget_setup(). In my case it results in
> > > > gadget_unbind() doing set_gadget_data(gadget, NULL), and then
> > > > gadget_setup() dereferencing get_gadget_data(gadget).
> > > >
> > > > Alan, does it look possible for those two functions to race? Should
> > > > this be prevented by the gadget layer, or should I use some kind of
> > > > locking in my gadget driver to prevent this?
> > >
> > > In your situation this race shouldn't happen, because before
> > > udc->driver->unbind() is invoked we call usb_gadget_disconnect(). If
> > > that routine succeeds -- which it always does under dummy-hcd -- then
> > > there can't be any more setup callbacks, because find_endpoint() will
> > > always return NULL (the is_active() test fails; see the various
> > > set_link_state* routines). So I don't see how you could have ended up
> > > with the race you describe.
> >
> > I've managed to reproduce the race by adding an mdelay() into the
> > beginning of the setup() callback. AFAIU what happens is setup() gets
> > called (and waits on the mdelay()), then unbind() comes in and does
> > set_gadget_data(NULL), and then setup() proceeds, gets NULL through
> > get_gadget_data() and crashes on null-ptr-deref. I've got the same
> > crash a few times after many days of fuzzing, so I assume it can
> > happen without the mdelay() as well.
> >
> > > However, a real UDC might not be able to perform a disconnect under
> > > software control. In that case usb_gadget_disconnect() would not
> > > change the pullup state, and there would be a real possibility of a
> > > setup callback racing with an unbind callback. This seems like a
> > > genuine problem and I can't think of a solution offhand.
> > >
> > > What we would need is a way to tell the UDC driver to stop invoking
> > > gadget callbacks, _before_ the UDC driver's stop callback gets called.
> > > Maybe this should be merged into the pullup callback somehow.
>
> Perhaps for the dummy driver we need to wait for setup() to finish if
> it's being executed and then stop the dummy timer in dummy_pullup()?

Yes, we need to wait for a setup callback to finish. But dummy_timer
should not be stopped; otherwise URBs that have already been submitted
would never be given back.

The big problem is that usb_gadget_disconnect() can be called in
interrupt context. In general, a UDC driver will need to call
synchronize_irq() to make sure there aren't any callbacks still
running, and that can be done only in process context. dummy-hcd is
slightly different since it doesn't manage actual hardware; it calls
usleep_range() instead of synchronize_irq() -- but that also requires
process context.

We could change the gadget API to require that usb_gadget_disconnect()
and related routines always be called in process context. I don't know
if that's such a good idea though. A gadget driver might want to
disconnect from within its setup handler or a completion routine, for
example.

A better approach would be to add a new synchronize_callbacks()
pointer in the usb_gadget_ops structure. But to work properly it
would have to be mandatory for every UDC driver, and that's not so easy
to accomplish.

Suggestions?

Alan Stern