Re: [PATCH] usb: dwc3: ep0: Fix the possible missed request for handling delay STATUS phase
From: Felipe Balbi
Date: Mon Jan 16 2017 - 14:22:27 EST
Hi,
Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes:
> On Mon, 16 Jan 2017, Felipe Balbi wrote:
>
>> >>>> Another point here is that the really robust way of fixing this is to
>> >>>> get rid of USB_GADGET_DELAYED_STATUS altogether and just make sure
>> >>>> gadget drivers know how to queue requests for all three phases of a
>> >>>> Control Transfer.
>> >>>>
>> >>>> A lot of code will be removed from all gadget drivers and UDC drivers
>> >>>> while combining all of it in a single place in composite.c.
>
> Don't forget the legacy drivers.
right. I think EHCI Debug gadget is the only one not using composite.c
though. All others under drivers/usb/gadget/legacy are static
configurations of existing function drivers and all use composite.c
>> >>>> The reason I'm saying this is that other UDC drivers might have similar
>> >>>> races already but they just haven't triggered.
>
>> >> Alright, it's important enough to fix this bug. Can you also have a look
>> >> into dropping USB_GADGET_DELAYED_STATUS altogether? If you're too busy,
>> >> no issues. It'll stay in my queue.
>> >
>> > Okay, I will have a look at f_mass_storage driver to see if we can
>> > drop USB_GADGET_DELAYED_STATUS. Thanks.
>>
>> not only mass storage. It needs to be done for all drivers. The way to
>> do that is to teach functions that control transfers are composed of two
>> or three phases. If you look at UDC drivers today, they all have
>> peculiarities about control transfers to handle stuff that *maybe*
>> gadget drivers won't handle.
>>
>> What we should do here is make sure that *all* 3 phases always have a
>> matching usb_ep_queue() coming from the upper layers. Whether
>> composite.c or f_*.c handles it, that's an implementation detail. But
>> just to illustrate the problem, we should be able to get rid of
>> dwc3_ep0_out_start() and assume that the upper layer will call
>> usb_ep_queue() when it wants to receive a new SETUP packet.
>>
>> Likewise, we should be able to assume that STATUS phase will only start
>> based on a usb_ep_queue() call. That way we can remove
>> USB_GADGET_DELAYED_STATUS altogether, because that will *always* be the
>> case. There will be no races to handle apart from the normal case where
>> XferNotReady can come before or after usb_ep_queue(), but we already
>> have proper handling for that too.
>
> It sounds like you're talking about a major change in the way the
> gadget subsystem handles control requests.
yeah, not the first time :-)
> We can distinguish three cases. In the existing implementation, they
> work like this:
>
> (1) Control-OUT with no data stage. The gadget driver's setup
> routine either queues a request on ep0, which the UDC driver
> uses for the status stage transfer (so it should be a length-0
> IN transfer) and returns 0, or else returns an error, in which
> case the UDC driver sends a protocol STALL for the status
> stage.
>
> (What the UDC driver should do if the setup routine queues a
> request on ep0 and then returns an error is undefined.)
correct
> (2) Control-OUT with a data stage. The gadget driver's setup
> routine either queues an OUT request on ep0, which the UDC
> driver uses for the data stage transfer, or else returns an
> error, in which case the UDC driver sends a protocol STALL for
> the data stage. In the first case, the UDC driver
> automatically queues a 0-length IN request for the status
> stage; the gadget driver does not get any chance to fail the
> transfer after the host's data has been successfully received.
> (IMO this is a bug in the design of the gadget subsystem.)
exactly, what I'm proposing here would let us fix this detail, too.
> (3) Control-IN with a data stage. The gadget driver's setup
> routine either queues an IN request on ep0, which the UDC
> driver uses for the data stage transfer, or else returns an
> error, in which case the UDC driver sends a protocol STALL for
> the data stage. In the first case, the UDC driver
> automatically queues a 0-length OUT request for the status
> stage; the gadget driver does not get any chance to fail the
> transfer after its data has been successfully sent (and I can't
> think of any reason for doing this).
right
> In the delayed-status or delayed-data case, the setup routine does not
> queue a request on ep0 before returning 0; instead the gadget driver
> queues this request at a later time in a separate thread.
>
> The gadget driver never calls usb_ep_queue in order to receive the next
> SETUP packet; the UDC driver takes care of SETUP handling
> automatically.
yeah, that's another thing I'd like to change. Currently, we have no
means to either try to implement device-initiated LPM without adding a
ton of hacks to UDC drivers. If we require upper layers (composite.c,
most of the time) to usb_ep_queue() separate requests for all 3 phases
of a ctrl transfer, we can actually rely on the fact that a new SETUP
phase hasn't been queued yet to trigger U3 entry.
Another detail that this helps is that PM (overall) becomes simpler as,
most likely, we won't need to mess with transfer cancellation, for
example.
> You are suggesting that status stage requests should not be queued
> automatically by UDC drivers but instead queued explicitly by gadget
> drivers. This would mean changing every UDC driver and every gadget
> driver.
yes, a bit of work but has been done before. One example that comes to
mind is when I added ->udc_start() and ->udc_stop(). It's totally
doable. We can, for instance, add a temporary
"wants_explicit_ctrl_phases" flag to struct usb_gadget which, if set,
will tell composite.c (or whatever) that the UDC wants explicitly queued
ctrl phases.
Then add support for that to each UDC and set the flag. Once all are
converted, add one extra patch to remove the flag and the legacy
code. This has, of course, the draw back of increasing complexity until
everything is converted over; but if it's all done in a single series, I
can't see any problems with that.
> Also, it won't fix the race that Baolin Wang found. The setup routine
well, it will help... see below.
> is always called in interrupt context, so it can't sleep. Doing
> anything non-trivial will require a separate task, and it's possible
> that this task will try to enqueue the data-stage or status-stage
> request before the UDC driver is ready to handle it (for example,
> before or shortly after the setup routine returns).
>
> To work properly, the UDC driver must be able to accept a request for
> ep0 any time after it invokes the setup callback -- either before the
> callback returns or after.
Right, all UDCs are *already* required to support this case anyway
because of USB_GADGET_DELAYED_STATUS. There was a bug in DWC3, sure, but
it was already required to support this case.
By removing USB_GADGET_DELAYED_STATUS altogether and making phases more
explict, we enforce this requirement and it'll be much easier to test
for it IMO.
>It seems that this was the real problem Baolin wanted to fix.
yup
--
balbi