Re: [RFC][PATCH] usb: dwc3: Force stop EP0 transfers during pullup disable

From: Thinh Nguyen
Date: Sun Aug 15 2021 - 20:35:07 EST

Felipe Balbi wrote:
> Hi,
> Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes:
>>>>>>>>>> If this occurs, then the entire pullup disable routine is skipped and
>>>>>>>>>> proper cleanup and halting of the controller does not complete.
>>>>>>>>>> Instead of returning an error (which is ignored from the UDC
>>>>>>>>>> perspective), do what is mentioned in the comments and force the
>>>>>>>>>> transaction to complete and put the ep0state back to the SETUP phase.
>>>>>>>>>> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>> drivers/usb/dwc3/ep0.c | 4 ++--
>>>>>>>>>> drivers/usb/dwc3/gadget.c | 6 +++++-
>>>>>>>>>> drivers/usb/dwc3/gadget.h | 3 +++
>>>>>>>>>> 3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>>> index 6587394..abfc42b 100644
>>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>>> @@ -218,7 +218,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>>>>>>>>> return ret;
>>>>>>>>>> }
>>>>>>>>>> -static void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>> +void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
>>>>>>>>>> {
>>>>>>>>>> struct dwc3_ep *dep;
>>>>>>>>>> @@ -1073,7 +1073,7 @@ void dwc3_ep0_send_delayed_status(struct dwc3 *dwc)
>>>>>>>>>> __dwc3_ep0_do_control_status(dwc, dwc->eps[direction]);
>>>>>>>>>> }
>>>>>>>>>> -static void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>> +void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
>>>>>>>>>> {
>>>>>>>>>> struct dwc3_gadget_ep_cmd_params params;
>>>>>>>>>> u32 cmd;
>>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>>> index 54c5a08..a0e2e4d 100644
>>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>>> @@ -2437,7 +2437,11 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>>>>>>>> msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
>>>>>>>>>> if (ret == 0) {
>>>>>>>>>> dev_err(dwc->dev, "timed out waiting for SETUP phase\n");
>>>>>>>>>> - return -ETIMEDOUT;
>>>>>>>>>> + spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>>>> + dwc3_ep0_end_control_data(dwc, dwc->eps[0]);
>>>>>>>>>> + dwc3_ep0_end_control_data(dwc, dwc->eps[1]);
>>>>>>>>> End transfer command takes time, need to wait for it to complete before
>>>>>>>>> issuing Start transfer again. Also, why restart again when it's about to
>>>>>>>>> be disconnected.
>>>>>>>> I can try without restarting it again, and see if that works. Instead
>>>>>>>> of waiting for the command complete event, can we set the ForceRM bit,
>>>>>>>> similar to what we do for dwc3_remove_requests()?
>>>>>>> ForceRM=1 means that the controller will ignore updating the TRBs
>>>>>>> (including not clearing the HWO and remain transfer size). The driver
>>>>>>> still needs to wait for the command to complete before issuing Start
>>>>>>> Transfer command. Otherwise Start Transfer won't go through. If we know
>>>>>>> that we're not going to issue Start Transfer any time soon, then we may
>>>>>>> be able to get away with ignoring End Transfer command completion.
>>>>>> I see. Currently, in the place that we do use
>>>>>> dwc3_ep0_end_control_data(), its followed by
>>>>>> dwc3_ep0_stall_and_restart() which would execute start transfer. For
>>>>> That doesn't look right. You can try to see if it can recover from a
>>>>> control write request. Often time we do control read and not write.
>>>>> (i.e. try to End Transfer and immediately Start Transfer on the same
>>>>> direction control endpoint).
>>>> OK, I can try, but just to clarify, I was referring to how it was being
>>>> done in:
>>>> static void dwc3_ep0_xfernotready(struct dwc3 *dwc,
>>>> const struct dwc3_event_depevt *event)
>>>> {
>>>> ...
>>>> if (dwc->ep0_expect_in != event->endpoint_number) {
>>>> struct dwc3_ep *dep = dwc->eps[dwc->ep0_expect_in];
>>>> dev_err(dwc->dev, "unexpected direction for Data Phase\n");
>>>> dwc3_ep0_end_control_data(dwc, dep);
>>>> dwc3_ep0_stall_and_restart(dwc);
>>>> return;
>>>> }
>> Looking at this snippet again, it looks wrong. For control write
>> unexpected direction, if the driver hasn't setup and started the DATA
>> phase yet, then it's fine, but there is a problem if it did.
>> Since dwc3_ep0_end_control_data() doesn't issue End Transfer command to
>> ep0 due to the resource_index check, it doesn't follow the control
> IIRC resource_index is always non-zero, so the command should be

No, resource_index is 0 for ep0out and 1 for ep0in. You can check this
in any driver tracepoint log: look at the resource index returned by
the Start Transfer command for ep0. When this code was written, it may
have been mixed up with the undocumented return value of the Set
Endpoint Transfer Resource command; don't confuse the two.
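
To illustrate the point, here is a minimal standalone model (not the
driver code) of what a zero-check on the resource index does once ep0out
legitimately has index 0. struct model_ep and both helper names are
hypothetical, and the "transfer started" guard is only my assumption of
what a safer condition might look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dwc3_ep; only the fields needed here. */
struct model_ep {
	uint8_t resource_index;    /* index returned by Start Transfer */
	bool    transfer_started;  /* assumed flag: a transfer is active */
	bool    end_transfer_sent; /* set when End Transfer would be issued */
};

/* Models a guard of the form "if (!dep->resource_index) return;". */
static void end_transfer_index_check(struct model_ep *dep)
{
	if (!dep->resource_index)
		return;	/* ep0out (index 0) is skipped even while active */
	dep->end_transfer_sent = true;
}

/* Assumed alternative: key the guard on whether a transfer is active. */
static void end_transfer_started_check(struct model_ep *dep)
{
	if (!dep->transfer_started)
		return;
	dep->end_transfer_sent = true;
}

/* Small self-check showing the difference between the two guards. */
static void model_ep_selfcheck(void)
{
	struct model_ep ep0out = { 0, true, false };
	struct model_ep ep0in  = { 1, true, false };

	end_transfer_index_check(&ep0out);
	assert(!ep0out.end_transfer_sent);	/* command silently skipped */

	end_transfer_index_check(&ep0in);
	assert(ep0in.end_transfer_sent);	/* index 1 passes the check */

	end_transfer_started_check(&ep0out);
	assert(ep0out.end_transfer_sent);	/* active transfer is ended */
}
```

In this model an active ep0out transfer is never ended by the
index-based guard, while ep0in is.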

> triggered. If you have access to a Lecroy USB Trainer, could you script
> this very scenario for verification?

For anyone who wants to work on this, we don't need a LeCroy USB
trainer. With an xHCI host, just modify xhci-ring.c to queue a DATA
phase TRB with the wrong direction for a particular control write
request, then continue with the next control requests.
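
As a rough sketch of the idea (a standalone model, not the actual
xhci-ring.c code), the direction of the DATA phase TRB can be derived
from bmRequestType and then deliberately inverted for the request under
test. data_stage_dir() and inject_wrong_dir are hypothetical names;
TRB_DIR_IN here is the DIR bit (bit 16) of an xHCI Data Stage TRB, and
how this would be wired into the host driver is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USB_DIR_IN  0x80u       /* bmRequestType direction bit */
#define TRB_DIR_IN  (1u << 16)  /* DIR bit of an xHCI Data Stage TRB */

/* Hypothetical helper: direction bits for the Data Stage TRB control word. */
static uint32_t data_stage_dir(uint8_t bm_request_type, bool inject_wrong_dir)
{
	uint32_t field = (bm_request_type & USB_DIR_IN) ? TRB_DIR_IN : 0;

	if (inject_wrong_dir)
		field ^= TRB_DIR_IN;	/* test only: flip the DATA phase direction */
	return field;
}

/* Self-check: 0x40 is a vendor control write, 0xC0 a vendor control read. */
static void data_stage_dir_selfcheck(void)
{
	assert(data_stage_dir(0x40, false) == 0);
	assert(data_stage_dir(0x40, true) == TRB_DIR_IN);
	assert(data_stage_dir(0xC0, false) == TRB_DIR_IN);
	assert(data_stage_dir(0xC0, true) == 0);
}
```

With the injection enabled for a control write, the device sees an IN
DATA phase where it expects OUT, which is the unexpected direction case
discussed above.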

>> transfer flow model in the programming guide. This may cause
>> dwc3_ep0_stall_and_restart() to overwrite the TRBs for the DATA phase
>> with SETUP stage. Also, if the ep0 is already started, the driver won't
>> issue Start Transfer command again.
>> This issue is unlikely to occur unless we see a misbehave host for
>> control write request. Regardless, we need to fix this. I may need some
> right, it would be a misbehaving host, however databook called it out as
> something that _can_ happen. Moreover, I have vague memories of this
> being one of the test cases in Lecroy's USB Certification Suite.

Yes, it's something that can happen, and dwc3 should be able to handle
it. If you remember which particular test covers this, let me know. I
want to check how it passed.

>> time before I can create a patch and test it. If you or anyone is up to
>> take this on, it'd be highly appreciated.
> Before we go ahead writing a patch for this, I'd really like to see
> traces showing this failure and a minimal reproducer. The reproducer
> would probably have to be a script for Lecroy's USB Trainer.
> Keep in mind this entire ep0 stack used to pass USBCV on every -rc and
> major release (before I lost access to all my USB gear heh).

Are you referring to Ch9 USBCV? I don't recall a particular test for
this there.

It should be a red flag whenever we see an End Transfer command
immediately followed by a Start Transfer command without waiting for
the End Transfer to complete. In this case, though, we don't go through
with the End Transfer for ep0 due to the resource_index check in