Re: [PATCH] usb: dwc3: Prevent indefinite sleep in _dwc3_set_mode during suspend/resume

From: Baolin Wang
Date: Mon Mar 05 2018 - 06:25:21 EST


On 5 March 2018 at 19:14, Roger Quadros <rogerq@xxxxxx> wrote:
> On 05/03/18 13:06, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Baolin Wang <baolin.wang@xxxxxxxxxx> writes:
>>>>> Roger Quadros <rogerq@xxxxxx> writes:
>>>>>>> Roger Quadros <rogerq@xxxxxx> writes:
>>>>>>>> In the following test we get stuck by sleeping forever in _dwc3_set_mode()
>>>>>>>> after which dual-role switching doesn't work.
>>>>>>>>
>>>>>>>> On dra7-evm's dual-role port,
>>>>>>>> - Load g_zero gadget driver and enumerate to host
>>>>>>>> - suspend to mem
>>>>>>>> - disconnect USB cable to host and connect otg cable with Pen drive in it.
>>>>>>>> - resume system
>>>>>>>> - we sleep indefinitely in _dwc3_set_mode due to.
>>>>>>>> dwc3_gadget_exit()->usb_del_gadget_udc()->udc_stop()->
>>>>>>>> dwc3_gadget_stop()->wait_event_lock_irq()
>>>>>>>>
>>>>>>>> Let's clear the DWC3_EP_END_TRANSFER_PENDING flag on all endpoints
>>>>>>>> so we don't wait in dwc3_gadget_stop().
>>>>>>>>
>>>>>>>> Signed-off-by: Roger Quadros <rogerq@xxxxxx>
>>>>>>>> ---
>>>>>>>> drivers/usb/dwc3/gadget.c | 14 ++++++++++++++
>>>>>>>> 1 file changed, 14 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>> index 2bda4eb..0a360da 100644
>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>> @@ -3273,6 +3273,20 @@ int dwc3_gadget_init(struct dwc3 *dwc)
>>>>>>>>
>>>>>>>> void dwc3_gadget_exit(struct dwc3 *dwc)
>>>>>>>> {
>>>>>>>> + int epnum;
>>>>>>>> + unsigned long flags;
>>>>>>>> +
>>>>>>>> + spin_lock_irqsave(&dwc->lock, flags);
>>>>>>>> + for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
>>>>>>>> + struct dwc3_ep *dep = dwc->eps[epnum];
>>>>>>>> +
>>>>>>>> + if (!dep)
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + dep->flags &= ~DWC3_EP_END_TRANSFER_PENDING;
>>>>>>>> + }
>>>>>>>> + spin_unlock_irqrestore(&dwc->lock, flags);
>>>>>>>> +
>>>>>>>> usb_del_gadget_udc(&dwc->gadget);
>>>>>>>> dwc3_gadget_free_endpoints(dwc);
>>>>>>>
>>>>>>> free endpoints is a better place for this. It's already going to free
>>>>>>> the memory anyway. Might as well clear all flags to 0 there.
>>>>>>>
>>>>>>
>>>>>> But it won't solve the deadlock issue. Since dwc3_gadget_free_endpoints()
>>>>>> is called after usb_del_gadget_udc() and the deadlock happens when
>>>>>>
>>>>>> usb_del_gadget_udc()->udc_stop()->dwc3_gadget_stop()->wait_event_lock_irq()
>>>>>>
>>>>>> and DWC3_EP_END_TRANSFER_PENDING flag is set.
>>>>>
>>>>> indeed. Iterating twice over the entire endpoint list seems
>>>>> wasteful. Perhaps we just shouldn't wait when removing the UDC since
>>>>> that's essentially what this patch will do, right? If you clear the flag
>>>>> before calling ->udc_stop(), this means the loop in dwc3_gadget_stop()
>>>>> will do nothing. Might as well remove it.
>>>>>
>>>>
>>>> This means that we will never wait for DWC3_EP_END_TRANSFER_PENDING to clear
>>>> in dwc3_gadget_stop() like we used to. This is perfectly fine, right?
>>>>
>>>> It makes sense to me as dwc3_gadget_stop() calls __dwc3_gadget_stop() which
>>>> masks all interrupts and nobody will ever clear that flag if it was set.
>>>
>>> I don't think so. It can not mask the endpoint events, please check
>>> the events which will be masked in DEVTEN register. The reason why we
>>> should wait for DWC3_EP_END_TRANSFER_PENDING to clear is that,
>>> sometimes the DWC3_DEPEVT_EPCMDCMPLT event will be triggered later
>>> than 100us, but now we may have freed the gadget irq which will cause
>>> crash.
>>
>> We could mask command complete events as soon as ->udc_stop() is called,
>> right? Hmm, actually, __dwc3_gadget_stop() already clears DEVTEN
>> completely.
>
> But which bit in DEVTEN says Endpoint events are disabled?

When we set up the DWC3_DEPCMD_ENDTRANSFER command in
dwc3_stop_active_transfer(), we can do not set DWC3_DEPCMD_CMDIOC,
then there will no endpoint command complete interrupts I think.

cmd |= DWC3_DEPCMD_CMDIOC;

>
>>
>> /me goes check databook
>>
>> At least on revision 2.60a of the databook, bit 10 is reserved. I wonder
>> if that's the start of all the problems. Anybody has access to older and
>> newer databook revisions so we can cross-check?
>>
>
> I can access v2.40 and v3.10 books.
>
> bit 10 is reserved on both
>
> Differences in v2.4 vs v3.10 are:
>
> bit 8 reserved vs L1SUSPEN
> bit 13 reserved vs StopOnDisconnectEn
> bit 14 reserved vs L1WKUPEVTEN
>
> --
> cheers,
> -roger
>
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki



--
Baolin.wang
Best Regards