Re: [PATCH v3] usb: dwc3: Stop active transfers before halting the controller

From: Felipe Balbi
Date: Thu Sep 24 2020 - 03:39:36 EST



Hi,

Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes:
> On 9/6/2020 11:20 PM, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes:
>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>> index 59f2e8c31bd1..456aa87e8778 100644
>>> --- a/drivers/usb/dwc3/ep0.c
>>> +++ b/drivers/usb/dwc3/ep0.c
>>> @@ -197,7 +197,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request,
>>> int ret;
>>>
>>> spin_lock_irqsave(&dwc->lock, flags);
>>> - if (!dep->endpoint.desc) {
>>> + if (!dep->endpoint.desc || !dwc->pullups_connected) {
>>
>> this looks odd. If we don't have pullups connected, we shouldn't have a
>> descriptor, likewise if we don't have a a description, we haven't been
>> enumerated, therefore we shouldn't have pullups connected.
>>
>> What am I missing here?
>>
>
> Hi Felipe,
>
> When we
> echo "" > /sys/kernel/config/usb_gadget/g1/UDC
>
> This triggers the usb_gadget_disconnect() routine to execute.
>
> int usb_gadget_disconnect(struct usb_gadget *gadget)
> {
> ...
> ret = gadget->ops->pullup(gadget, 0);
> if (!ret) {
> gadget->connected = 0;
> gadget->udc->driver->disconnect(gadget);
> }
>
> So it is possible that we've already disabled the pullup before running
> the disable() callbacks in the function drivers. The disable()

we used to have usage counts for those, are they gone? I think they're
still there.

> callbacks usually are the ones responsible for calling usb_ep_disable(),
> where we clear the desc field. This means there is a brief period where
> the pullups_connected = 0, but we still have valid ep desc, as it has
> not been disabled yet.

this is a valid point, though

> Also, for function drivers like mass storage, the fsg_disable() routine
> defers the actual usb_ep_disable() call to the fsg_thread, so its not
> always ensured that the disconnect() execution would result in the
> usb_ep_disable() to occur synchronously.

also a good point.

>>> @@ -1926,6 +1926,21 @@ static int dwc3_gadget_set_selfpowered(struct usb_gadget *g,
>>> return 0;
>>> }
>>>
>>> +static void dwc3_stop_active_transfers(struct dwc3 *dwc)
>>> +{
>>> + u32 epnum;
>>> +
>>> + for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
>>
>> dwc3 knows the number of endpoints available in the HW. Use dwc->num_eps
>> instead.
>>
>
> Sure, will do.
>
>>> @@ -1971,6 +1986,8 @@ static int dwc3_gadget_run_stop(struct dwc3 *dwc, int is_on, int suspend)
>>> return 0;
>>> }
>>>
>>> +static void __dwc3_gadget_stop(struct dwc3 *dwc);
>>> +
>>> static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>> {
>>> struct dwc3 *dwc = gadget_to_dwc(g);
>>> @@ -1994,9 +2011,37 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>> }
>>> }
>>>
>>> + /*
>>> + * Synchronize and disable any further event handling while controller
>>> + * is being enabled/disabled.
>>> + */
>>> + disable_irq(dwc->irq_gadget);
>>
>> why isn't dwc3_gadget_disable_irq() enough?
>>
>>> spin_lock_irqsave(&dwc->lock, flags);
>>
>> spin_lock_irqsave() will disable interrupts, why disable_irq() above?
>>
>
> In the discussion I had with Thinh, the concern was that with the newly
> added code to override the lpos here, if the interrupt routine
> (dwc3_check_event_buf()) runs, then it will reference the lpos for

that's running in hardirq context. All interrupts are disabled while
that runs, there's no risk of race, right?

> copying the event buffer contents to the event cache, and potentially
> process events. There is no locking in place, so it could be possible
> to have both run in parallel.

Is this academic or have you actually found a situation where this
could, indeed, happen? The spin_lock_irqsave() should be enough to
synchronize dwc3_gadget_pullup() and the interrupt handler.

> Hence, the reason if there was already a pending IRQ triggered, the
> dwc3_gadget_disable_irq() won't ensure the IRQ is handled. We can do
> something like:
> if (!is_on)
> dwc3_gadget_disable_irq()
> synchronize_irq()
> spin_lock_irqsave()
> if(!is_on) {
> ...
>
> But the logic to only apply this on the pullup removal case is a little
> messy. Also, from my understanding, the spin_lock_irqsave() will only
> disable the local CPU IRQs, but not the interrupt line on the GIC, which
> means other CPUs can handle it, unless we explicitly set the IRQ
> affinity to CPUX.

Yeah, the way I understand this can't really happen. But I'm open to
being educated. Maybe Alan can explain if this is really possibility?

>>> + dwc3_stop_active_transfers(dwc);
>>> + __dwc3_gadget_stop(dwc);
>>> +
>>> + count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>> + count &= DWC3_GEVNTCOUNT_MASK;
>>> + if (count > 0) {
>>> + dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
>>> + dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) %
>>> + dwc->ev_buf->length;
>>> + }
>>
>> don't duplicate code. Add a patch before this extracting this into
>> helper and use it for both irq handler and gadget pullup.
>>
>
> We actually removed this call in the IRQ handler, as if we ensure that
> the IRQ routine has fully complete and won't trigger anymore, then this
> sequence will handle clearing of the event count.

oh, makes sense :-)

--
balbi

Attachment: signature.asc
Description: PGP signature