Re: [PATCH v2 3/4] usb: dwc3: add dual-role support

From: Felipe Balbi
Date: Thu Mar 30 2017 - 05:27:19 EST



Hi

Roger Quadros <rogerq@xxxxxx> writes:
>>>> For something that simple, we wouldn't even need to use OTG FSM layer
>>>> because that brings no benefit for such a simple requirement.
>>>
>>> no no. I think you got it wrong. I'm not using the OTG FSM layer at all :).
>>
>> what are all the otg_fsm mentions then? Also you have:
>>
>> + struct usb_otg otg;
>> + struct otg_fsm otg_fsm;
>>
>
> I'll get rid of them. They aren't really needed.
> Now I see why you thought I was using the otg_fsm layer. :)

fair enough

>>>> Can you either confirm or refute the statement above?
>>>
>>> The real problem was that if host adapter was removed during a system suspend
>>> then while resuming XHCI was locking up like below. This is probably because
>>> we're trying to remove/Halt the HCD in the otg_irq_handler before XHCI driver has resumed?
>>>
>>> How can we ensure that we call dwc3_host_exit() only *after* XHCI driver has resumed?
>>
>> Well, xHCI is our child, so driver model should *already* be
>> guaranteeing that, no? Specially since you're calling this from
>> ->complete() which happens after ->resume() has been called for the
>> entire tree. It's true, however, that dwc3's ->complete() will be called
>> before xhci's ->complete(). Is this the problem you're pointing at? Or
>> do you mean xHCI is runtime suspended (or runtime resuming) while you
>> call dwc3_host_exit()? If that's the case, then there's a bug in xHCI
>> itself.
>
> Yes dwc3->complete() being called before xhci->complete(), and so HCD being
> removed before xhci->complete() causes the lockup.
>
> It could be a bug in xHCI driver as well.

I see...

>>> We need a way to mask the OTG events without loosing them while they are masked.
>>> Do you know how that could be achieved?
>>
>> masking doesn't clear events. It just masks them. Look at gadget.c for
>> how we do it. Basically, the code we have here is racy, really racy and
>> will cause hard-to-debug lockups. Your code can only work with
>> IRQF_ONESHOT, which we don't want to add back.
>>
>> In any case, to mask events, you set BIT 31 in GEVNTSIZ register. Just
>> copy it from gadget.c ;-)
>
> Isn't GEVNTSIZ specific to device side? We're talking about the OTG block here.

that's true, sorry.

> Are you sure that setting bit 31 of GEVNTSIZ will mask OTG_irq? I don't think so.
>
> Note that OTG_IRQ is a separate IRQ line than PERIPHERAL_IRQ.

man, there's really no way to mask OTG events. This can be bad :-)

John, can you confirm if there's really no way of masking OTG events
without loosing them?

>>>>> + /* OEVTEN = 0 */
>>>>> + dwc3_otg_disable_events(dwc, DWC3_OTG_ALL_EVENTS);
>>>>> + /* OEVTEN.ConIDStsChngEn = 1. Instead we enable all events */
>>>>> + dwc3_otg_enable_events(dwc, DWC3_OTG_ALL_EVENTS);
>>>>
>>>> oh, disable everything and enable everything right after. What gives?
>>>
>>> I did this following Fig 11.4. But there there don't enable all events,
>>> so it was a good idea to be on a clean slate by disabling all events first
>>> and then only enabling selected events.
>>>
>>> In any case I think it is good practice. i.e. clear before OR operation?
>>> FYI. dwc3_otg_enable_events doesn't clear the events that are not enabled so
>>> if some old event not part of enable_mask was enabled it will stay enabled.
>>
>> can't this result in IRQ triggering forever and us not handling it? ;-)
>
> Why should enabling events trigger IRQ? IRQ will trigger only when the
> event actually happens no?

heh, right :-) What I mean is that you might enable an interrupt event
which you don't clear, because you don't support it or don't handle it,
or whatever.

Reserved bits might become non-reserved in the future and so on.

>>>>> + /* start the xHCI host driver */
>>>>> + if (!skip) {
>>>>
>>>> when would skip be true?
>>>>
>>>
>>> only during system resume.
>>
>> hmmm, is there a reason for that? I mean, could we live without it for
>> the time being? Seems like all this achieves is avoiding reenumeration
>> of some devices during resume. Do we care from a starting
>> implementation?
>
> At least on AM43x, it was required. without that USB devices plugged in
> before a system suspend were lost after resume.
>
> I agree on dropping this for now and adding it later.

looks like we have another problem which needs to be investigated ;-)

>>>>> + dwc3_writel(dwc->regs, DWC3_GCTL, reg);
>>>>> +
>>>>> + /* start the Peripheral driver */
>>>>> + if (dwc->gadget_driver) {
>>>>> + __dwc3_gadget_start(dwc);
>>>>> + if (dwc->gadget_pullup)
>>>>> + dwc3_gadget_run_stop(dwc, true, false);
>>>>
>>>> why don't you add/remove the UDC just like you do for the HCD? (you
>>>> wouldn't add/remove a device, but rather call
>>>> usb_del_gadget_udc()/usb_add_gadget_udc() directly. Would that clean up
>>>> some of this?
>>>
>>> It causes more problems than solving anything.
>>> e.g. An already loaded gadget driver will have to be manually removed and re-loaded to
>>> work after a peripheral to host to peripheral mode switch.
>>
>> is that really still true? When we remove the UDC, the currently-loaded
>> gadget driver will be moved to the pending list. Once a UDC is added
>> back, udc-core will bind it again to the UDC.
>>
>
> OK. I need to test this again. As you say, the issue might already have been fixed.
> Good to know that.

okay

>>>>> + irq_set_status_flags(dwc->otg_irq, IRQ_NOAUTOEN);
>>>>
>>>> why?
>>>
>>> I don't know how to fix this. I have to do this because dwc3_omap is doing it
>>> on the shared IRQ line and the flags must match if we need to share it.
>>
>> hmmm... Then why does dwc_omap IRQ have IRQ_NOAUTOEN and otg_irq doesn't?
>
> We're setting IRQ_NOAUTOEN for otg_irq above.
> But the problem is that other platforms might not have this set so it will break there.

exactly :-)

> Need to think of a better way how to tackle this.

okay

--
balbi