Re: [PATCH] usb: dwc3: core: Disable USB2.0 phy suspend when dwc3 acts as host role

From: John Youn
Date: Wed Jan 18 2017 - 22:31:42 EST


On 1/18/2017 7:12 PM, Baolin Wang wrote:
> Hi John,
>
> On 19 January 2017 at 09:33, John Youn <John.Youn@xxxxxxxxxxxx> wrote:
>> On 1/16/2017 2:38 AM, Felipe Balbi wrote:
>>>
>>> Hi,
>>>
>>> John Youn <John.Youn@xxxxxxxxxxxx> writes:
>>>>> Baolin Wang <baolin.wang@xxxxxxxxxx> writes:
>>>>>>> Baolin Wang <baolin.wang@xxxxxxxxxx> writes:
>>>>>>>> When dwc3 controller acts as host role with attaching slow speed device
>>>>>>>> (like mouse or keypad). Then if we plugged out the slow speed device,
>>>>>>>> it will timeout to run the deconfiguration endpoint command to drop the
>>>>>>>> endpoint's resources. Some xHCI command timeout log as below when
>>>>>>>> disconnecting one slow device:
>>>>>>>>
>>>>>>>> [ 99.807739] c0 xhci-hcd.0.auto: Port Status Change Event for port 1
>>>>>>>> [ 99.814699] c0 xhci-hcd.0.auto: resume root hub
>>>>>>>> [ 99.819992] c0 xhci-hcd.0.auto: handle_port_status: starting port
>>>>>>>> polling.
>>>>>>>> [ 99.827808] c0 xhci-hcd.0.auto: get port status, actual port 0 status
>>>>>>>> = 0x202a0
>>>>>>>> [ 99.835903] c0 xhci-hcd.0.auto: Get port status returned 0x10100
>>>>>>>> [ 99.850052] c0 xhci-hcd.0.auto: clear port connect change, actual
>>>>>>>> port 0 status = 0x2a0
>>>>>>>> [ 99.859313] c0 xhci-hcd.0.auto: Cancel URB ffffffc01ed6cd00, dev 1,
>>>>>>>> ep 0x81, starting at offset 0xc406d210
>>>>>>>> [ 99.869645] c0 xhci-hcd.0.auto: // Ding dong!
>>>>>>>> [ 99.874776] c0 xhci-hcd.0.auto: Stopped on Transfer TRB
>>>>>>>> [ 99.880713] c0 xhci-hcd.0.auto: Removing canceled TD starting at
>>>>>>>> 0xc406d210 (dma).
>>>>>>>> [ 99.889012] c0 xhci-hcd.0.auto: Finding endpoint context
>>>>>>>> [ 99.895069] c0 xhci-hcd.0.auto: Cycle state = 0x1
>>>>>>>> [ 99.900519] c0 xhci-hcd.0.auto: New dequeue segment =
>>>>>>>> ffffffc1112f0880 (virtual)
>>>>>>>> [ 99.908655] c0 xhci-hcd.0.auto: New dequeue pointer = 0xc406d220 (DMA)
>>>>>>>> [ 99.915927] c0 xhci-hcd.0.auto: Set TR Deq Ptr cmd, new deq seg =
>>>>>>>> ffffffc1112f0880 (0xc406d000 dma),
>>>>>>>> new deq ptr = ffffff8002175220
>>>>>>>> (0xc406d220 dma), new cycle = 1
>>>>>>>> [ 99.931242] c0 xhci-hcd.0.auto: // Ding dong!
>>>>>>>> [ 99.936360] c0 xhci-hcd.0.auto: Successful Set TR Deq Ptr cmd,
>>>>>>>> deq = @c406d220
>>>>>>>> [ 99.944458] c0 xhci-hcd.0.auto: xhci_hub_status_data: stopping port
>>>>>>>> polling.
>>>>>>>> [ 100.047619] c0 xhci-hcd.0.auto: xhci_drop_endpoint called for udev
>>>>>>>> ffffffc01ae08800
>>>>>>>> [ 100.057002] c0 xhci-hcd.0.auto: drop ep 0x81, slot id 1, new drop
>>>>>>>> flags = 0x8, new add flags = 0x0
>>>>>>>> [ 100.067878] c0 xhci-hcd.0.auto: xhci_check_bandwidth called for udev
>>>>>>>> ffffffc01ae08800
>>>>>>>> [ 100.076868] c0 xhci-hcd.0.auto: New Input Control Context:
>>>>>>>>
>>>>>>>> ......
>>>>>>>>
>>>>>>>> [ 100.427252] c0 xhci-hcd.0.auto: // Ding dong!
>>>>>>>> [ 105.430728] c0 xhci-hcd.0.auto: Command timeout
>>>>>>>> [ 105.436029] c0 xhci-hcd.0.auto: Abort command ring
>>>>>>>> [ 113.558223] c0 xhci-hcd.0.auto: Command completion event does not match
>>>>>>>> command
>>>>>>>> [ 113.569778] c0 xhci-hcd.0.auto: Timeout while waiting for configure
>>>>>>>> endpoint command
>>>>>>>>
>>>>>>>> The reason is it will suspend USB phy to disable phy clock when
>>>>>>>> disconnecting the slow USB decice, that will hang on the xHCI commands
>>>>>>>> executing which depends on the phy clock.
>>>>>>>>
>>>>>>>> Thus we should disable USB2.0 phy suspend feature when dwc3 acts as host
>>>>>>>> role.
>>>>>>>>
>>>>>>>> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> drivers/usb/dwc3/core.c | 14 ++++++++++++++
>>>>>>>> 1 file changed, 14 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
>>>>>>>> index 9a4a5e4..0b646cf 100644
>>>>>>>> --- a/drivers/usb/dwc3/core.c
>>>>>>>> +++ b/drivers/usb/dwc3/core.c
>>>>>>>> @@ -565,6 +565,20 @@ static int dwc3_phy_setup(struct dwc3 *dwc)
>>>>>>>> if (dwc->revision > DWC3_REVISION_194A)
>>>>>>>> reg |= DWC3_GUSB2PHYCFG_SUSPHY;
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * When dwc3 controller acts as host role with attaching one slow speed
>>>>>>>> + * device (like mouse or keypad). Then if we plugged out the slow speed
>>>>>>>> + * device, it will timeout to run the deconfiguration endpoint command.
>>>>>>>> + * The reason is it will suspend USB phy to disable phy clock when
>>>>>>>> + * disconnecting slow speed decice, which will affect the xHCI commands
>>>>>>>> + * executing.
>>>>>>>> + *
>>>>>>>> + * Thus we should disable USB 2.0 phy suspend feature when dwc3 acts as
>>>>>>>> + * host role.
>>>>>>>> + */
>>>>>>>> + if (dwc->dr_mode == USB_DR_MODE_HOST || dwc->dr_mode == USB_DR_MODE_OTG)
>>>>>>>> + reg &= ~DWC3_GUSB2PHYCFG_SUSPHY;
>>>>>>>
>>>>>>> which version of the core you're using? Recent version (since 1.94A,
>>>>>>
>>>>>> My version is 2.80a.
>>>>>>
>>>>>>> IIRC) can manage core suspend automatically. Also, this patch of yours
>>>>>>> will cause a power consumption regression.
>>>>>>
>>>>>> Yes, it can manage core suspend automatically, that is the problem.
>>>>>> When plugging out one mouse or keypad device, the phy will suspend
>>>>>> automatically to disable the phy clock. But now the disconnecting
>>>>>> process is not finished, and some xHCI commands (like deconfiguration
>>>>>> endpoint command to drop endpoint resources) need depend on the phy
>>>>>> clock, which will hang on the system to timeout the command or abort
>>>>>> command ring to halt the xHCI.
>>>>>>
>>>>>> I agree with you it will cause a power consumption regression, but it
>>>>>> will cause serious problem if not. Do you have some suggestion?
>>>>>
>>>>> sorry for the long delay. This was lost in my inbox.
>>>>>
>>>>> I'm not sure this patch is the best solution. There's no mention in
>>>>> Databook that we should avoid PHY suspend when acting as host. Adding
>>>>> John here to see if John has any idea of how to fix this.
>>>>>
>>>>
>>>> I'm not familiar enough with XHCI side of things to say.
>>>>
>>>> I'll ask around to see if anyone has an idea.
>>>
>>
>> Hi Felipe, Baolin,
>>
>> I talked with a couple engineers here and the behavior is not
>> something that's expected in host mode.
>>
>> Can you check that the value of the GCTL.RAMCLKSEL is set
>> appropriately? This affects where the core gets the clock signal
>
> In host mode, the bit[6:7] for RAMCLKSEL is default value 0, which
> means it selects bus clock.
>
>> from. If it is getting it from the phy clock then you will likely have
>> this problem and will need to adjust it. Otherwise you should probably
>> use the existing quirk instead.
>
> So the bus clock is from the phy clock, then it will have this problem
> when suspending phy? Yes, I can use the existing quirk, but I am
> afraid it is one common problem if we use the mainline kernel. Or we
> can add some documentation for enabling the phy suspend feature to
> remind other people.
>

Hi Baolin,

It's expected the clocks to the PHY and core are different if you
suspend only the PHY. That's not particular to just the host, for
example handling LPM in device mode you could have the same problem.

You would have to confirm with your own platform how it is.

Regards,
John