RE: [PATCH] usb: cdns3: Fix: ARM core hang after connect/disconnect operation.

From: Pawel Laszczak
Date: Thu Jan 09 2020 - 01:27:20 EST


Hi Greg,

+ linux-usb@xxxxxxxxxxxxxxx

>
>On Wed, Jan 08, 2020 at 12:37:18PM +0100, Pawel Laszczak wrote:
>> The ARM core hang when access USB register after tens of thousands
>> connect/disconnect operation.
>>
>> The issue was observed on platform with android system and is not easy
>> to reproduce. During test controller works at HS device mode with host
>> connected.
>>
>> The test is based on continuous disabling/enabling USB device function
>> what cause continuous setting DEVDS/DEVEN bit in USB_CONF register.
>>
>> For testing was used composite device consisting from ADP and RNDIS
>> function.
>>
>> Presumably the problem was caused by DMA transfer made after setting
>> DEVDS bit. To resolve this issue fix stops all DMA transfer before
>> setting DEVDS bit.
>>
>> Signed-off-by: Pawel Laszczak <pawell@xxxxxxxxxxx>
>> Signed-off-by: Peter Chan <peter.chan@xxxxxxx>
>> Reported-by: Peter Chan <peter.chan@xxxxxxx>
>> Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver")
>
>As this is in the 5.4 kernel release, you should have a "Cc: stable..."
>line in here, right?

Ok,

>
>> ---
>> drivers/usb/cdns3/gadget.c | 84 ++++++++++++++++++++++++++------------
>> drivers/usb/cdns3/gadget.h | 1 +
>> 2 files changed, 58 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/usb/cdns3/gadget.c b/drivers/usb/cdns3/gadget.c
>> index 4c1e75509303..277ed8484032 100644
>> --- a/drivers/usb/cdns3/gadget.c
>> +++ b/drivers/usb/cdns3/gadget.c
>> @@ -1516,6 +1516,49 @@ static int cdns3_ep_onchip_buffer_reserve(struct cdns3_device *priv_dev,
>> return 0;
>> }
>>
>> +static int cdns3_disable_reset_ep(struct cdns3_device *priv_dev,
>> + struct cdns3_endpoint *priv_ep)
>> +{
>> + unsigned long flags;
>> + u32 val;
>> + int ret;
>> +
>> + spin_lock_irqsave(&priv_dev->lock, flags);
>> +
>> + if (priv_ep->flags & EP_HW_RESETED) {
>
>Is "reseted" a word? :)
>
>> + spin_unlock_irqrestore(&priv_dev->lock, flags);
>> + return 0;
>> + }
>> +
>> + cdns3_select_ep(priv_dev, priv_ep->endpoint.desc->bEndpointAddress);
>> +
>> + val = readl(&priv_dev->regs->ep_cfg);
>> + val &= ~EP_CFG_ENABLE;
>> + writel(val, &priv_dev->regs->ep_cfg);
>> +
>> + /**
>
>No need for kernel-doc comment style please.
>
>> + * Driver needs some time before resetting endpoint.
>> + * It need waits for clearing DBUSY bit or for timeout expired.
>> + * 10us is enough time for controller to stop transfer.
>> + */
>> + readl_poll_timeout_atomic(&priv_dev->regs->ep_sts, val,
>> + !(val & EP_STS_DBUSY), 1, 10);
>
>You don't care if you time out?
I don't need care of it. The next timeout is critical.
>
>> + writel(EP_CMD_EPRST, &priv_dev->regs->ep_cmd);
>> +
>> + ret = readl_poll_timeout_atomic(&priv_dev->regs->ep_cmd, val,
>> + !(val & (EP_CMD_CSTALL | EP_CMD_EPRST)),
>> + 1, 1000);
>> +
>> + if (unlikely(ret))
>
>Unless you can measure the difference of using/not using a
>unlikely/likely mark, NEVER use it. The compiler and cpu can almost
>always do better than you can, we have the tests to prove it.
>

The both of the above timeout should never occur. If they occurred it would be a
critical controller bug. In this case driver can only inform about this event.
For timeouts used in driver I've never see an errors. Because debugging these
kind of errors is very hard I decided to leave dev_err in such case to inform that
probably something is wrong in HW project.

I will remove unlikely.

>> + dev_err(priv_dev->dev, "Timeout: %s resetting failed.\n",
>> + priv_ep->name);
>> +
>> + priv_ep->flags |= EP_HW_RESETED;
>
>So an error happens, but you still claim the device is reset? What can
>a user do about this error?

The error should never occur.

>
>Yes, I know you copied this from other code in this driver, but it
>doesn't look right to me.

>
>> + spin_unlock_irqrestore(&priv_dev->lock, flags);
>
>Why print while a spinlock is held? That's mean, you could have printed
>here, right?
>

Yes, it's true.

>> +
>> + return ret;
>> +}
>> +
>> void cdns3_configure_dmult(struct cdns3_device *priv_dev,
>> struct cdns3_endpoint *priv_ep)
>> {
>> @@ -1893,8 +1936,6 @@ static int cdns3_gadget_ep_disable(struct usb_ep *ep)
>> struct usb_request *request;
>> unsigned long flags;
>> int ret = 0;
>> - u32 ep_cfg;
>> - int val;
>>
>> if (!ep) {
>> pr_err("usbss: invalid parameters\n");
>> @@ -1908,32 +1949,11 @@ static int cdns3_gadget_ep_disable(struct usb_ep *ep)
>> "%s is already disabled\n", priv_ep->name))
>> return 0;
>>
>> - spin_lock_irqsave(&priv_dev->lock, flags);
>> -
>> trace_cdns3_gadget_ep_disable(priv_ep);
>>
>> - cdns3_select_ep(priv_dev, ep->desc->bEndpointAddress);
>> -
>> - ep_cfg = readl(&priv_dev->regs->ep_cfg);
>> - ep_cfg &= ~EP_CFG_ENABLE;
>> - writel(ep_cfg, &priv_dev->regs->ep_cfg);
>> -
>> - /**
>> - * Driver needs some time before resetting endpoint.
>> - * It need waits for clearing DBUSY bit or for timeout expired.
>> - * 10us is enough time for controller to stop transfer.
>> - */
>> - readl_poll_timeout_atomic(&priv_dev->regs->ep_sts, val,
>> - !(val & EP_STS_DBUSY), 1, 10);
>> - writel(EP_CMD_EPRST, &priv_dev->regs->ep_cmd);
>> -
>> - readl_poll_timeout_atomic(&priv_dev->regs->ep_cmd, val,
>> - !(val & (EP_CMD_CSTALL | EP_CMD_EPRST)),
>> - 1, 1000);
>> - if (unlikely(ret))
>> - dev_err(priv_dev->dev, "Timeout: %s resetting failed.\n",
>> - priv_ep->name);
>> + cdns3_disable_reset_ep(priv_dev, priv_ep);
>>
>> + spin_lock_irqsave(&priv_dev->lock, flags);
>> while (!list_empty(&priv_ep->pending_req_list)) {
>> request = cdns3_next_request(&priv_ep->pending_req_list);
>>
>> @@ -1962,6 +1982,7 @@ static int cdns3_gadget_ep_disable(struct usb_ep *ep)
>>
>> ep->desc = NULL;
>> priv_ep->flags &= ~EP_ENABLED;
>> + priv_ep->flags |= EP_CLAIMED | EP_HW_RESETED;
>
>Why do you now set EP_CLAIMED here when you never set that before? Is
>this a different type of change?

My mistake. Thanks for letting me know. I merged two different patches fixing the same issue and
I missed it.

>
>thanks,
>
>greg k-h

Thanks,
Pawell