Re: usb: host: Reduce xhci_handshake timeout in xhci_reset
From: Jung Daehwan
Date: Sun Jun 27 2021 - 22:40:16 EST
On Tue, Jun 22, 2021 at 09:56:20PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Jun 22, 2021 at 08:24:56PM +0900, Daehwan Jung wrote:
> > It seems 10 secs timeout is too long in general case. A core would wait for
> > 10 secs without doing other task and it can be happended on every device.
>
> Only if the handshake does not come back sooner, right?
Yes, right.
> What is causing your device to timeout here?
Host Controller doesn't respond handshake. I don't know why and I ask HW team
to debug it.
> > It's better to reduce timeout for general case and use new quirk if needed.
>
> What new quirk?
I mean someone can add new quirk if one still needs long timeout. I guess 1 sec
seems enough but there're many kinds of devices.
> And why 1 second, where did that number come from?
It was 250 msecs before changed to 10 secs. There's no required minimum time in
xhci specification. 1 second is estimated number and it works well on my device.
> >
> > Signed-off-by: Daehwan Jung <dh10.jung@xxxxxxxxxxx>
> > ---
> > drivers/usb/host/xhci.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > index 9248ce8..0a1b6be 100644
> > --- a/drivers/usb/host/xhci.c
> > +++ b/drivers/usb/host/xhci.c
> > @@ -196,7 +196,7 @@ int xhci_reset(struct xhci_hcd *xhci)
> > udelay(1000);
> >
> > ret = xhci_handshake(&xhci->op_regs->command,
> > - CMD_RESET, 0, 10 * 1000 * 1000);
> > + CMD_RESET, 0, 1 * 1000 * 1000);
> > if (ret)
> > return ret;
> >
> > @@ -210,7 +210,7 @@ int xhci_reset(struct xhci_hcd *xhci)
> > * than status until the "Controller Not Ready" flag is cleared.
> > */
> > ret = xhci_handshake(&xhci->op_regs->status,
> > - STS_CNR, 0, 10 * 1000 * 1000);
> > + STS_CNR, 0, 1 * 1000 * 1000);
>
> With this change, what "goes faster"? What is currently causing
> problems with your host controller that this timeout value actually
> matters? Why is it failing?
I guess the root cause of it is from host controller, which it is HW.
Our HW engineer has been debugging it, but I haven't get any clue till now.
However, I think 10 secs timeout is too long and it can cause system problem.
That's why I want to change timeout value. A CPU core would not do anything but
waiting xhci reset for 10 secs with disabling irq like below.
usb_remove_hcd -> xhci_stop -> xhci_reset -> xhci_handshake
static void xhci_stop(struct usb_hcd *hcd)
{
u32 temp;
struct xhci_hcd *xhci = hcd_to_xhci(hcd);
mutex_lock(&xhci->mutex);
/* Only halt host and free memory after both hcds are removed */
if (!usb_hcd_is_primary_hcd(hcd)) {
mutex_unlock(&xhci->mutex);
return;
}
xhci_dbc_exit(xhci);
spin_lock_irq(&xhci->lock); -> disable IRQ
xhci->xhc_state |= XHCI_STATE_HALTED;
xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
xhci_halt(xhci);
xhci_reset(xhci); -> 10 seconds timeout!
spin_unlock_irq(&xhci->lock);
Best Regards,
Jung Daehwan
> thanks,
>
> greg k-h
>