RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout

From: Rajesh Bhagat
Date: Mon Mar 28 2016 - 02:13:29 EST




> -----Original Message-----
> From: Mathias Nyman [mailto:mathias.nyman@xxxxxxxxxxxxxxx]
> Sent: Wednesday, March 23, 2016 7:52 PM
> To: Rajesh Bhagat <rajesh.bhagat@xxxxxxx>
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-usb@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; Sriram Dash <sriram.dash@xxxxxxx>
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
>
> On 23.03.2016 05:53, Rajesh Bhagat wrote:
>
> >>> IMO, The assumption that "xhci_abort_cmd_ring would always generate
> >>> an event and handle_cmd_completion would be called" will not be
> >>> always be true if HW
> >> is in bad state.
> >>>
> >>> Please share your opinion.
> >>>
> >>
> >> writing the CA (command abort) bit in CRCR (command ring control
> >> register) will stop the command ring, and CRR (command ring running) will be set
> to 0 by xHC.
> >> xhci_abort_cmd_ring() polls this bit up to 5 seconds.
> >> If it's not 0 then the driver considers the command abort as failed.
> >>
> >> The scenario you're thinking of is that xHC would still react to CA
> >> bit set, it would stop the command ring and set CRR 0, but not send a command
> completion event.
> >>
> >> Have you tried adding some debug to handle_cmd_completion() and see
> >> if you receive any event after command abortion?
> >>
> >
> > Yes. We have added debug prints at first line of
> > handle_cmd_completion, and we are not getting those prints. The last
> > print messages that we get are as below from xhci_alloc_dev while
> > resume
> > operation:
> >
> > xhci-hcd xhci-hcd.0.auto: Command timeout xhci-hcd xhci-hcd.0.auto:
> > Abort command ring
> >
> > May be somehow, USB controller is in bad state and not responding to the
> commands.
> >
> > Please suggest how XHCI driver can handle such situations.
> >
>
> Restart the command timeout timer when writing the command abort bit.
> If we get theIf we get the abort event the timer is deleted.
>
> Otherwise if the timout triggers a second time we end up calling
> xhci_handle_command_timeout() with a stopped ring, This will call
> xhci_handle_stopped_cmd_ring(), turn the aborted command to no-op, restart the
> command ring, and finally when the no-op completes it should call the missing
> completion.
>
> If command ring doesn't start then additional code could be added to
> xhci_handle_command_timeout() that clears the command ring if it is called a second
> time (=current command is already in abort state and command ring is stopped when
> entering xhci_handle_command_timeout)
>
> There might be some details missing, I'm not able to test any of this, but try
> something like this:
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index
> 3e1d24c..576819e 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -319,7 +319,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
> xhci_halt(xhci);
> return -ESHUTDOWN;
> }
> -
> + /* writing the CMD_RING_ABORT bit should create a command completion
> + * event, add a command completion timeout for it as well
> + */
> + mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT);
> return 0;
> }

Hello Mathias,

Thanks for the patch.

After application of above patch, I'm getting following prints constantly:

xhci-hcd xhci-hcd.0.auto: Command timeout
xhci-hcd xhci-hcd.0.auto: Abort command ring
xhci-hcd xhci-hcd.0.auto: Command timeout on stopped ring
xhci-hcd xhci-hcd.0.auto: Turn aborted command be56e000 to no-op
xhci-hcd xhci-hcd.0.auto: // Ding dong!
...
xhci-hcd xhci-hcd.0.auto: Command timeout
xhci-hcd xhci-hcd.0.auto: Abort command ring
xhci-hcd xhci-hcd.0.auto: Command timeout on stopped ring
xhci-hcd xhci-hcd.0.auto: Turn aborted command be56e000 to no-op
xhci-hcd xhci-hcd.0.auto: // Ding dong!

As expected, xhci_handle_command_timeout is called again and next time ring state
is __not__ CMD_RING_STATE_RUNNING, Hence xhci_handle_stopped_cmd_ring is called
which turn all the aborted commands to no-ops and again makes the ring state as
CMD_RING_STATE_RUNNING, and rings the door bell.

But again in this case, no response from USB controller and xhci_alloc_dev is still waiting for
wait_for_completion.

Does rescheduling the xhci->cmd_timer ensures command completion will be called. IMO, it is still
dependent on HW response.

Please share your opinion.

>
> -Mathias