Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout

From: Mathias Nyman
Date: Wed Mar 23 2016 - 10:15:57 EST


On 23.03.2016 05:53, Rajesh Bhagat wrote:

IMO, The assumption that "xhci_abort_cmd_ring would always generate an
event and handle_cmd_completion would be called" will not be always be true if HW
is in bad state.

Please share your opinion.


writing the CA (command abort) bit in CRCR (command ring control register) will stop
the command ring, and CRR (command ring running) will be set to 0 by xHC.
xhci_abort_cmd_ring() polls this bit up to 5 seconds.
If it's not 0 then the driver considers the command abort as failed.

The scenario you're thinking of is that xHC would still react to CA bit set, it would stop
the command ring and set CRR 0, but not send a command completion event.

Have you tried adding some debug to handle_cmd_completion() and see if you receive
any event after command abortion?


Yes. We have added debug prints at first line of handle_cmd_completion, and we are not getting
those prints. The last print messages that we get are as below from xhci_alloc_dev while resume
operation:

xhci-hcd xhci-hcd.0.auto: Command timeout
xhci-hcd xhci-hcd.0.auto: Abort command ring

May be somehow, USB controller is in bad state and not responding to the commands.

Please suggest how XHCI driver can handle such situations.


Restart the command timeout timer when writing the command abort bit.
If we get theIf we get the abort event the timer is deleted.

Otherwise if the timout triggers a second time we end up calling
xhci_handle_command_timeout() with a stopped ring,
This will call xhci_handle_stopped_cmd_ring(), turn the aborted command to no-op, restart the
command ring, and finally when the no-op completes it should call the missing completion.

If command ring doesn't start then additional code could be added to xhci_handle_command_timeout()
that clears the command ring if it is called a second time (=current command is already in
abort state and command ring is stopped when entering xhci_handle_command_timeout)

There might be some details missing, I'm not able to test any of this, but try
something like this:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 3e1d24c..576819e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -319,7 +319,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
xhci_halt(xhci);
return -ESHUTDOWN;
}
-
+ /* writing the CMD_RING_ABORT bit should create a command completion
+ * event, add a command completion timeout for it as well
+ */
+ mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT);
return 0;
}

-Mathias