[PATCH 6/6] usb: xhci: Update comments about Stop Endpoint races
From: Michal Pecio
Date: Mon Mar 10 2025 - 04:42:35 EST
The switch statement has grown beyond its original EP_STATE_HALTED case,
so move related comments inside this case and shorten them somewhat.
Shorten EP_STATE_STOPPED comments, add some common context at the top.
Signed-off-by: Michal Pecio <michal.pecio@xxxxxxxxx>
---
drivers/usb/host/xhci-ring.c | 70 ++++++++++++++++++++----------------
1 file changed, 39 insertions(+), 31 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 241cd82672a6..759e8f612b4d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1199,20 +1199,21 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
trace_xhci_handle_cmd_stop_ep(ep_ctx);
if (comp_code == COMP_CONTEXT_STATE_ERROR) {
- /*
- * If stop endpoint command raced with a halting endpoint we need to
- * reset the host side endpoint first.
- * If the TD we halted on isn't cancelled the TD should be given back
- * with a proper error code, and the ring dequeue moved past the TD.
- * If streams case we can't find hw_deq, or the TD we halted on so do a
- * soft reset.
- *
- * Proper error code is unknown here, it would be -EPIPE if device side
- * of enadpoit halted (aka STALL), and -EPROTO if not (transaction error)
- * We use -EPROTO, if device is stalled it should return a stall error on
- * next transfer, which then will return -EPIPE, and device side stall is
- * noted and cleared by class driver.
- */
+
+ /*
+ * This happens if the endpoint was not in the Running state. Its state now may
+ * be other than at the time the command failed. We need to look at the Endpoint
+ * Context and driver state to figure out what went wrong and what to do next.
+ *
+ * Many HCs are kind enough to hide their internal state transitions from us and
+ * never fail Stop Endpoint after a doorbell ring. Others, including vendors like
+ * NEC, ASMedia or Intel, may fail the command and begin running microseconds or
+ * even milliseconds later (up to an ESIT on NEC periodic endpoints).
+ *
+ * We may see Running or Halted state after the command really failed on Stopped.
+ * We may see Stopped after it failed on Halted, if somebody resets the endpoint.
+ */
+
switch (GET_EP_CTX_STATE(ep_ctx)) {
case EP_STATE_HALTED:
xhci_dbg(xhci, "Stop ep completion raced with stall\n");
@@ -1222,8 +1223,23 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
*/
if (ep->ep_state & EP_HALTED)
goto reset_done;
-
+ /*
+ * Maybe reset failed with -ENOMEM. We are paranoid that a buggy HC may
+ * mishandle the race and produce no transfer event before this command
+ * completion. Or the endpoint actually was restarting, rejected Stop
+ * Endpoint, then started and halted. The transfer event may only come
+ * after this completion and some ASMedia HCs don't report such events.
+ *
+ * Try to reset the host endpoint now. Locate the halted TD, update its
+ * status and cancel it. Reset EP completion takes care of the rest.
+ *
+ * Proper status is unknown here, it would be -EPIPE if the device side
+ * endpoint stalled and -EPROTO otherwise (Transaction Error, etc).
+ * We use -EPROTO, if device is stalled it should keep returning STALL
+ * on future transfers which will hopefully receive normal handling.
+ */
if (ep->ep_state & EP_HAS_STREAMS) {
+ /* We don't know which stream failed, attempt Soft Retry */
reset_type = EP_SOFT_RESET;
} else {
reset_type = EP_HARD_RESET;
@@ -1231,39 +1247,31 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
if (td)
td->status = -EPROTO;
}
- /* reset ep, reset handler cleans up cancelled tds */
err = xhci_handle_halted_endpoint(xhci, ep, td, reset_type);
xhci_dbg(xhci, "Stop ep completion resetting ep, status %d\n", err);
if (err)
+ /* persistent problem or disconnection, we must give back TDs */
break;
reset_done:
/* Reset EP handler will clean up cancelled TDs */
ep->ep_state &= ~EP_STOP_CMD_PENDING;
return;
+
case EP_STATE_STOPPED:
/*
- * Per xHCI 4.6.9, Stop Endpoint command on a Stopped
- * EP is a Context State Error, and EP stays Stopped.
- *
- * But maybe it failed on Halted, and somebody ran Reset
- * Endpoint later. EP state is now Stopped and EP_HALTED
- * still set because Reset EP handler will run after us.
+ * Maybe the command failed in Halted state and the transfer event queued
+ * Reset Endpoint. The endpoint is now Stopped and EP_HALTED is still set,
+ * because Reset Endpoint completion handler will run after this one.
*/
if (ep->ep_state & EP_HALTED)
break;
/*
- * On some HCs EP state remains Stopped for some tens of
- * us to a few ms or more after a doorbell ring, and any
- * new Stop Endpoint fails without aborting the restart.
- * This handler may run quickly enough to still see this
- * Stopped state, but it will soon change to Running.
- *
- * Assume this bug on unexpected Stop Endpoint failures.
- * Keep retrying until the EP starts and stops again.
+ * We don't queue Stop Endpoint on Stopped endpoints. Assume the pending
+ * restart case and keep retrying until it starts and stops again.
*/
fallthrough;
+
case EP_STATE_RUNNING:
- /* Race, HW handled stop ep cmd before ep was running */
xhci_dbg(xhci, "Stop ep completion ctx error, ctx_state %d\n",
GET_EP_CTX_STATE(ep_ctx));
/*
--
2.48.1