Re: [PATCH RFC RFT] usb: hcd: Add a usb_device argument to hc_driver.endpoint_reset()

From: Michał Pecio
Date: Wed Apr 09 2025 - 06:19:43 EST


On Tue, 8 Apr 2025 16:55:24 +0300, Mathias Nyman wrote:
> 1. driver->reset_device will free all xhci endpoint rings, and lose
> td_list head, but keep cancelled_td_list and ep->ep_state flags. xHC
> issues reset device command setting all internal ep states in xci to
> "disabled".
>
> 2. usb hcd_alloc_bandwith will drop+add xhci endpoints for this
> configuration, allocate new endpoint rings, and inits new td_list
> head. Old cancelled_td_list and ep_state flags are still set, not
> matching ring.
>
> 3. usb_disable_interface() will flush all pending URBs calling
> xhci_urb_dequeue(). xhci_urb_dequeue() makes decision based on stale
> ep_state flags. May start to cancel/giveback urbs in old
> cancelled_td_list for tds that existed on old freed ring. will also
> set host_ep->hcpriv to null
>
> 4. usb_enable_interface() calls xhci_endpoint_reset() that finally
> clears the EP_STALLED flag (udev now found thanks to this patch)
>
> Disabling endpoints, like calling usb_disable_interface() in step 3
> should be done before calling usb_hcd_alloc_bandwith().
> This was fixed in other places such as usb_set_interface() and
> usb_reset_config()

I haven't thought about it, but you are right. This means that my
commit message is wrong to suggest that the problem occurs after
altsetting changes, it is apparently unique to device reset case.

> We might need to clean up ep_state flags and give back cancelled URBs
> in xhci_discover_or_reset_device() after the reset device xhci
> command completion.

I guess it wouldn't be strictly necessary if core flushed all endpoints
before resetting. I frankly always assumed that it does so, because it
also makes sense for other HCDs which don't even define reset_device().

There seems to be nothing stopping them from talking to a device while
it is undergoing a reset and re-configuration, seems a little risky
given that HW isn't exactly free of its own bugs.

Speaking of xhci, I wonder if this could also be responsible for some
xHC crashes during enumeration. I recall that those Etron quirk patches
mentioned events for a disabled endpoint and "HC died" in some cases.

Regards,
Michal