Re: [PATCH v2] xhci: pci: Disable soft retry for Renesas uPD720201
From: Michal Pecio
Date: Fri Jun 19 2026 - 06:43:04 EST
> On 6/17/26 13:09, raoxu wrote:
> > From: Xu Rao <raoxu@xxxxxxxxxxxxx>
> >
> > The Renesas uPD720201 xHCI controller can fail to complete
> > a Stop Endpoint command after a transaction error on an interrupt
> > endpoint when soft retry is used.
> >
> > This was reproduced with this setup:
> >
> > xHCI: Renesas uPD720201, PCI ID 1912:0014 rev 03
> > dev: USB Ethernet device with an integrated Genesys Logic
> > USB3.1 hub, USB ID 05e3:0626, and a Realtek RTL8153
> > Ethernet function, USB ID 0bda:8153
Same thing with uPD720202 (1912:0015) here.
Is the hub even necessary? In my case I have one too, but I cannot
separate it from the RTL8153 for testing.
> > Reproducer:
> >
> > 1. Plug the integrated USB hub and Ethernet device into the
> > 1912:0014 xHCI controller.
> > 2. Let r8152 bind to the 0bda:8153 RTL8153 Ethernet function
> > behind the integrated hub.
> > 3. Bring the Ethernet device up.
> > 4. Hot-unplug the device.
In my case there is a necessary step 3.5: connect a cable and wait for
the "r8152: carrier on" message. Otherwise it disconnects cleanly.
> > The host reports a transaction error on the RTL8153 interrupt
> > endpoint, queues a soft reset, and later times out the Stop
> > Endpoint command while disconnecting the device:
> >
> > Transfer error for slot 8 ep 6 on endpoint
> > Soft-reset ep 6, slot 8
> > Ignoring reset ep completion code of 1
> > xHCI host not responding to stop endpoint command
> > xHCI host controller not responding, assume dead
> > HC died; cleaning up
There is other stuff going on too, like concurrent teardown of a
separate bulk endpoint; I'm not yet sure what exactly breaks these chips.
Would you mind applying the attached debug patch, reproducing, and
posting dmesg from your system for comparison?
> > The Renesas 1912:0014 controller cannot safely use the xHCI soft
> > retry path. Set XHCI_NO_SOFT_RETRY for this controller so
> > transaction errors use the pre-soft-retry recovery path. With
> > this quirk the same hot-unplug test no longer times out the Stop
> > Endpoint command and the RTL8153 remains usable and stable.
A bit heavy handed, but we might find no better way.
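For illustration only, here is a user-space model of the decision the
quirk makes. It is not the patch and not kernel code: the struct, the
helper names and the flag value are made up for this sketch, and
1912:0015 is listed only because I see the same symptom on it, the
patch itself covers 1912:0014 alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_RENESAS	0x1912

/* Placeholder bit for this model; NOT the kernel's XHCI_NO_SOFT_RETRY value. */
#define MODEL_NO_SOFT_RETRY	(1ULL << 0)

/* Hypothetical stand-in for the few pci_dev fields the check needs. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
};

static const uint16_t renesas_no_soft_retry_ids[] = {
	0x0014,	/* uPD720201, the ID named in the patch */
	0x0015,	/* uPD720202, same symptom reported above, NOT in the patch */
};

/* Return the quirk bits this model would set for a given controller. */
static uint64_t model_xhci_quirks(const struct model_pci_dev *pdev)
{
	uint64_t quirks = 0;
	size_t i;

	if (pdev->vendor != PCI_VENDOR_ID_RENESAS)
		return quirks;

	for (i = 0; i < sizeof(renesas_no_soft_retry_ids) /
			sizeof(renesas_no_soft_retry_ids[0]); i++) {
		if (pdev->device == renesas_no_soft_retry_ids[i])
			quirks |= MODEL_NO_SOFT_RETRY;
	}

	return quirks;
}

/* With the quirk set, transaction errors skip the soft retry path. */
static bool model_soft_retry_allowed(uint64_t quirks)
{
	return !(quirks & MODEL_NO_SOFT_RETRY);
}
```

The point of the model is that the quirk is all or nothing per
controller: every endpoint behind a matching chip loses soft retry, not
only the interrupt endpoint that triggers the hang.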
On Thu, 18 Jun 2026 17:03:26 +0300, Mathias Nyman wrote:
> I'd appreciate your opinion on a related issue.
> I'm thinking about trying to recover from these stop endpoint command
> timeouts.
I can share a bit of mine. I tried aborting Stop EP on Etron and found
the EP in some bogus state afterwards (e.g. Running but Stop EP fails
with Context State Error, or Stopped but not responding to doorbells,
something like that, I don't remember).
Per xHCI 4.6.9 there isn't really a case where this command should time
out, so it's always some internal bug/deadlock in the xHC, and IMO there
is a good chance that abort will leave at least this one EP or slot
broken.
Another case is ASMedia, which doesn't seem to implement abort at all -
at least in my tests with Address Device and a dummy device that always
NAKs, abort simply waits for the command to finish (these chips have an
internal 3 second timeout on Address Device). I would expect the same
for Stop EP, except that it likely lacks an internal timeout. And the
driver will busy-wait for several seconds with IRQs disabled.
> While debugging this, did xHC controller otherwise seem somewhat
> functional? Did you for example see port status change events, or
> transfer events between queuing the stop endpoint command and the
> timeout?
Mouse continues to work until we kill the HC. And I can even abort the
command, but then some URB is never given back, so teardown of the USB
device gets stuck and IDK what would happen later.
Such recovery would be a bit of work, with potential chip-specific bugs,
and frankly we can't be sure that the EP won't try to begin executing
URBs.
The spec would advise resetting the broken chip, but that's also not
easy to do, particularly if we would like USB devices to maintain their
state. On the upside, I think it's similar to the existing "USB persist"
mechanism, so core and drivers might be able to handle such things.
Regards,
Michal