Re: USB disk speed regression WD Elements - with bisect result 22547c4cc4fe20698a6a85a55b8788859134b8e4
From: Alan Stern
Date: Wed Jul 26 2017 - 21:03:37 EST
[Added linux-usb mailing list to CC:]
Short description of bug: In 4.12 or later, when Zdenek's Western
Digital disk is attached to an EHCI controller, it ends up connecting
at full speed to the companion UHCI controller instead. But when
commit 22547c4cc4fe ("usb: hub: Wait for connection to be reestablished
after port reset") is reverted, the disk connects correctly at high
speed.
On Wed, 26 Jul 2017, Zdenek Kabelac wrote:
> Dne 25.7.2017 v 21:50 Alan Stern napsal(a):
> > On Tue, 25 Jul 2017, Zdenek Kabelac wrote:
> >
> >
> > 08:18 usb 2-2: USB disconnect, device number 2
> > 08:25 usb 2-2: new high-speed USB device number 3 using ehci-pci
> > 08:26 usb 2-2: new high-speed USB device number 4 using ehci-pci
> >
> > If the drive were working entirely correctly, it wouldn't do that.
> >
> > We could continue futzing around with hardware and driver tests for a
> > long time. But there may be a shortcut: If you have a USB hub, you
> > could try attaching the drive through it. It's entirely possible that
> > this will fix the problem.
> >
> > If not, you'll have to start doing some very detailed tests. As a
> > start, you can enable debugging for the usbcore and ehci_hcd drivers
> > immediately before the test:
> >
> > echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> > echo 'module ehci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> > dmesg -C
> >
> > Then after the test, see what shows up in the dmesg output. And again,
> > we'll want to do a comparison. In fact, 4.12 with and without the
> > commit you identified would make a better comparison than 4.12 vs. 4.8.
> >
>
>
> Hi
>
>
> So here we go with traces - made with freshly compiled recent 4.13-rc2.
> OK trace is with revert patch applied.
> BAD trace is the one with it (essentially vaniala master).
>
> Trace also has KOBJECT debugging enabled - I think difference is
> nicely visible between them - but I've no explanation for it.
>
> Both traces start with cable detach followed with attachment.
Okay, I figured out the cause.
The USB stack assumes that if a high-speed device initialization fails
and the device is still connected, it means that the device can't run
properly at high speed and the computer needs to try again at full
speed. See commit 90da096ee46b ("USB: force handover port to companion
when hub_port_connect_change fails").
In Zdenek's case, the device really does disconnect itself before the
second port reset. If 22547c4cc4fe is present, it causes an extra
delay -- some 200 ms -- long enough for the device to reconnect again.
So we appear to be in the situation described above, and the connection
is switched over to the companion controller.
If 22547c4cc4fe is not present, the kernel sees that the device is not
connected at the end of the second reset and gives up trying to
initialize the device. When the device reconnects about 140 ms later,
the kernel treats it as a new connection and successfully negotiates a
high-speed link.
Zdenek, you check this explanation by commenting out these last two
lines at the end of hub_port_connect() in drivers/usb/core/hub.c:
if (hcd->driver->relinquish_port && !hub->hdev->parent)
hcd->driver->relinquish_port(hcd, port1);
That should prevent the connection from being handed over to the UHCI
companion, allowing the device to operate at high speed.
Alan Stern