[BUG] USB ethernet on XHCI dies under load

From: Chris Bainbridge
Date: Thu Feb 25 2016 - 10:20:58 EST


USB ethernet devices stop responding when plugged into USB3 XHCI ports
and flooded with incoming traffic. The usbnet layer appears to receive
-EPROTO from the xhci driver. Usually nothing is logged by the kernel
when this occurs, but with dynamic debug (dyndebug) enabled there are
errors like:

ax88179_178a 4-1.1:1.0 eth0: tx throttle -71
intr status -71

Sometimes:

xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88024ea1adc8

To reproduce, flood the adaptor with gigabit traffic from another
computer for a couple of minutes using any of:

ping -f $host
iperf3 -c $host -t 1000 -P 128
iperf3 -c $host -u -b $bandwidth
(for the iperf3 tests, run 'iperf3 -s' on the test platform)

iperf3 shows many errors/retries. With slub_debug enabled (specifically
slub_debug=U) there are more retries and the adaptor fails faster,
which suggests some kind of memory corruption. Other slub_debug options
do not appear to have any effect. Another symptom pointing to memory
corruption is that other USB devices plugged into different ports
disconnect and reconnect. These devices are not the cause, though: the
USB ethernet adaptor still fails when all other devices are unplugged.

I have reproduced this with an AX88179 USB3 gigabit ethernet adaptor and
a dm9601 USB1 10/100 adaptor, on both a laptop with an Intel XHCI
chipset and a desktop with a VIA XHCI chipset. With the dm9601 adaptor,
just running "iperf3 -c z -t 10 -u -b 12m" (a 12 Mbit/s UDP stream) is
enough to kill it quickly.

I have verified that the bug does not occur on the desktop's USB2
controller, only on its USB3 controller.

This bug appears to have been around for a long time. There are several
relevant bug threads, some of which refer specifically to disconnects
when flooding an AX88179:

http://www.spinics.net/lists/linux-usb/msg113358.html
https://bugzilla.kernel.org/show_bug.cgi?id=75381
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1371233
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1269883

I bisected this with the AX88179, but it just led back to
v3.11-rc1-224-g452c447a497d ("USBNET: increase max rx/tx qlen for
improving USB3 thoughtput"), which is probably not the actual cause of
the bug.