RE: Re: Re: [syzbot] INFO: rcu detected stall in tx

From: Guido Kiener
Date: Wed May 05 2021 - 18:22:39 EST


> -----Original Message-----
> From: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, May 4, 2021 5:14 PM
> To: Kiener Guido 14DS1
> Subject: Re: Re: [syzbot] INFO: rcu detected stall in tx
>
> On Mon, May 03, 2021 at 09:56:05PM +0000, Guido Kiener wrote:
> > Hi all,
> >
> > Dave and I discussed the "self-detected stall on CPU" caused by the usbtmc
> > driver.
> >
> > What happened?
> > The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe
> > receives an erroneous URB with status -EPROTO (-71).
> > See
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/usbtmc.c?h=v5.12#n2340
> > -EPROTO does not abort/shut down the pipe, so the URB is resubmitted to
> > receive the next packet. However, the callback handler usbtmc_interrupt is
> > called again with the same erroneous status -EPROTO, and this seems to
> > result in an endless loop.
> > According to
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
> > the error -EPROTO indicates a hardware problem or a bad cable.
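The loop described above can be reproduced with a small userspace model in C. This is a sketch under stated assumptions, not the actual usbtmc code: struct fake_urb, fake_complete(), naive_callback() and run_loop() are hypothetical names, and the "link" is assumed to fail every transfer immediately with -EPROTO. The handler only stops on the unlink/shutdown statuses, so -EPROTO falls into the resubmit path and the cycle never ends on its own; the max_iter cap exists only so the demo terminates.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct fake_urb {
	int status;        /* completion status, like urb->status */
	bool submitted;    /* true while the URB is "in flight" */
};

/* Model of a faulty link: every submission completes at once with -EPROTO. */
static void fake_complete(struct fake_urb *urb)
{
	urb->status = -EPROTO;
	urb->submitted = false;
}

/* Naive handler pattern: stop only on unlink/shutdown, resubmit otherwise. */
static void naive_callback(struct fake_urb *urb)
{
	switch (urb->status) {
	case -ENOENT:
	case -ECONNRESET:
	case -ESHUTDOWN:
		return;                  /* URB killed/unlinked or device gone */
	default:
		urb->submitted = true;   /* -EPROTO ends up here: resubmit */
	}
}

/* Run the complete/resubmit cycle, capped at max_iter so the demo ends.
 * Returns the number of callbacks that ran. */
static int run_loop(int max_iter)
{
	struct fake_urb urb = { .status = 0, .submitted = true };
	int callbacks = 0;

	while (urb.submitted && callbacks < max_iter) {
		fake_complete(&urb);
		naive_callback(&urb);
		callbacks++;
	}
	return callbacks;
}
```

Whatever cap is passed, run_loop() hits it, because nothing in the handler ever leaves the URB unsubmitted after -EPROTO. In the kernel the same cycle runs in interrupt context with no cap, which is what the RCU stall detector reports.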
> >
> > Most USB drivers do not react specifically to these hardware problems and
> > simply resubmit the URB. We assume these drivers will run into the same
> > endless loop. Some other driver examples are:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/cdc-acm.c?h=v5.12#n379
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
> >
> > Possible solutions:
> > Hardware defects or bad cables seem to be a common problem for most USB
> > drivers. Rather than fixing this problem in every class-specific driver, I
> > assume we would prefer to fix it in the lower-level host drivers, e.g.:
> > 1. Use a counter and close the pipe after a number of detected errors.
> > 2. Delay the resubmission of the URB to avoid high CPU usage.
> > 3. Do nothing, since it is just a rare problem.
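Option 1 can be illustrated with a small userspace model in C. It is a sketch under assumptions: struct ep_error_state, counter_should_resubmit() and the limit MAX_CONSECUTIVE_ERRORS are hypothetical and do not exist in the USB core or in any host controller driver. The counter is reset on every successful completion, so only an unbroken run of errors closes the pipe.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical limit; no such constant exists in the USB core. */
#define MAX_CONSECUTIVE_ERRORS 10

struct ep_error_state {
	unsigned int consecutive_errors;   /* reset on every successful transfer */
};

/* Returns true if the URB should be resubmitted after completing with
 * the given status; gives up after too many errors in a row. */
static bool counter_should_resubmit(struct ep_error_state *s, int status)
{
	if (status == 0) {
		s->consecutive_errors = 0;
		return true;
	}
	if (++s->consecutive_errors >= MAX_CONSECUTIVE_ERRORS)
		return false;   /* treat the pipe as dead and stop resubmitting */
	return true;
}
```

Option 2 would instead defer the resubmission (for example from a timer or work item). That lowers the CPU load but does not end the loop, which is why a counter or an outright stop is the more conclusive choice.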
> >
> > We've never seen this problem in our products and we do not dare to change
> > anything.
>
> Drivers are not consistent in the way they handle these errors, as you have seen. A
> few try to take active measures, such as retries with increasing timeouts. Many
> drivers just ignore them, which is not a very good idea.
>
> The general feeling among kernel USB developers is that a -EPROTO, -EILSEQ, or
> -ETIME error should be regarded as fatal, much the same as an unplug event. The
> driver should avoid resubmitting URBs and just wait to be unbound from the device.
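The policy described above can be sketched as a status classifier for an interrupt-IN completion handler, written here as plain userspace C so it can be compiled and tested. classify_status() and enum completion_action are hypothetical names, not a kernel API. The fatal group (-EPROTO, -EILSEQ, -ETIME) is treated like an unplug. Statuses not listed fall back to resubmission, which is an assumption standing in for whatever driver-specific handling applies.

```c
#include <errno.h>

enum completion_action {
	ACTION_RESUBMIT,   /* transfer OK (or left to driver policy): resubmit */
	ACTION_STOP,       /* do not resubmit; wait to be unbound */
};

/* Hypothetical helper, not a kernel API. */
static enum completion_action classify_status(int status)
{
	switch (status) {
	case 0:
		return ACTION_RESUBMIT;
	case -ENOENT:       /* URB killed or unlinked synchronously */
	case -ECONNRESET:   /* URB unlinked asynchronously */
	case -ESHUTDOWN:    /* device or host controller disabled/gone */
		return ACTION_STOP;
	case -EPROTO:       /* bitstuff error, no response, unknown USB error */
	case -EILSEQ:       /* CRC mismatch or no response */
	case -ETIME:        /* no response within bus turnaround time */
		return ACTION_STOP;   /* fatal, much like an unplug event */
	default:
		return ACTION_RESUBMIT;   /* other statuses: driver-specific */
	}
}
```

In a real completion handler the ACTION_STOP branch would simply return without calling usb_submit_urb(), so a broken link produces one error and then silence instead of an endless stream of callbacks.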

Thanks for your assessment. I agree with the general feeling. I counted about a hundred
class-specific USB drivers, so wouldn't it be better to fix the problem in a lower layer (e.g. urb.c in the USB core)?
We could return an error when usb_submit_urb() is called on a pipe that has already reported such an error.
I cannot estimate the side effects, and we would need to check again how every driver deals with this
error situation. Maybe some special drivers need specialized error handling.
These drivers could then reset the (new?) error flag so that usb_submit_urb() can be called
again without error. Could this work?
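The proposal above can be modelled in a few lines of userspace C. This is a hypothetical sketch only: struct sim_endpoint, sim_giveback(), sim_submit() and sim_clear_error() are invented names, and no such error flag exists in the USB core. The "core" latches a fatal completion status per endpoint, the submit path refuses while it is latched, and a driver with its own recovery logic can explicitly clear it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposal; none of these names exist in the
 * USB core. */
struct sim_endpoint {
	int latched_error;   /* 0, or the fatal status that was seen */
};

static bool is_fatal_status(int status)
{
	return status == -EPROTO || status == -EILSEQ || status == -ETIME;
}

/* Called by the model "core" when a transfer on this endpoint completes. */
static void sim_giveback(struct sim_endpoint *ep, int status)
{
	if (is_fatal_status(status))
		ep->latched_error = status;
}

/* Model of the submit path: refuses while an error is latched. */
static int sim_submit(struct sim_endpoint *ep)
{
	if (ep->latched_error)
		return ep->latched_error;
	return 0;
}

/* Opt-in reset for drivers that implement their own recovery. */
static void sim_clear_error(struct sim_endpoint *ep)
{
	ep->latched_error = 0;
}
```

Returning the latched status from the submit path is an arbitrary choice in this sketch; a real implementation would have to pick an error code that existing drivers already treat as "stop resubmitting".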

> If you would like to audit drivers and fix them up to behave this way, that would be
> great.

Not at the moment. I cannot pull the USB cable while working in my home office :-), but I will keep an eye on it.
When I'm more involved in the next USB driver issue, I will test bad cables and
maybe get more ideas about how we could test and fix this rare error.

> (FYI, by far the most common causes of these errors are: The user has unplugged
> the USB cable, or the device's firmware has crashed. It is quite rare for the cause to
> be intermittent, although not entirely unheard of -- for example, someone once
> reported errors resulting from EM or power-line interference caused by flickering
> fluorescent lights or something of that sort. It's pretty safe to ignore this possibility.)

I fear I'm not allowed to use the 75 kW TV transmitter to interfere with the USB cable :-)

-Guido