Re: [PATCH v2] usbhid: tolerate intermittent errors
From: Liam Mitchell
Date: Sun Mar 08 2026 - 09:50:05 EST
On Sat, 7 Mar 2026 at 23:53, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Mar 07, 2026 at 07:57:09PM +0100, Liam Mitchell wrote:
> > Modifies usbhid error handling to tolerate intermittent protocol
> > errors, avoiding URB resubmission delay and device reset.
> >
> > ---
> > Protocol errors like EPROTO can occur randomly, sometimes frequently and are often not fixed by a device reset.
> >
> > The current error handling will only resubmit the URB after at least 13ms delay and may reset the USB device if another error occurs 1-1.5s later, regardless of error type or count.
> >
> > These delays and device resets increase the chance that input events will be missed and that users see symptoms like missed or sticky keyboard keys.
> >
> > This patch allows one protocol error per 500ms to be tolerated and have the URB re-submitted immediately.
> >
> > 500ms was chosen to be well above the error rate of a malfunctioning
> > device but low enough to be useful for users with devices noisier than
> > mine.
> >
> > Signed-off-by: Liam Mitchell <mitchell.liam@xxxxxxxxx>
> > Link: https://lore.kernel.org/linux-input/CAOQ1CL6Q+4GNy=kgisLzs0UBXFT3b02PG8t-0rPuW-Wf6NhQ6g@xxxxxxxxxxxxxx/
> > ---
>
> Liam, take a look at
>
> https://bugzilla.kernel.org/show_bug.cgi?id=221184
>
> On Roman's system, these protocol errors occur fairly regularly,
> apparently caused by high system load.
Thanks Alan, I commented there.
> Do you think a better approach might be to reduce the 13-ms delay to
> just 1 or 2 ms, and perform a reset only there has been no successful
> communication for about one second? This might perhaps be _too_ lenient
> sometimes, but I don't think such situations will arise in practice.
I would prefer to have at least the first error resubmit immediately.
I want to reduce device downtime and missed events. From that
perspective, I want to assume the error is intermittent unless we see
evidence otherwise.
The current reset logic "reset only if there has been no successful
communication for one second" is problematic because there is no sign
of successful communication if the user isn't pressing keys or moving
the mouse. Two EPROTO errors 1.4 seconds apart will trigger device
reset and 100-200ms of downtime when ideally URBs would be immediately
resubmitted with only a few ms of downtime.
Can we infer from not receiving errors that we have successful
communication? That might change the equation. If we don't receive
errors for say 10x the polling interval, can we assume it is working?
Ideally the reset is only triggered when we are very sure the device
is not working and needs it.
> The reason for having at least a small delay is to avoid getting into a
> tight resubmit/error loop in cases where the device has been unplugged.
>
> Alan Stern
This patch would only allow one immediate resubmission per window
(500ms). How costly is a URB submission? I was assuming they are
relatively cheap and even one per 100ms wouldn't cause problems.
Liam