Re: [PATCH v3 5/5] r8152: Block future register access if register access fails
From: Doug Anderson
Date: Tue Oct 17 2023 - 10:17:30 EST
Hi,
On Tue, Oct 17, 2023 at 6:07 AM Hayes Wang <hayeswang@xxxxxxxxxxx> wrote:
>
> Doug Anderson <dianders@xxxxxxxxxxxx>
> > Sent: Tuesday, October 17, 2023 12:47 AM
> [...]
> > > > static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
> > > > @@ -8265,6 +8353,19 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
> > > > if (!tp)
> > > > return 0;
> > > >
> > > > + /* We can only use the optimized reset if we made it to the end of
> > > > + * probe without any register access fails, which sets
> > > > + * `PROBED_WITH_NO_ERRORS` to true. If we didn't have that then return
> > > > + * an error here which tells the USB framework to fully unbind/rebind
> > > > + * our driver.
> > >
> > > Would you stay in a loop of unbind and rebind,
> > > if the control transfers in the probe() are not always successful?
> > > I just think about the worst case that at least one control always fails in probe().
> >
> > We won't! :-) One of the first things that rtl8152_probe() does is to
> > call rtl8152_get_version(). That goes through to
> > rtl8152_get_version(). That function _doesn't_ queue up a reset if
> > there are communication problems, but it does do 3 retries of the
> > read. So if all 3 reads fail then we will permanently fail probe,
> > which I think is the correct thing to do.
>
> The probe() contains control transfers in
> 1. rtl8152_get_version()
> 2. tp->rtl_ops.init()
>
> If one of the 3 control transfers in 1) is successful AND
> any control transfer in 2) fails,
> you would queue a usb reset which would unbind/rebind the driver.
> Then, the loop starts.
> The loop would be broken, if and only if
> a) all control transfers in 1) fail, OR
> b) all control transfers in 2) succeed.
>
> That is, the loop would be broken when the fail rate of the control transfer is high or low enough.
> Otherwise, you would queue a usb reset again and again.
> For example, if the fail rate of the control transfer is 10% ~ 60%,
> I think you have high probability to keep the loop continually.
> Would it never happen?
Actually, even with a failure rate of 10% I don't think you'll end up
with a fully continuous loop, right? All you need is to get 3 failures
in a row in rtl8152_get_version() to get out of the loop. So with a
10% failure rate you'd unbind/bind 1000 times (on average) and then
(finally) give up. With a 50% failure rate I think you'd only
unbind/bind 8 times on average, right? Of course, I guess 1000 loops
is pretty close to infinite.
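
For reference, the arithmetic behind those numbers can be sketched as
follows. This is only a back-of-the-envelope model: it assumes each read
in rtl8152_get_version() fails independently with the same probability,
and that every probe that gets past the version check ends up queuing a
reset (the worst case). expected_probe_attempts() is a hypothetical
helper for illustration, not something in the driver.

```c
/*
 * Hypothetical helper (not part of r8152): expected number of probe
 * attempts before the version read fails "retries" times in a row and
 * probe gives up for good.
 *
 * Assumption: each read fails independently with probability fail_rate.
 */
static double expected_probe_attempts(double fail_rate, int retries)
{
	double all_fail = 1.0;
	int i;

	/* Probability that every one of the retries fails. */
	for (i = 0; i < retries; i++)
		all_fail *= fail_rate;

	/*
	 * Each probe is an independent trial that ends the loop with
	 * probability all_fail, so the mean number of trials follows the
	 * geometric distribution: 1 / all_fail.
	 */
	return 1.0 / all_fail;
}
```

With 3 retries this gives roughly 1000 attempts at a 10% failure rate
and 8 attempts at a 50% failure rate, matching the estimates above.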
In any case, we haven't actually seen hardware that fails like this.
We've seen failure rates that are much much lower and we can imagine
failure rates that are 100% if we've got really broken hardware. Do
you think cases where failure rates are middle-of-the-road are likely?
I would also say that nothing we can do can perfectly handle faulty
hardware. If we're imagining theoretical hardware, we could imagine
theoretical hardware that de-enumerated itself and re-enumerated
itself every half second because the firmware on the device crashed or
some regulator kept dropping. This faulty hardware would also cause an
infinite loop of de-enumeration and re-enumeration, right?
Presumably if we get into either case, the user will realize that the
hardware isn't working and will unplug it from the system. While the
system is doing the loop of trying to enumerate the hardware, it will
be taking up a bunch of extra CPU cycles but (I believe) it won't be
fully locked up or anything. The machine will still function and be
able to do non-Ethernet activities, right? I would say that the worst
thing about this state would be that it would stress corner cases in
the reset of the USB subsystem, possibly tickling bugs.
So I guess I would summarize all the above as:

If hardware is broken in just the right way then this patch could
cause a nearly infinite unbinding/rebinding of the r8152 driver.
However:

1. It doesn't seem terribly likely for hardware to be broken in just this way.
2. We haven't seen hardware broken in just this way.
3. Hardware broken in a slightly different way could cause infinite
   unbinding/rebinding even without this patch.
4. Infinite unbinding/rebinding of a USB adapter isn't great, but not
   the absolute worst thing.

That all being said, if we wanted to address this we could try two
different ways:

a) We could add a global in the r8152 driver and limit the number of
   times we reset. This gets a little ugly because if we have multiple
   r8152 adapters plugged in then the same global would be used for both,
   but maybe it's OK?

b) We could improve the USB core to somehow prevent usb_reset_device()
   from running too much on a given device?

...though I would re-emphasize that I don't think this is something we
need to address now. If later we actually see a problem we can always
address it then.
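
To make option (a) a bit more concrete, here is a rough userspace model
of what a global reset limit might look like. Everything in it is an
assumption for illustration: MAX_PROBE_RESETS, probe_reset_count,
may_queue_probe_reset() and probe_succeeded() are hypothetical names
that are not in the patch, the limit of 5 is arbitrary, and clearing the
counter after a clean probe is a design choice that was not discussed
above. A real driver would also need atomic operations or locking
around the counter.

```c
#include <stdbool.h>

/* Hypothetical limit on resets queued because of probe-time failures. */
#define MAX_PROBE_RESETS 5

/*
 * One counter for the whole driver. As noted above, this means every
 * r8152 adapter plugged into the system shares the same budget.
 */
static unsigned int probe_reset_count;

/* Returns true if another reset may be queued, consuming one attempt. */
static bool may_queue_probe_reset(void)
{
	if (probe_reset_count >= MAX_PROBE_RESETS)
		return false;

	probe_reset_count++;
	return true;
}

/* Assumed policy: a probe that finishes with no errors refills the budget. */
static void probe_succeeded(void)
{
	probe_reset_count = 0;
}
```

In this sketch the error path would check may_queue_probe_reset()
before queuing a reset and simply fail probe once the budget is used
up, which bounds the unbind/rebind loop at MAX_PROBE_RESETS iterations.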
-Doug