Re: [PATCH net-next v1 1/1] usbnet: add devlink support

From: Alan Stern
Date: Fri Jan 28 2022 - 10:33:16 EST


On Fri, Jan 28, 2022 at 12:27:06PM +0100, Oleksij Rempel wrote:
> On Thu, Jan 27, 2022 at 12:00:44PM -0500, Alan Stern wrote:
> > On Thu, Jan 27, 2022 at 12:13:53PM +0100, Greg KH wrote:
> > > On Thu, Jan 27, 2022 at 12:07:42PM +0100, Oleksij Rempel wrote:
> > > > To provide a generic way to detect USB issues or HW issues on different
> > > > levels, we need to make use of devlink.
> > >
> > > Please make this generic to all USB devices, usbnet is not special here
> > > at all.
> >
> > Even more basic question: How is the kernel supposed to tell the
> > difference between a USB issue and a HW issue? That is, by what
> > criterion do you decide which category a particular issue falls under?
>
> In the case of a networking device, from the user-space perspective, we have a
> communication issue with some external device over Ethernet.
> So, depending on the health state of the following chain:
> cpu->hcd->USB cable->ethernet_controller->ethernet_cable-> ...
>
> We need to decide what to do, and what can be done automatically by
> the device itself,

"Device"? Do you mean "driver"? I wouldn't expect the device to do
much of anything by itself.

> for example a Mars rover :) User space should get as
> much information as possible about what's going on in the system, to decide
> on the proper measures to fix or mitigate the problem.

I disagree. What you're talking about is a debugging facility.
Normally users do not want that much information, particularly
since most of it is usually useless.

> System designers
> usually (hopefully) find out during testing what the URB status and IP
> uplink status mean for that hardware and how to fix the problem.

System designers generally have much different requirements from
ordinary users.

But let's go back to the chain you mentioned:

cpu->hcd->USB cable->ethernet_controller->ethernet_cable-> ...

In general there is no way to tell at what stage something went wrong.
For example, if the kernel does not receive a response to an URB, the
problem could be in the CPU, the HCD, the USB cable, or the ethernet
controller, with no way to tell where it really is. (And that's
assuming the problem is a hardware failure, not a software bug!)

All we can do in the real world is record error responses. At the
moment we don't have any unified way of reporting them to userspace,
partly because nobody has asked for it and partly because error
responses don't always mean that something has failed. (For example,
they might mean that the system has asked a device to perform an
action it doesn't support, or they might mean the user has suddenly
unplugged a USB cable.)
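To make the ambiguity concrete, here is a rough userspace sketch that sorts URB completion status codes into buckets, following the meanings given in the kernel's Documentation/driver-api/usb/error-codes.rst. The enum, struct and function names are made up for illustration; this is not an existing kernel interface, and a real driver would see these codes in its completion handler rather than in a plain array.

```c
#include <errno.h>

/*
 * Hypothetical buckets for URB completion status codes.
 * Illustration only, not an existing kernel interface.
 */
enum urb_err_class {
	URB_OK,           /* 0: transfer completed */
	URB_UNLINKED,     /* the driver cancelled the URB itself */
	URB_DISCONNECTED, /* device or host controller is gone */
	URB_STALL,        /* endpoint stalled, e.g. request not supported */
	URB_LINK_ERROR,   /* no or garbled response: fault location unknown */
	URB_OTHER,        /* everything else */
	URB_NR_CLASSES
};

/* Map a (negative errno) URB status to one of the buckets above. */
static enum urb_err_class classify_urb_status(int status)
{
	switch (status) {
	case 0:
		return URB_OK;
	case -ENOENT:     /* killed synchronously (usb_kill_urb) */
	case -ECONNRESET: /* unlinked asynchronously (usb_unlink_urb) */
		return URB_UNLINKED;
	case -ESHUTDOWN:
	case -ENODEV:
		return URB_DISCONNECTED;
	case -EPIPE:
		return URB_STALL;
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
		/*
		 * Could be the HCD, the cable, or the device. These may
		 * also show up when a cable is pulled, before the
		 * disconnect has been processed.
		 */
		return URB_LINK_ERROR;
	default:
		return URB_OTHER;
	}
}

/* Hypothetical per-device counters, one slot per bucket. */
struct urb_err_stats {
	unsigned long count[URB_NR_CLASSES];
};

/* Record one URB completion status in the matching bucket. */
static void urb_err_record(struct urb_err_stats *st, int status)
{
	st->count[classify_urb_status(status)]++;
}
```

Tallying the buckets over a period of normal operation would give a rough idea of how much of the error traffic is benign (unlinks, unplugs, unsupported requests) and how much is genuinely suspect. Even the suspect bucket cannot say where in the chain the fault lies.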

Greg's suggestion that you try it out and see how much signal you get
among all the noise is a good idea.

Alan Stern