Re: [PATCH 12/15] habanalabs/gaudi: add debugfs entries for the NIC

From: Oded Gabbay
Date: Tue Sep 15 2020 - 20:46:32 EST


On Mon, Sep 14, 2020 at 7:50 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Mon, 14 Sep 2020 13:48:14 +0000 Omer Shpigelman wrote:
> > On Thu, Sep 10, 2020 at 11:31 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > > On Thu, 10 Sep 2020 23:17:59 +0300 Oded Gabbay wrote:
> > > > > Doesn't seem like this one shows any more information than can be
> > > > > queried with ethtool, right?
> > > > correct, it just displays it in a format that is much more readable
> > >
> > > You can cat /sys/class/net/$ifc/carrier if you want 0/1.
> > >
> > > > > > nic_mac_loopback
> > > > > > is to set a port to loopback mode and out of it. It's not really
> > > > > > configuration but rather a mode change.
> > > > >
> > > > > What is this loopback for? Testing?
> > > >
> > > > Correct.
> > >
> > > Loopback test is commonly implemented via ethtool -t
> >
> > This debugfs entry is only to set the port to loopback mode, not running a loopback test.
> > Hence IMO adding a private flag is more suitable here and please correct me if I'm wrong.
> > But either way, doing that from ethtool instead of debugfs is not a good practice in our case.
> > Due to HW limitations, when we switch a port to/from loopback mode, we need to reset the device.
> > Since ethtool works on specific interface rather than an entire device, we'll need to reset the device 10 times in order to switch the entire device to loopback mode.
> > Moreover, running this command for one interface affects other interfaces which is not desirable when using ethtool AFAIK.
> > Is there any other acceptable debugfs-like mechanism for that?
>
> What's the use for a networking device which only communicates with
> itself, other than testing?

No use, and we do have a suite of tests that runs from user-space on
the device after we move the interfaces to loopback mode.
The main problem, as Omer said, is that we have two H/W bugs:

1. You need to reset the entire SoC whenever you want to move a
single interface into (or out of) loopback mode. So doing it via
ethtool will cause a reset of the entire SoC, and if you want to move
all 10 ports to loopback mode, you need to reset the device 10 times
before you can actually use it.

2. Our 10 ports are divided into 5 groups of 2 ports each, from H/W
POV. That means if you move port 0 to loopback mode, it will affect
port 1 (and vice-versa). I don't think we want that behavior.
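
To make the two constraints concrete, here is a rough sketch in C. It is
illustrative only and is not taken from the driver: all function and macro
names are hypothetical, and the pairing of ports as (0,1), (2,3), ... is an
assumption extrapolated from the port 0 / port 1 example above.

```c
#define NUM_PORTS        10  /* from this thread: the device has 10 ports */
#define PORTS_PER_GROUP   2  /* from this thread: 5 groups of 2 ports each */

/* Assumption: groups are consecutive pairs, so the port that shares a
 * H/W group with 'port' differs only in the lowest bit (0<->1, 2<->3, ...). */
static int sibling_port(int port)
{
	return port ^ 1;
}

/* Index of the H/W group a port belongs to, under the same assumption. */
static int port_group(int port)
{
	return port / PORTS_PER_GROUP;
}

/* Per-interface control (e.g. one request per netdev): every mode change
 * forces a full device reset, so the cost grows with the number of ports. */
static int resets_per_interface(int ports_to_switch)
{
	return ports_to_switch;
}

/* Device-wide control (e.g. a single request covering all the ports):
 * the whole change is applied with one reset. */
static int resets_device_wide(int ports_to_switch)
{
	return ports_to_switch > 0 ? 1 : 0;
}
```

Under these assumptions, resets_per_interface(NUM_PORTS) is 10 while
resets_device_wide(NUM_PORTS) is 1, and sibling_port(0) is 1, which is the
cross-port side effect described in item 2.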

That's why we need this specific exception to the rule and want to do
it via debugfs. I understand it is not common practice, but due to H/W
bugs that we can't work around, we are asking for this exception.

Thanks,
Oded