Re: [PATCH 00/15] Adding GAUDI NIC code to habanalabs driver

From: Oded Gabbay
Date: Thu Sep 10 2020 - 16:53:07 EST


On Thu, Sep 10, 2020 at 11:38 PM Andrew Lunn <andrew@xxxxxxx> wrote:
>
> On Thu, Sep 10, 2020 at 11:30:33PM +0300, Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:25 PM Andrew Lunn <andrew@xxxxxxx> wrote:
> > >
> > > > Can you please elaborate on how to do this with a single driver that
> > > > is already in misc ?
> > > > As I mentioned in the cover letter, we are not developing a
> > > > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > > > interface.
> > >
> > > This sounds like an MFD.
> > >
> > > Andrew
> >
> > Yes and no. There is only one functionality - training of deep
> > learning (Accelerating compute operations) :)
> > The rdma is just our method of scaling-out - our method of
> > intra-connection between GAUDI devices (similar to NVlink or AMD
> > crossfire).
> > So the H/W exposes a single physical function at the PCI level. And
> > thus Linux can call a single driver for it during the PCI probe.
>
> Yes, it probes the MFD driver. The MFD driver then creates platform
> drivers for the sub functions. i.e. it would create an Ethernet
> platform driver. That then gets probed in the usual way. The child
> device can get access to the parent device, if it needs to share
> things, e.g. a device on a bus. This is typically I2C or SPI, but
> there is no reason it cannot be a PCI device.
>
> Go look in drivers/mfd.
>
> Andrew

I'm slightly familiar with drivers/mfd and, as you mentioned, those
are for "simple" devices that hang different functions off a shared
bus, like I2C with many devices (sensors for various things, etc.).
I've never seen anyone put a PCI device there, and frankly I don't
see the benefit of trying to migrate our complex PCI driver to that
subsystem, if it would even work.
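
For reference, the split Andrew is suggesting would look roughly like
the sketch below. This is a minimal, hypothetical example of the MFD
pattern (the cell and function names are made up), not something we
have implemented:

#include <linux/kernel.h>
#include <linux/mfd/core.h>
#include <linux/pci.h>
#include <linux/platform_device.h>

/* Hypothetical sub-function list -- names are made up for illustration. */
static const struct mfd_cell gaudi_cells[] = {
	{ .name = "gaudi-compute" },
	{ .name = "gaudi-eth" },
};

static int gaudi_mfd_probe(struct pci_dev *pdev,
			   const struct pci_device_id *id)
{
	int rc;

	rc = pcim_enable_device(pdev);
	if (rc)
		return rc;

	/*
	 * Register platform sub-devices under the PCI parent; each cell
	 * is then probed by its own platform driver (e.g. an Ethernet
	 * driver living under drivers/net).
	 */
	return mfd_add_devices(&pdev->dev, PLATFORM_DEVID_AUTO,
			       gaudi_cells, ARRAY_SIZE(gaudi_cells),
			       NULL, 0, NULL);
}

The child platform drivers can then reach the parent PCI device via
the device hierarchy if they need to share resources with it.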
And I would like to reiterate that our NIC ports are highly integrated
with our compute engines.
They "talk" to each other via sync objects inside the SOC, and all of
them are used as part of the training of the deep learning network.
Another example of why this is not MFD: when a compute engine gets
stuck, all the NIC ports go through reset (sketched below).
So it's not the same as multiple devices that use the same bus or H/W.
It's a single device with some engines that work in harmony.
The bottom line is that we have a single functionality, and the
scale-out is done via RDMA that is integrated into the device.
We could have chosen other ways to scale out (like some proprietary
bus); would that count as another functionality? I think not.
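
Just to make that coupling concrete, the flow I mean looks roughly
like this. It is only a sketch with hypothetical structures and helper
names, not the actual habanalabs code:

/* Sketch only: hypothetical types and helpers, for illustration. */
struct gaudi_device {
	struct gaudi_nic_port *ports[GAUDI_NIC_MAX_PORTS];
	int nr_ports;
};

static void gaudi_handle_engine_hang(struct gaudi_device *gdev)
{
	int i;

	/*
	 * A stuck compute engine forces a full device reset, so the NIC
	 * ports cannot stay up on their own -- they go down and come
	 * back together with the compute engines.
	 */
	for (i = 0; i < gdev->nr_ports; i++)
		gaudi_nic_port_stop(gdev->ports[i]);

	gaudi_hard_reset(gdev);

	for (i = 0; i < gdev->nr_ports; i++)
		gaudi_nic_port_start(gdev->ports[i]);
}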

So I'm not going to move our driver to drivers/mfd. I wish I had
multiple PCI PFs so I could do a proper Ethernet driver, but I can't
for this H/W.
And I think that physically splitting the files into two subsystems
will be very hard to maintain, and I would definitely want to hear
Greg's opinion on that.

Thanks,
Oded