Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver

From: Leon Romanovsky
Date: Mon Sep 21 2020 - 06:40:01 EST


On Sun, Sep 20, 2020 at 10:05:39PM +0300, Oded Gabbay wrote:
> On Sun, Sep 20, 2020 at 11:47 AM Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Sat, Sep 19, 2020 at 04:22:35PM -0300, Jason Gunthorpe wrote:
> > > On Sat, Sep 19, 2020 at 07:27:30PM +0200, Greg Kroah-Hartman wrote:
> > > > > It's probably heresy, but why do I need to integrate into the RDMA subsystem ?
> > > > > I understand your reasoning about networking (Ethernet) as the driver
> > > > > connects to the kernel networking stack (netdev), but with RDMA the
> > > > > driver doesn't use or connect to anything in that stack. If I were to
> > > > > support IBverbs and declare that I support it, then of course I would
> > > > > need to integrate to the RDMA subsystem and add my backend to
> > > > > rdma-core.
> > > >
> > > > IBverbs are horrid and I would not wish them on anyone. Seriously.
> > >
> > > I'm curious what drives this opinion? Did you have it since you
> > > reviewed the initial submission all those years ago?
> >
> > As I learned more about that interface, yes, I like it less and less :)
> >
> > But that's the userspace api you all are stuck with, for various
> > reasons, my opinion doesn't matter here.
> >
> > > > I think the general rdma apis are the key here, not the userspace api.
> > >
> > > Are you proposing that habana should have uAPI in drivers/misc and
> > > present a standard rdma-core userspace for it? This is the only
> > > userspace programming interface for RoCE HW. I think that would be
> > > much more work.
> > >
> > > If not, what open source userspace are you going to ask them to
> > > present to merge the kernel side into misc?
> >
> > I don't think that they have a userspace api to their rdma feature from
> > what I understand, but I could be totally wrong as I do not know their
> > hardware at all, so I'll let them answer this question.
>
> Hi Greg,
> We do expose a new IOCTL to enable the user to configure connections
> between multiple GAUDI devices.

How is it different from RDMA QP configuration?

>
> Having said that, we restrict this IOCTL to be used only by the same
> user who is doing the compute on our device, as opposed to a real RDMA
> device where any number of applications can send and receive.

The ability to support multiple applications is not RDMA-requirement,
but the implementation. For example MPI jobs are single user of RDMA device.

> In addition, this IOCTL limits the user to connect ONLY to another
> GAUDI device and not to a 3rd party RDMA device.

I don't see how it is different from EFA with their SQD QP type or mlx5
devices with DC QPs that you can connect only to similar devices (no
interoperability).

Thanks