Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
From: Leon Romanovsky
Date: Fri Sep 18 2020 - 08:03:47 EST
On Fri, Sep 18, 2020 at 02:56:09PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > >> infrastructure for communication between multiple accelerators. Same
> > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > >> library or to connect to the rdma infrastructure in the kernel.
> > > >
> > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > you can't add random device offloads as IOCTL to nedevs.
> > > >
> > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > into netdev.
> > > >
> > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > all. I'm sure this can too.
> > >
> > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > was suggested to go through the vfio subsystem instead.
> > >
> > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > is what's the bar to get accepted to the RDMA subsystem.
> > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > >
> > > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > > or should we open the "what qualifies as an RDMA device" question again?
> >
> > I want to remind you that rdma-core requirement came to make sure that
> > anything exposed from the RDMA to the userspace is strict with proper
> > UAPI header hygiene.
> >
> > I doubt that Havana's ioctls are backed by anything like this.
> >
> > Thanks
>
> Why do you doubt that ? Have you looked at our code ?
> Our uapi and IOCTLs interface is based on drm subsystem uapi interface
> and it is very safe and protected.
Yes, I looked and didn't find open-source users of your UAPI headers.
It is not related to being safe or protected by to the common request
to present userspace that relies on those exported interfaces.
> Otherwise Greg would have never allowed me to go upstream in the first place.
Nice, can we get a link?
>
> We have a single function which is the entry point for all the IOCTLs
> of our drivers (only one IOCTL is RDMA related, all the others are
> compute related).
> That function is almost 1:1 copy of the function in drm.
DRM has same rules as RDMA, no kernel code will be merged without seeing
open-source userspace.
Thanks
>
> Thanks,
> Oded