Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
From: Jason Gunthorpe
Date: Thu Sep 17 2020 - 13:19:46 EST
On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> infrastructure for communication between multiple accelerators. Same
> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> The RDMA implementation we did does NOT support some basic RDMA
> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> library or to connect to the rdma infrastructure in the kernel.
You can't create a parallel RDMA subsystem in netdev, or in misc, and
you can't add random device offloads as IOCTLs to netdevs.
RDMA is the proper home for all the networking offloads that don't fit
into netdev.
EFA was able to fit into rdma-core/etc and it isn't even RoCE at
all. I'm sure this can too.
> wanted to do it but when we analyzed it, we saw we wouldn't be able to
> support basic stuff and therefore we had to revert to our IOCTLs.
Try again. Ask for help.
Your patches add CQs, WQs, and other RDMA objects. This is very clearly
not appropriate functionality for netdev.
> To sum it up, because our NIC is used for intra-communication, we
> don't expose nor intend users to use it as a NIC per-se. However, to
> be able to get statistics and manage them in a standard way, and
> support control plane over Ethernet, we do register each port to the
> net subsystem (i.e. create netdev per port).
Sure, the basic ethernet side is conceptually fine.
> > Please make sure to CC linux-rdma. You clearly stated that the device
> > does RDMA-like transfers.
>
> We don't use the RDMA infrastructure in the kernel and we can't
> connect to it due to the lack of H/W support we have so I don't see
> why we need to CC linux-rdma.
Because you can't put RDMA-like concepts under net.
Jakub, NAK from me on this series.
Jason