Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver

From: Gal Pressman
Date: Mon Sep 21 2020 - 07:23:24 EST


On 18/09/2020 18:28, Jason Gunthorpe wrote:
> On Fri, Sep 18, 2020 at 06:15:52PM +0300, Oded Gabbay wrote:
>
>> I'm sorry, but you won't be able to convince me here that I need to
>> "enslave" my entire code to RDMA, just because my ASIC "also" has some
>> RDMA ports.
>
> You can't recreate common shared subsystems in a driver just because
> you don't want to work with the subsystem.
>
> I don't care what else the ASIC has. In Linux the netdev part is
> exposed through netdev, the RDMA part through RDMA, the
> totally-not-a-GPU part through drivers/misc.
>
> It is always been this way. Chelsio didn't get to rebuild the SCSI
> stack in their driver just because "storage is a small part of their
> device"
>
> Drivers are not allowed to re-implement I2C/SPI/etc without re-using
> the comon code for that just because "I2C is a small part of their
> device"
>
> Exposing to userspace the creation of RoCE QPs and their related
> objects are unambiguously a RDMA subsystem task. I don't even know how
> you think you can argue it is not. It is your company proudly claiming
> the device has 100G RoCE ports in all the marketing literature, after
> all.
>
> It is too bad the device has a non-standards compliant implementation
> of RoCE so this will be a bit hard for you. Oh well.

What is considered a RoCE port in this case if it's not compliant with RoCE?
Sounds like it's an implementation of RDMA over ethernet, not RoCE.
Does GAUDI support UD/RC/.. QPs? Is it using a proprietary wire protocol?
(BTW, Oded claims it's similar to nvlink, how is nvlink's implementation
exposed? Or is it closed source?)

Jason, how do you imagine GAUDI in the RDMA subsystem? Userspace control path
verbs (used by hl-thunk?) and all data path verbs exposed as kverbs (used by
habanalabs driver)?
So neither any userspace verbs apps could use it nor kernel ULPs?