Re: [PATCH V4 0/5] mlx5 ConnectX control misc driver

From: Saeed Mahameed
Date: Fri Mar 22 2024 - 21:27:55 EST


On 22 Mar 15:29, Jakub Kicinski wrote:
> On Fri, 22 Mar 2024 18:44:23 -0300 Jason Gunthorpe wrote:
> > On Fri, Mar 22, 2024 at 01:58:26PM -0700, Jakub Kicinski wrote:
> > > > Well said, David.
> > > >
> > > > I would totally support doing something like this in a fairly generic
> > > > way that could be leveraged/instantiated by drivers that will allow
> > > > communication/inspection of hardware blocks in the datapath. There are
> > > > lots of different ways this could go, so feedback on this would help get
> > > > us all moving in the right direction.
> > >
> > > The more I learn, the more I am convinced that the technical
> > > justifications here are just smoke and mirrors.
> >
> > Let's see some evidence of this then, point to some silicon devices
> > in the multibillion gate space that don't have complex FW built into
> > their design?
>
> Existence of complex FW does not imply that production systems must
> have a backdoor to talk to that FW in kernel-unmitigated fashion.
>
> As an existence proof I give you NICs we use at Meta.
> Or old Netronome NICs, you can pick.


This is not true at all, at least for our NICs. Our NICs do need
non-netdev interfaces, at least for debugging and monitoring non-netdev
functionality and use cases at Meta. We can talk about this offline.
Also, below you mention another one of your vendors using a proprietary
mechanism for configuration, so you can't just have it both ways.

It is obvious to everyone that in the AI era everyone needs
customization. This interface is the proposal for standardizing it; if
you cared to look at Jason's proposal, you would see that he goes to
great lengths describing how the abstraction can happen in user space.

> > > The main motivation for nVidia, Broadcom, (and Enfabrica?) being to
> > > hide as much as possible of what you consider your proprietary
> > > advantage in the "AI gold rush".
> >
> > Despite all of those having built devices like this well before the
> > "AI gold rush" and it being a general overall design principle for the
> > industry because, yes, the silicon technology available actually
> > demands it.
> >
> > It is not to say you couldn't do otherwise, it is just simply too
> > expensive.
>
> I do agree that it is expensive, not sure if it's "too" expensive.
> But Linux never promised that our way of doing SW development would
> always be the most cost effective option, right? Especially short
> term. Or that we'll be competitive time to market.
>
> > > RDMA is what it is but I really hate how you're trying to pretend
> > > that it is somehow an inherent need of advanced technology and
> > > we need to lower the openness standards for all of the kernel.
> >
> > Open hardware has never been an "openness standard" for the kernel.
>
> I was in the meeting with a vendor this morning and when explicitly
> asked by an SRE (not from my org nor in any way "primed" by me)
> whether configuration of some run of the mill PCI thing can be exposed
> via devlink params instead of whatever proprietary thing the vendor was
> pitching, the vendor's answer was silence and then a pitch of another
> proprietary mechanism.


Well, this is why we came up with the fwctl interface, so nobody needs to
sit in silence and all vendors can agree on one interface.

We both know devlink params can't scale well enough to accommodate all
vendors, and don't forget they're netdev-specific!

> So no, the "open hardware" is certainly not a requirement for the
> kernel. But users can't get vendors to implement standard Linux
> configuration interfaces, and your proposal will make it a lot worse.

Vendors are already using proprietary configuration interfaces via
direct PCI access through sysfs. So, contrary to what you say, this
proposal came to unify vendors and improve the user experience. With
fwctl and the proper shared user-space tooling Jason suggested, you can
force other vendors to follow the herd and implement the new standard
interfaces that 3 vendors have already agreed to.