Re: [PATCH V4 0/5] mlx5 ConnectX control misc driver

From: Edward Cree
Date: Thu Apr 04 2024 - 13:35:30 EST


[ Again, I had better disassociate my employer from the vitriol below,
for which I alone am responsible. ]

On 02/04/2024 19:40, Jason Gunthorpe wrote:
> Uh no. A lot of this configuration stuff is to serve niche customer
> interests where the customers have no interest in publishing and
> commonizing their method of operating because it would give away their
> own operator secrets. The vendors created it because their big
> customers demanded it.
>
> eg there are configurables in mlx5 that exist *solely* to accommodate
> certain customer's pre-existing SW.

So it's a single-user hack, why do you want support for it in upstream?
Oh right, because you want to be able to ship single-user hacks to all
your customers while still getting the _cachet_ of being An Intree
Driver with the implied engineering benefits of open source — despite
the actual feature implementations being obscured in firmware that's
not subject to the review process and thus doesn't actually carry
those benefits.

> There are something like 600-800 configurables in mlx5

So because your engineers can't design clean, orthogonal APIs for
toffee, you should be allowed to bypass review? Sorry, but no.
The right approach is to find generic mechanisms that cover multiple
customer configurations by allowing each customer to specify a policy
— which is better not just for the kernel but also for your customers
(they can experiment more easily with new policies) and even for you
(you don't have to spend engineer time on implementing hundreds of
single-purpose configuration switches, and can instead focus on
making better products). And of course the customer's policy, with
their 'valuable' operator secrets, stays in-house.

This is not some revolutionary new idea or blue-sky architecture
astronaut thinking; this is the way the Unix engineering tradition
has worked for half a century.

And it's not like we don't give you the tools to do it!
BPF demonstrates that the kernel is perfectly willing to expose highly
complex configuration primitives to userspace, *as long as* the
interface is well-defined and cross-vendor.
Of course, without knowing what your several hundred knobs are for, I
can't tell you even in the broadest sense what shape a clean config
system to replace them would take. But 800 magic flags isn't it.

> Where is the screaming? Where has keeping blessed support out of
> the kernel got us?

Well, clearly *someone* wants you to supply them an in-tree driver,
else you wouldn't be trying to upstream this. Now maybe they really
do just have a psychological aversion to TAINT_OOT_MODULE, but it's
more likely that the underlying reason is the improved reliability,
maintainability, and portability that come from the upstreaming
process. And that in turn only happens because the kernel does not
"bless" crap.

> The actual benefit of common names for the individual configuration
> values is pretty tiny.

In case I still need to make it clearer: the purpose of requiring
your configurables to go through review is not so you can have
"common names" for 800 magic flags. It's so that you are forced to
come up with configurables that actually have a sensible design and
meaningful semantics that you can define in a way that gives you
something to *put* in the name and the commit message that's not
just "magic behaviour tweak #42 for $BigCustomer".

> a second argument about who gets to have power in our community.

As I understand it, power in Linux is entirely social and informal.
If you think Kuba doesn't have standing to object, there's nothing
*technical* stopping you from applying the patches with his
Nacked-by tag included in the commit messages, then sending the
resulting PR to Linus.
And if you think that Linus would reject the PR in that case, then
you're implicitly conceding that Kuba *does* have standing.