Re: [PATCH 00/15] Habana Labs kernel driver

From: Olof Johansson
Date: Thu Jan 24 2019 - 18:51:44 EST


Hi,

On Wed, Jan 23, 2019 at 11:36 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
>
> Hi all,
>
> Top post, because new argument.

I'm diving in and replying to this instead of other replies upthread,
since I think it brings up the core of the disagreement.

> There's lots of really good technical arguments for having the
> userspace component of a driver stack that spans both kernel and
> userspace open too. For me, that's not really the important argument.
>
> I care about open source, I'm not interested in blobs (beyond that
> they're useful for reverse engineering). I think the upstream
> community should care about open source, and by and large it very much
> does: We haven't merged ndiswrapper, or the nvidia shim, or anything
> like that to make running blobs in the kernel easier. And at least in
> the case of the one traditional driver subsystem where 90% of the
> driver lives in userspace, we also care about that part being open.

Nobody is talking about merging kernel blobs. I think we're all in
agreement that it's absolutely out of the question.

Traditionally, nearly all hardware has had closed firmware as well,
and if anything affects how we are tied down on making kernel-level
changes, this is a big one. What makes userspace different from that
perspective? Why do we have that double standard?

The question is if we're looking to alienate vendors and create a
whole new set of Nvidia-style driver stacks that will grow and grow,
or if we're willing to discuss with them and get them involved now, to
a point where we can come up with a reasonable,
standardized/extensible interface between upper levels of device FW,
through kernel and into low-level userspace. Getting them to separate
out the low-level portions of their software stacks into something
that is open is a good medium-term compromise in this direction
(ideally they might end up sharing this layer too, but that's not for
me to decide). Most of these pieces of hardware work in a similar
manner: a stream of commands with data, and a stream of
completions/results/output data.

I'm incredibly impressed by how much of the graphics stack is open,
and how much of it has been reverse engineered for the closed
platforms. But if we have a chance to do it differently here, and in
particular avoid the long cycle of alienating the vendors and
encouraging them to build out-of-tree elaborate stacks for later
reverse engineering and catch-up, I would really like to.

There's currently one large advantage these drivers have over the
graphics space, as far as I know: nobody's trying to do unified
drivers between Linux and other OSes, so the whole "we need a messy
shim layer and a universal driver" situation should be avoidable (and
to be clear, we would not accept such drivers no matter what).

> Anything else is imo just a long-term dis-service to the community of
> customers, other vendors, ... Adapting a famous quote: If you're ok
> with throwing away some long term software freedom for a bit of short
> term hardware support you'll get neither.

The argument here is not "short term hardware support", because
that's not what we're adding (you need more than the kernel pieces for
that). What we're able to do is collaborate instead of having all
these vendors work out-of-tree on their own with absolutely no
discussions with us at all, and nowhere to share their work without
setting up some new organization (with all the overhead from that). I
think getting people to collaborate in-tree is the best shot we have
at success.

> So if someone proposes to merge some open source kernel driver that
> requires piles of closed source userspace to be any use at all, I'm
> just not interested. And if the fpga folks have merged fpga drivers
> without at least a basic (non-optimizing) RTL compiler, then that was
> a grave mistake. That doing this is also technically a bad idea (for
> all the reasons already discussed) is just the icing on the top for
> me.
>
> And to tie this back to the technical discussion, here's a scenario
> that's bound to happen:
> 1. vendor crams their open source driver into upstream, with full blob userspace
> 2. vendor gets bored (runs low on money, accidentally fired the entire
> old team, needs to do more value add, whatever, ...) rewrites the
> entire stack
> 3. vendor crams their new&completely incompatible open source stack
> into upstream
> 4. upstream is now involuntarily stuck maintaining 2 drivers for the
> exact same thing, and we can't fix anything of that because if you
> touch one side of the stack without understanding the other part
> you're guaranteed to create regressions (yes this is how this works
> with gpu drivers, we've learned this the hard way)
> 5. repeat

This can be avoided, in that we would not allow a second, completely
separate stack. We should have a transition point after which we don't
allow one-off weird custom drivers, but we don't know what the shared
implementation will look like yet.

We have precedent from the wifi space, where we pushed back and got
vendors to move towards shared interfaces.

> Hence for these technical reasons you'll then end up with a subsystem
> that only the vendor can touch, and hence also the vendor can abandon
> at will. Not like drivers/gpu, where customers, consulting shops,
> students, ... routinely can&do add new features to existing drivers.
>
> This is not a winning move.

It depends on what the goal is. Complete software freedom? I agree,
this might not get us much closer to that (but also not further). And
if that's the goal, we should refuse to merge any driver that doesn't
have open device firmware as well. Why would we have double standards
in this area? Why are we allowing libusb to implement proprietary
userspace drivers?



So, let's loop back to the technical arguments instead.

What we want, as a technical goal, is to avoid broad proliferation of
completely separate out-of-tree software stacks, and to get people to
collaborate and benefit from each other's work in ways that still let
us change things over time where we need to from the kernel side.
Is anyone disagreeing with that (technical) goal?

Unless there's disagreement on the goal, where the views differ is on
how to get there -- whether we are better off pretending that this
hardware doesn't exist and trying to come up with some elaborate
shared framework that nobody is using yet, in the hope that vendors
will move over from their proprietary stacks once they've already been
successful in shipping them, or whether we're better off getting them
engaged with us, picking up their drivers for the early hardware so
that we all get exposure to the stacks, and keeping communication
channels open, with a clear understanding that we expect this
engagement to shift over time.

Since we're starting fresh here, we can set our own expectations
upfront: No second implementations unless they're built on a shared
framework, and we can even reserve the right to remove hardware
support (treat it as staging drivers) if a vendor disengages and goes
away, or if promises in other areas are broken (such as open low-level
userspace).


-Olof