Re: [PATCH v4 0/2] Add p2p via dmabuf to habanalabs

From: Daniel Vetter
Date: Tue Jul 06 2021 - 15:06:48 EST


On Tue, Jul 6, 2021 at 8:31 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> On Tue, Jul 06, 2021 at 07:35:55PM +0200, Daniel Vetter wrote:
> > Yup. We don't care about any of the fancy pieces you build on top, nor
> > does the compiler need to be the optimizing one. Just something that's
> > good enough to drive the hw in some demos to see how it works and all
> > that. Generally that's also not that hard to reverse engineer, if
> > someone is bored enough, the real fancy stuff tends to be in how you
> > optimize the generated code. And make it fit into the higher levels
> > properly.
>
> Seems reasonable to me
>
> > And it's not just nvidia, it's pretty much everyone. Like a soc
> > company I don't want to know started collaborating with upstream and
> > the reverse-engineered mesa team on a kernel driver, seems to work
> > pretty well for current hardware.
>
> What I've seen is that this only works with customer demand. Companies
> need to hear from their customers that upstream is what is needed, and
> companies cannot properly hear that until they are at least already
> partially invested in the upstream process and have the right
> customers that are sophisticated enough to care.
>
> Embedded makes everything 10x worse because too many customers just
> don't care about upstream, you can hack your way through everything,
> and indulge in single generation thinking. Fork the whole kernel for 3
> years, EOL, no problem!

It's not entirely hopeless in embedded either. Sure there's the giant
pile of sell&forget abandonware, but there are lots of embedded things
where multi-year to multi-decade support is required. And an upstream
gfx stack beats anything the vendor has to offer on that, easily.

And on the server side it's actually pretty hard to convince customers
of the upstream driver benefits, because they don't want to or can't
abandon nvidia and have just learned to accept the pain. They either
build a few abstraction layers on top (and demand the vendor support
those), or they flat out demand you support the nvidia proprietary
interfaces. And AMD has been trying to move the needle here for years,
with not that much success.

> It is the enterprise world, particularly with an opinionated company
> like RH saying NO stuck in the middle that really seems to drive
> things toward upstream.
>
> Yes, vendors can work around Red Hat's No (and NVIDIA GPU is such an
> example) but it is incredibly time consuming, expensive and becoming
> more and more difficult every year.
>
> The big point is this:
>
> > But also nvidia is never going to sell you that as the officially
> > supported thing, unless your ask comes back with enormous amounts of
> > sold hardware.
>
> I think this is at the core of Linux's success in the enterprise
> world. Big customers who care demanding open source. Any vendor, even
> nvidia will want to meet customer demands.
>
> IMHO upstream success is found by motivating the customer to demand
> and make it "easy" for the vendor to supply it.

Yup, exactly the same situation here. The problem seems to be a bit that
gpu vendor stubbornness is higher than established customer demand
even, or they just don't care, and so in the last few years that
customer demand has resulted in payment to consulting shops and hiring
of engineers into reverse-engineering a full driver, instead of
customer and vendor splitting the difference and the vendor
upstreaming their stack. And that's for companies who've done it in
the past, or at least collaborated on parts like the kernel driver, so
I really have no clue why they don't just continue. We have
well-established customers who do want it all open and upstream,
across kernel and userspace pieces.

And it looks like it's going to repeat itself a few more times
unfortunately. I'm not sure when exactly the lesson will sink in.

Maybe I missed some, but looking at current render/compute drivers I
think (but not even sure on that) only drm/lima is a hobbyist project
and perhaps you want to include drm/nouveau as not paid by customers
and more something Red Hat does out of principle. All the others are
paid for by customers, with vendor involvement ranging from "just
helping out with the kernel driver" to "pays for pretty much all of
the development". And still apparently that's not enough demand for an
upstream driver stack.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch