Re: [PATCH v4 0/2] Add p2p via dmabuf to habanalabs

From: Daniel Vetter
Date: Tue Jul 06 2021 - 14:03:18 EST


I should stop typing and prep dinner, but I found some too hilarious
typos below.

On Tue, Jul 6, 2021 at 7:35 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
>
> On Tue, Jul 6, 2021 at 6:29 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Tue, Jul 06, 2021 at 05:49:01PM +0200, Daniel Vetter wrote:
> >
> > > The other thing to keep in mind is that one of these drivers supports
> > > 25 years of product generations, and the other one doesn't.
> >
> > Sure, but that is the point, isn't it? To have an actually useful
> > thing you need all of this mess
> >
> > > > My argument is that an in-tree open kernel driver is a big help to
> > > > reverse engineering an open userspace. Having the vendors
> > > > collaboration to build that monstrous thing can only help the end goal
> > > > of an end to end open stack.
> > >
> > > Not sure where this got lost, but we're totally fine with vendors
> > > using the upstream driver together with their closed stack. And most
> > > of the drivers we do have in upstream are actually, at least in parts,
> > > supported by the vendor. E.g. if you'd have looked the drm/arm driver
> > > you picked is actually 100% written by ARM engineers. So kinda
> > > unfitting example.
> >
> > So the argument with Habana really boils down to how much do they need
> > to show in the open source space to get a kernel driver? You want to
> > see the ISA or compiler at least?
>
> Yup. We dont care about any of the fancy pieces you build on top, nor
> does the compiler need to be the optimizing one. Just something that's
> good enough to drive the hw in some demons to see how it works and all

s/demons/demos/ but hw tends to be funky enough that either fits :-)

> that. Generally that's also not that hard to reverse engineer, if
> someone is bored enough, the real fancy stuff tends to be in how you
> optimize the generated code. And make it fit into the higher levels
> properly.
>
> > That at least doesn't seem "extreme" to me.
> >
> > > > For instance a vendor with an in-tree driver has a strong incentive to
> > > > sort out their FW licensing issues so it can be redistributed.
> > >
> > > Nvidia has been claiming to try and sort out the FW problem for years.
> > > They even managed to release a few things, but I think the last one is
> > > 2-3 years late now. Partially the reason is that there don't have a
> > > stable api between the firmware and driver, it's all internal from the
> > > same source tree, and they don't really want to change that.
> >
> > Right, companies have no incentive to work in a sane way if they have
> > their own parallel world. I think drawing them part by part into the
> > standard open workflows and expectations is actually helpful to
> > everyone.
>
> Well we do try to get them on board part-by-part generally starting
> with the kernel and ending with a proper compiler instead of the usual
> llvm hack job, but for whatever reasons they really like their
> in-house stuff, see below for what I mean.
>
> > > > > I don't think the facts on the ground support your claim here, aside
> > > > > from the practical problem that nvidia is unwilling to even create an
> > > > > open driver to begin with. So there isn't anything to merge.
> > > >
> > > > The internet tells me there is nvgpu, it doesn't seem to have helped.
> > >
> > > Not sure which one you mean, but every once in a while they open up a
> > > few headers, or a few programming specs, or a small driver somewhere
> > > for a very specific thing, and then it dies again or gets obfuscated
> > > for the next platform, or just never updated. I've never seen anything
> > > that comes remotely to something complete, aside from tegra socs,
> > > which are fully supported in upstream afaik.
> >
> > I understand nvgpu is the tegra driver that people actualy
> > use. nouveau may have good tegra support but is it used in any actual
> > commercial product?
>
> I think it was almost the case. Afaik they still have their internal
> userspace stack working on top of nvidia, at least last year someone
> fixed up a bunch of issues in the tegra+nouveau combo to enable format
> modifiers properly across the board. But also nvidia is never going to
> sell you that as the officially supported thing, unless your ask comes
> back with enormous amounts of sold hardware.
>
> And it's not just nvidia, it's pretty much everyone. Like a soc
> company I don't want to know started collaborating with upstream and

s/know/name/ I do know them unfortunately quite well ...

Cheers, Daniel

> the reverse-engineered mesa team on a kernel driver, seems to work
> pretty well for current hardware. But for the next generation they
> decided it's going to be again only their in-house tree that
> completele ignores drivers/gpu/drm, and also tosses all the
> foundational work they helped build on the userspace side. And this is
> consistent across all companies, over the last 20 years I know of
> (often non-public) stories across every single company where they
> decided that all the time invested into community/upstream
> collaboration isn't useful anymore, we go all vendor solo for the next
> one.
>
> Most of those you luckily don't hear about anymore, all it results in
> the upstream driver being 1-2 years late or so. But even the good ones
> where we collaborate well can't seem to help themselves and want to
> throw it all away every few years.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch