Re: [RFC PATCH 0/3] new subsystem for compute accelerator devices
From: Dave Airlie
Date: Mon Oct 24 2022 - 22:27:32 EST
On Tue, 25 Oct 2022 at 12:21, John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>
> On 10/24/22 05:43, Oded Gabbay wrote:
>
> Hi Oded,
>
> The patches make sense to me. I'm still just reading through and looking
> for minor issues, but at a high level it seems to match what the LPC
> discussions pointed to.
>
> >> What's your opinion on the long-term prospect of DRM vs accel? I assume
> >> that over time, DRM helpers will move into accel and some DRM drivers
> >> will start depending on accel?
> > I don't think that is what I had in mind.
> > What I had in mind is that accel helpers are only relevant for accel
> > drivers, and any code that might also be relevant for DRM drivers will
> > be placed in DRM core code. e.g. GEM enhancements, RAS netlink
>
> Yes. That is how I understood it ("it" being both the LPC discussions,
> and this patchset) as well:
>
> * accel-only code goes in drivers/accel, thus allowing for
> smaller, simpler drivers (as compared to full drm) for that case.
>
> * graphics and display code still goes in drivers/gpu/drm, because
> it is much too hard to rename or move that directory.
>
> * code common to both also goes in drivers/gpu/drm.
>
> Looking ahead a bit more:
>
> For full-featured GPUs that do both Graphics and Compute, I expect
> that a *lot* of the code will end up in drivers/gpu/drm. Because so
> much of setting up for Compute is also really just setting up for
> Graphics--that's how it evolved, after all!
>
> And as things are structured now, it looks like those full featured
> GPU stacks will also need an aux bus (which I only just now learned
> about, but it looks quite helpful here). And also, user space will
> need to open both /dev/dri/* and /dev/accel/* nodes, if it needs
> access to any live objects that drivers/accel owns.
>
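A minimal C sketch of the two-node arrangement described in the quoted paragraph, where one userspace process holds both a /dev/dri/* node and a /dev/accel/* node. It assumes the /dev/accel/accel<N> naming proposed in this RFC and the usual /dev/dri/renderD<128+N> render-node naming. The helper names (drm_render_path, accel_path, open_node, open_both) are hypothetical. Pairing render node N with accel node N is purely illustrative; a real stack would have to match the two nodes to the same physical device (for example through sysfs) rather than by index.

```c
#define _POSIX_C_SOURCE 200809L /* expose O_CLOEXEC */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: path of a DRM render node.
 * Render node minors conventionally start at 128 (renderD128). */
static int drm_render_path(char *buf, size_t len, int index)
{
    return snprintf(buf, len, "/dev/dri/renderD%d", 128 + index);
}

/* Hypothetical helper: path of an accel node, assuming the
 * /dev/accel/accel<N> naming from the RFC. */
static int accel_path(char *buf, size_t len, int index)
{
    return snprintf(buf, len, "/dev/accel/accel%d", index);
}

/* Open a device node read/write; returns an fd, or -1 with errno set. */
static int open_node(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}

/* Hypothetical helper: open a render node and an accel node that, for
 * illustration only, share the same index. Returns 0 only if both
 * opened; failed opens are reported on stderr and left as -1. */
static int open_both(int index, int *dri_fd, int *accel_fd)
{
    char path[64];

    assert(dri_fd != NULL && accel_fd != NULL);

    drm_render_path(path, sizeof path, index);
    *dri_fd = open_node(path);
    if (*dri_fd < 0)
        fprintf(stderr, "%s: %s\n", path, strerror(errno));

    accel_path(path, sizeof path, index);
    *accel_fd = open_node(path);
    if (*accel_fd < 0)
        fprintf(stderr, "%s: %s\n", path, strerror(errno));

    return (*dri_fd >= 0 && *accel_fd >= 0) ? 0 : -1;
}
```

In this sketch a combined Graphics+Compute stack would use dri_fd for GEM and other drivers/gpu/drm objects, and accel_fd for whatever objects drivers/accel owns, closing both when done.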
I actually don't know if we really need to worry about compute nodes
for fully featured devices.
The userspace for those is normally bespoke, like ROCm, which uses
amdkfd; from what I know, amdkfd doesn't operate like most device
files, so I'm not sure we'd want it to operate as an accel device.
Or the userspace is OpenCL-like, where we have stacks that already bind
using the drm interfaces, so again I'm not sure there's any value there.
For anything that already has a userspace, I don't think this adds
any value. For NVIDIA-type cards, I doubt there is much use in an
accel node for the GPU-related things at all.
Dave.