Re: [RFC PATCH 0/3] new subsystem for compute accelerator devices

From: Oded Gabbay
Date: Mon Oct 24 2022 - 16:46:50 EST


On Mon, Oct 24, 2022 at 2:56 PM Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
>
> Hi
>
> Am 22.10.22 um 23:46 schrieb Oded Gabbay:
> > In the last couple of months we had a discussion [1] about creating a new
> > subsystem for compute accelerator devices in the kernel.
> >
> > After an analysis that was done by DRM maintainers and myself, and following
> > a BOF session at the Linux Plumbers conference a few weeks ago [2], we
> > decided to create a new subsystem that will use the DRM subsystem's code and
> > functionality. i.e. the accel core code will be part of the DRM subsystem.
> >
> > This will allow us to leverage the extensive DRM code-base and
> > collaborate with DRM developers that have experience with this type of
> > devices. In addition, new features that will be added for the accelerator
> > drivers can be of use to GPU drivers as well (e.g. RAS).
> >
> > As agreed in the BOF session, the accelerator devices will be exposed to
> > user-space with a new, dedicated device char files and a dedicated major
> > number (261), to clearly separate them from graphic cards and the graphic
> > user-space s/w stack. Furthermore, the drivers will be located in a separate
> > place in the kernel tree (drivers/accel/).
> >
> > This series of patches is the first step in this direction as it adds the
> > necessary infrastructure for accelerator devices to DRM. The new devices will
> > be exposed with the following convention:
> >
> > device char files - /dev/accel/accel*
> > sysfs - /sys/class/accel/accel*/
> > debugfs - /sys/kernel/debug/accel/accel*/
>
> I know I'm really late to this discussion, but wouldn't 'compute' be a
> better name?
I also thought like you :)
But I consulted with Dave while writing these patches and he suggested
accel/accel.
I'm fine either way...

>
> (I agree that skynet would also be nice :)
>
> >
> > I tried to reuse the existing DRM code as much as possible, while keeping it
> > readable and maintainable.
> >
> > One thing that is missing from this series is defining a namespace for the
> > new accel subsystem, while I'll add in the next iteration of this patch-set,
> > after I will receive feedback from the community.
> >
> > As for drivers, once this series will be accepted (after adding the namespace),
> > I will start working on migrating the habanalabs driver to the new accel
> > subsystem. I have talked about it with Dave and we agreed that it will be
> > a good start to simply move the driver as-is with minimal changes, and then
> > start working on the driver's individual features that will be either added
> > to the accel core code (with or without changes), or will be removed and
> > instead the driver will use existing DRM code.
>
> What's your opinion on the long-term prospect of DRM vs accel? I assume
> that over time, DRM helpers will move into accel and some DRM drivers
> will start depending on accel?
I don't think that is what I had in mind.
What I had in mind is that accel helpers are only relevant for accel
drivers, and any code that might also be relevant for DRM drivers will
be placed in DRM core code. e.g. GEM enhancements, RAS netlink
support. btw, I suspect this will be the majority of the code.
In addition, DRM drivers should never set the new DRIVER_COMPUTE_ACCEL
driver feature in their structure so they should have zero dependency
on the accel core code.

>
> After reading the provided links, I wondered if we shouldn't rename
> drivers/gpu to drivers/accel and put the new subsystem into
> drivers/accel/compute. We'd have DRM and compute devices next to each
> other and shared helpers could be located in other subdirectories within
> accel/
I think this idea was brought up at the BOF session and Dave and
others said it will be too big of a burden (due to backports) to do
it.
>From Dave's blogpost:
"Moving things around now for current drivers is too hard to deal with
for backports etc. Adding a new directory for accel drivers would be a
good plan, even if they used the drm framework."

Thanks,
Oded
>
> Best regards
> Thomas
>
> >
> > In addition, I know of at least 3 or 4 drivers that were submitted for review
> > and are good candidates to be included in this new subsystem, instead of being
> > a drm render node driver or a misc driver.
> >
> > [1] https://lkml.org/lkml/2022/7/31/83
> > [2] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html
> >
> > Thanks,
> > Oded
> >
> > Oded Gabbay (3):
> > drivers/accel: add new kconfig and update MAINTAINERS
> > drm: define new accel major and register it
> > drm: add dedicated minor for accelerator devices
> >
> > Documentation/admin-guide/devices.txt | 5 +
> > MAINTAINERS | 8 +
> > drivers/Kconfig | 2 +
> > drivers/accel/Kconfig | 24 +++
> > drivers/gpu/drm/drm_drv.c | 214 +++++++++++++++++++++-----
> > drivers/gpu/drm/drm_file.c | 69 ++++++---
> > drivers/gpu/drm/drm_internal.h | 5 +-
> > drivers/gpu/drm/drm_sysfs.c | 81 +++++++++-
> > include/drm/drm_device.h | 3 +
> > include/drm/drm_drv.h | 8 +
> > include/drm/drm_file.h | 21 ++-
> > include/drm/drm_ioctl.h | 1 +
> > 12 files changed, 374 insertions(+), 67 deletions(-)
> > create mode 100644 drivers/accel/Kconfig
> >
> > --
> > 2.34.1
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev