Re: [PATCH] drm/etnaviv: Create an accel device node if compute-only

From: Tomeu Vizoso
Date: Mon May 20 2024 - 03:39:39 EST


Hi Lucas,

On Fri, May 10, 2024 at 10:34 AM Lucas Stach <l.stach@xxxxxxxxxxxxxx> wrote:
>
> Hi Tomeu,
>
> Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso:
> > If we expose a render node for NPUs without rendering capabilities, the
> > userspace stack will offer it to compositors and applications for
> > rendering, which of course won't work.
> >
> > Userspace is probably right in not questioning whether a render node
> > might not be capable of supporting rendering, so change it in the kernel
> > instead by exposing a /dev/accel node.
> >
> > Before we bring the device up we don't know whether it is capable of
> > rendering or not (depends on the features of its blocks), so first try
> > to probe a rendering node, and if we find out that there is no rendering
> > hardware, abort and retry with an accel node.
> >
> I thought about this for a while. My opinion is that this is the wrong
> approach. We are adding another path to the kernel driver, potentially
> complicating the userspace side, as now the NPU backend needs to look
> for both render and accel nodes. While currently accel and drm are
> pretty closely related and we can share most of the driver, it might
> still be a maintenance hassle in the long run.
>
> On the other hand we already have precedence of compute only DRM
> devices exposing a render node: there are AMD GPUs that don't expose a
> graphics queue and are thus not able to actually render graphics. Mesa
> already handles this in part via the PIPE_CAP_GRAPHICS and I think we
> should simply extend this to not offer a EGL display on screens without
> that capability.

The problem with this is that the compositors I know don't loop over
/dev/dri files, trying to create EGL screens and moving to the next
one until they find one that works.

They take the first render node (unless a specific one has been
configured), and assumes it will be able to render with it.

To me it seems as if userspace expects that /dev/dri/renderD* devices
can be used for rendering and by breaking this assumption we would be
breaking existing software.

Which is what I understood to be the whole point behind the decision
to create a new device file hierarchy for accelerators. Or am I
missing something?

Adding Daniel Stone to CC in case he wants to give his opinion from
the compositor point of view.

Cheers,

Tomeu

> Regards,
> Lucas
>
> > Signed-off-by: Tomeu Vizoso <tomeu@xxxxxxxxxxxxxxx>
> > Cc: Oded Gabbay <ogabbay@xxxxxxxxxx>
> > ---
> > drivers/gpu/drm/etnaviv/etnaviv_drv.c | 46 ++++++++++++++++++++++-----
> > 1 file changed, 38 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > index 6500f3999c5f..8e7dd23115f4 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > @@ -11,6 +11,7 @@
> > #include <linux/platform_device.h>
> > #include <linux/uaccess.h>
> >
> > +#include <drm/drm_accel.h>
> > #include <drm/drm_debugfs.h>
> > #include <drm/drm_drv.h>
> > #include <drm/drm_file.h>
> > @@ -488,10 +489,10 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = {
> > ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW),
> > };
> >
> > -DEFINE_DRM_GEM_FOPS(fops);
> > +DEFINE_DRM_GEM_FOPS(render_fops);
> > +DEFINE_DRM_ACCEL_FOPS(accel_fops);
> >
> > -static const struct drm_driver etnaviv_drm_driver = {
> > - .driver_features = DRIVER_GEM | DRIVER_RENDER,
> > +static struct drm_driver etnaviv_drm_driver = {
> > .open = etnaviv_open,
> > .postclose = etnaviv_postclose,
> > .gem_prime_import_sg_table = etnaviv_gem_prime_import_sg_table,
> > @@ -500,7 +501,6 @@ static const struct drm_driver etnaviv_drm_driver = {
> > #endif
> > .ioctls = etnaviv_ioctls,
> > .num_ioctls = DRM_ETNAVIV_NUM_IOCTLS,
> > - .fops = &fops,
> > .name = "etnaviv",
> > .desc = "etnaviv DRM",
> > .date = "20151214",
> > @@ -508,15 +508,20 @@ static const struct drm_driver etnaviv_drm_driver = {
> > .minor = 4,
> > };
> >
> > -/*
> > - * Platform driver:
> > - */
> > -static int etnaviv_bind(struct device *dev)
> > +static int etnaviv_bind_with_type(struct device *dev, u32 type)
> > {
> > struct etnaviv_drm_private *priv;
> > struct drm_device *drm;
> > + bool is_compute_only = true;
> > int ret;
> >
> > + etnaviv_drm_driver.driver_features = DRIVER_GEM | type;
> > +
> > + if (type == DRIVER_RENDER)
> > + etnaviv_drm_driver.fops = &render_fops;
> > + else
> > + etnaviv_drm_driver.fops = &accel_fops;
> > +
> > drm = drm_dev_alloc(&etnaviv_drm_driver, dev);
> > if (IS_ERR(drm))
> > return PTR_ERR(drm);
> > @@ -553,6 +558,18 @@ static int etnaviv_bind(struct device *dev)
> >
> > load_gpu(drm);
> >
> > + for (unsigned int i = 0; i < ETNA_MAX_PIPES; i++) {
> > + struct etnaviv_gpu *g = priv->gpu[i];
> > +
> > + if (g && (g->identity.minor_features8 & chipMinorFeatures8_COMPUTE_ONLY) == 0)
> > + is_compute_only = false;
> > + }
> > +
> > + if (type == DRIVER_RENDER && is_compute_only) {
> > + ret = -EINVAL;
> > + goto out_unbind;
> > + }
> > +
> > ret = drm_dev_register(drm, 0);
> > if (ret)
> > goto out_unbind;
> > @@ -571,6 +588,19 @@ static int etnaviv_bind(struct device *dev)
> > return ret;
> > }
> >
> > +/*
> > + * Platform driver:
> > + */
> > +static int etnaviv_bind(struct device *dev)
> > +{
> > + int ret = etnaviv_bind_with_type(dev, DRIVER_RENDER);
> > +
> > + if (ret == -EINVAL)
> > + return etnaviv_bind_with_type(dev, DRIVER_COMPUTE_ACCEL);
> > +
> > + return ret;
> > +}
> > +
> > static void etnaviv_unbind(struct device *dev)
> > {
> > struct drm_device *drm = dev_get_drvdata(dev);
>