Re: [PATCH] drm/gpuvm: take refcount on DRM device

From: Thomas Hellström

Date: Fri Apr 17 2026 - 10:41:42 EST


Hi,

On Thu, 2026-04-16 at 13:10 +0000, Alice Ryhl wrote:
> Currently GPUVM relies on the owner implicitly holding a refcount to
> the drm device, and it does not implicitly take a refcount on the drm
> device. This design is error-prone, so take a refcount on the device.
>
> Suggested-by: Danilo Krummrich <dakr@xxxxxxxxxx>
> Signed-off-by: Alice Ryhl <aliceryhl@xxxxxxxxxx>

This is problematic, since taking a drm device reference typically
also requires taking a module reference.

The reason is that the devres reference on the drm device expects to
be the last one, since it may be dropped from the module exit function
of the driver. If an additional reference is still held at that point,
the driver module can be unloaded while that reference to the drm
device is still outstanding, leaving it dangling.
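
To make the ordering problem concrete, here is a toy userspace model.
It is only a sketch and not kernel code: every toy_* name is made up
for illustration, the integer counter stands in for the drm_device
kref, and the module_loaded flag stands in for the driver code still
being mapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of the toy_* names are kernel APIs. */
struct toy_drm_device {
	int refcount;       /* stands in for the drm_device kref */
	bool module_loaded; /* stands in for "driver code still mapped" */
};

static void toy_dev_init(struct toy_drm_device *dev)
{
	dev->refcount = 1;  /* the devres-managed reference */
	dev->module_loaded = true;
}

static void toy_dev_get(struct toy_drm_device *dev)
{
	dev->refcount++;
}

static void toy_dev_put(struct toy_drm_device *dev)
{
	assert(dev->refcount > 0);
	dev->refcount--;
}

/* Module exit: devres drops its reference, then the code goes away. */
static void toy_module_exit(struct toy_drm_device *dev)
{
	toy_dev_put(dev);
	dev->module_loaded = false;
}

/* True if the device is still referenced after the driver is gone. */
static bool toy_dangling(const struct toy_drm_device *dev)
{
	return !dev->module_loaded && dev->refcount > 0;
}

/* A VM takes an extra device reference and outlives module exit. */
bool toy_scenario_vm_outlives_module(void)
{
	struct toy_drm_device dev;

	toy_dev_init(&dev);
	toy_dev_get(&dev);   /* like drm_dev_get() in drm_gpuvm_init() */
	toy_module_exit(&dev);
	return toy_dangling(&dev);
}

/* The VM drops its device reference before module exit. */
bool toy_scenario_vm_freed_first(void)
{
	struct toy_drm_device dev;

	toy_dev_init(&dev);
	toy_dev_get(&dev);
	toy_dev_put(&dev);   /* like drm_dev_put() in drm_gpuvm_free() */
	toy_module_exit(&dev);
	return toy_dangling(&dev);
}
```

In this model the first scenario ends with a reference that nobody
backed by live code can release, while the second one does not.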

On the other hand, if you also take a module reference, that blocks
the driver module from being unloaded for as long as it is held, just
like a drm file reference does. This leads to complicated module
release schemes like the one in drm_pagemap, where the module refcount
is released from a work item that the drm_pagemap exit function waits
on.

I'm working to lift the module refcount requirement. In the meantime
I'd recommend making sure, in the file close callback, that
drm_gpuvm_free() has run for all drm_gpuvms, because at that point we
know that the drm_device is still alive and the module is still
pinned.
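
As a rough sketch of what that check amounts to, here is another toy
userspace model. Again this is not kernel code and all model_* names
are hypothetical: "freed" stands in for drm_gpuvm_free() having run,
and model_file_close() stands in for a driver's file close callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VMS 4

/* Toy model only; none of the model_* names are kernel APIs. */
struct model_vm {
	int refcount;
	bool freed;     /* stands in for drm_gpuvm_free() having run */
};

struct model_file {
	struct model_vm *vms[MODEL_MAX_VMS];
	size_t nr_vms;
};

static void model_vm_init(struct model_vm *vm)
{
	vm->refcount = 1;   /* the reference owned by the file */
	vm->freed = false;
}

static void model_vm_get(struct model_vm *vm)
{
	vm->refcount++;
}

static void model_vm_put(struct model_vm *vm)
{
	assert(vm->refcount > 0);
	if (--vm->refcount == 0)
		vm->freed = true;
}

static void model_file_add_vm(struct model_file *file, struct model_vm *vm)
{
	assert(file->nr_vms < MODEL_MAX_VMS);
	file->vms[file->nr_vms++] = vm;
}

/*
 * File close: drop the file's reference on every VM and report whether
 * all of them were actually freed before close returns. A real driver
 * would have to tear down or wait for any remaining holder here.
 */
static bool model_file_close(struct model_file *file)
{
	bool all_freed = true;

	for (size_t i = 0; i < file->nr_vms; i++) {
		model_vm_put(file->vms[i]);
		if (!file->vms[i]->freed)
			all_freed = false;
	}
	file->nr_vms = 0;
	return all_freed;
}

/* The file holds the only reference, so close frees the VM. */
bool model_close_sole_owner(void)
{
	struct model_vm vm;
	struct model_file file = { .nr_vms = 0 };

	model_vm_init(&vm);
	model_file_add_vm(&file, &vm);
	return model_file_close(&file);
}

/* Something else still holds the VM, so it survives file close. */
bool model_close_with_stray_ref(void)
{
	struct model_vm vm;
	struct model_file file = { .nr_vms = 0 };

	model_vm_init(&vm);
	model_vm_get(&vm);  /* hypothetical extra holder */
	model_file_add_vm(&file, &vm);
	return model_file_close(&file);
}
```

The second scenario is the case to catch: the VM is still alive after
file close, which is exactly when the module is no longer guaranteed
to be pinned.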

Thanks,
Thomas


> ---
>  drivers/gpu/drm/drm_gpuvm.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 44acfe4120d2..000e7910a899 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -25,6 +25,7 @@
>   *
>   */
>  
> +#include <drm/drm_drv.h>
>  #include <drm/drm_gpuvm.h>
>  #include <drm/drm_print.h>
>  
> @@ -1117,6 +1118,7 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
>   gpuvm->drm = drm;
>   gpuvm->r_obj = r_obj;
>  
> + drm_dev_get(drm);
>   drm_gem_object_get(r_obj);
>  
>   drm_gpuvm_warn_check_overflow(gpuvm, start_offset, range);
> @@ -1160,13 +1162,15 @@ static void
>  drm_gpuvm_free(struct kref *kref)
>  {
>   struct drm_gpuvm *gpuvm = container_of(kref, struct drm_gpuvm, kref);
> + struct drm_device *drm = gpuvm->drm;
>  
>   drm_gpuvm_fini(gpuvm);
>  
> - if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
> + if (drm_WARN_ON(drm, !gpuvm->ops->vm_free))
>   return;
>  
>   gpuvm->ops->vm_free(gpuvm);
> + drm_dev_put(drm);
>  }
>  
>  /**
>
> ---
> base-commit: 126c50bc2fb6ddfe5b7718de67bbd7592a1062bb
> change-id: 20260416-gpuvm-drm-dev-get-5ded89c39bb3
>
> Best regards,