Re: [PATCH] drm/gpuvm: take refcount on DRM device
From: Danilo Krummrich
Date: Fri Apr 17 2026 - 15:33:43 EST
On Fri Apr 17, 2026 at 4:41 PM CEST, Thomas Hellström wrote:
> This is problematic since typically you also need a module reference
> when taking a drm device reference.
>
> The reason for this is that the devres reference on the drm device
> expects to be the last one, since it might be called from the module
> exit function of the driver.

No, this is not how it works; if it were, drmm_* would be pretty pointless in
the first place, as one could just use devm_* for everything. Citing the commit
that introduced the drmm_* APIs:

  "The biggest wrong pattern is that developers use devm_, which ties the
  release action to the underlying struct device, whereas all the
  userspace visible stuff attached to a drm_device can long outlive that
  one (e.g. after a hotunplug while userspace has open files and mmap'ed
  buffers)."

> Now if there is an additional reference held at that point the driver module
> can be unloaded with a dangling reference to the drm device.
>
> On the other hand, if you in addition take a module reference then that
> blocks the driver module from being unloaded while held, just like a
> drm file reference. This leads to complicated module release schemes
> like the one in drm_pagemap where the module refcount is released from
> a work item that is waited on in the drm_pagemap exit function.
>
> I'm working to lift the module refcount requirement, but meanwhile I'd
> recommend that in the file close callback, we'd make sure all
> drm_gpuvms have called their drm_gpuvm_free() function, because then we
> are sure that the drm_device is still alive and the module still
> pinned.

If GPUVM has a pointer to the DRM device, it implies shared ownership and hence
GPUVM should account for this shared ownership and take a reference count.

The fact that GPUVM must not outlive module unload when it has driver callbacks
attached is an orthogonal requirement.

The module lifetime / callback issue is a separate problem that exists
regardless of whether you hold a device refcount. Not taking the refcount
doesn't make the module problem go away; it just adds a second, independent bug.

If struct drm_device itself requires a module refcount, e.g. due to
drm_dev_release(), then it is on struct drm_device to ensure this constraint
(or to remove the requirement).

IOW, if I get to choose between a DRM component holding a pointer to a DRM
device that stalls module unload and a DRM component holding a pointer to a DRM
device that oopses the kernel when used wrongly, I prefer the former.

- Danilo