Re: [PATCH v3 1/2] drm/gpuvm: add deferred vm_bo cleanup
From: Boris Brezillon
Date: Mon Oct 06 2025 - 07:38:46 EST
On Mon, 6 Oct 2025 13:30:59 +0200
Alice Ryhl <aliceryhl@xxxxxxxxxx> wrote:
> On Wed, Oct 1, 2025 at 12:41 PM Alice Ryhl <aliceryhl@xxxxxxxxxx> wrote:
> >
> > When using GPUVM in immediate mode, it is necessary to call
> > drm_gpuvm_unlink() from the fence signalling critical path. However,
> > unlink may call drm_gpuvm_bo_put(), which causes some challenges:
> >
> > 1. drm_gpuvm_bo_put() often requires you to take resv locks, which you
> > can't do from the fence signalling critical path.
> > 2. drm_gpuvm_bo_put() calls drm_gem_object_put(), which is often going
> > to be unsafe to call from the fence signalling critical path.
> >
> > To solve these issues, add a deferred version of drm_gpuvm_unlink() that
> > adds the vm_bo to a deferred cleanup list, and then clean it up later.
> >
> > The new methods take the GEM's GPUVA lock internally rather than letting
> > the caller do it because it also needs to perform an operation after
> > releasing the mutex again. This is to prevent freeing the GEM while
> > holding the mutex (more info as comments in the patch). This means that
> > the new methods can only be used with DRM_GPUVM_IMMEDIATE_MODE.
> >
> > Reviewed-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
> > Signed-off-by: Alice Ryhl <aliceryhl@xxxxxxxxxx>
>
> In this version, I got rid of the kref_put_mutex() usage, but I
> realized that maybe we should bring it back. With the current code,
> it's actually possible to observe a zombie vm_bo in the GEM's list
> because we don't drop the refcount while holding the mutex.
Alright, let's get back to the kref_put_mutex() approach then.
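
For anyone following along, here is a rough userspace sketch of why dropping the last reference while holding the mutex avoids the zombie. It is not the patch code: every toy_* name is hypothetical, and pthreads plus C11 atomics stand in for the kernel's mutex and refcount. The helper imitates what kref_put_mutex() does, as far as I understand it: the final decrement happens only with the lock held, so anyone walking the list under that lock never sees an entry whose count is already zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GEM object and its list of vm_bos. */
struct toy_vm_bo;

struct toy_gem {
	pthread_mutex_t gpuva_lock;	/* plays the role of the GEM's GPUVA lock */
	struct toy_vm_bo *head;		/* singly linked list of vm_bos */
};

struct toy_vm_bo {
	atomic_int refcount;
	struct toy_gem *gem;
	struct toy_vm_bo *next;
};

/*
 * Drop one reference. Returns true, with the lock held, only if this was
 * the last reference. Otherwise returns false with the lock not held.
 */
static bool toy_ref_dec_and_lock(atomic_int *ref, pthread_mutex_t *lock)
{
	int old = atomic_load(ref);

	/* Fast path: not the last reference, so no lock is needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(ref, &old, old - 1))
			return false;
	}

	/* Slow path: possibly the last reference, decide under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(ref, 1) != 1) {
		pthread_mutex_unlock(lock);
		return false;
	}
	return true;
}

static void toy_vm_bo_put(struct toy_vm_bo *vm_bo)
{
	struct toy_gem *gem = vm_bo->gem;
	struct toy_vm_bo **pp;

	if (!toy_ref_dec_and_lock(&vm_bo->refcount, &gem->gpuva_lock))
		return;

	/* Count hit zero with the lock held: unlink before unlocking. */
	for (pp = &gem->head; *pp; pp = &(*pp)->next) {
		if (*pp == vm_bo) {
			*pp = vm_bo->next;
			break;
		}
	}
	pthread_mutex_unlock(&gem->gpuva_lock);
}

/* Count list entries whose refcount is zero (zombies), under the lock. */
static int toy_count_zombies(struct toy_gem *gem)
{
	int zombies = 0;

	pthread_mutex_lock(&gem->gpuva_lock);
	for (struct toy_vm_bo *p = gem->head; p; p = p->next)
		if (atomic_load(&p->refcount) == 0)
			zombies++;
	pthread_mutex_unlock(&gem->gpuva_lock);
	return zombies;
}

/* Single-threaded walk-through; returns the number of zombies seen. */
static int toy_demo(void)
{
	struct toy_gem gem = { .head = NULL };
	struct toy_vm_bo bo = { .gem = &gem, .next = NULL };

	pthread_mutex_init(&gem.gpuva_lock, NULL);
	atomic_init(&bo.refcount, 2);
	gem.head = &bo;

	toy_vm_bo_put(&bo);		/* 2 -> 1: stays on the list */
	assert(gem.head == &bo);
	assert(toy_count_zombies(&gem) == 0);

	toy_vm_bo_put(&bo);		/* 1 -> 0: unlinked under the lock */
	assert(gem.head == NULL);

	pthread_mutex_destroy(&gem.gpuva_lock);
	return 0;
}
```

If the decrement to zero were done before taking the lock instead, a concurrent walker holding the lock could find the entry still linked with a zero count, which is the zombie case described above.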