Re: [PATCH v18 19/26] drm/shmem-helper: Add common memory shrinker

From: Boris Brezillon
Date: Mon Nov 13 2023 - 04:36:02 EST


On Fri, 10 Nov 2023 15:58:58 +0100
Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:

> On Mon, 30 Oct 2023 02:01:58 +0300
> Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx> wrote:
>
> > @@ -238,6 +308,20 @@ void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem)
> >  	if (refcount_dec_not_one(&shmem->pages_use_count))
> >  		return;
> >  
> > +	/*
> > +	 * Destroying the object is a special case because acquiring
> > +	 * the obj lock can cause a locking order inversion between
> > +	 * reservation_ww_class_mutex and fs_reclaim.
> > +	 *
> > +	 * This deadlock is not actually possible, because no one should
> > +	 * be already holding the lock when GEM is released. Unfortunately
> > +	 * lockdep is not aware of this detail. So when the refcount drops
> > +	 * to zero, we pretend it is already locked.
> > +	 */
> > +	if (!kref_read(&shmem->base.refcount) &&
> > +	    refcount_dec_and_test(&shmem->pages_use_count))
> > +		return drm_gem_shmem_free_pages(shmem);
>
> Uh, with get/put_pages() being moved to the create/free_gem()
> hooks, we're back to a situation where pages_use_count > 0 when we
> reach gem->refcount == 0, which is not nice. We really need to patch
> drivers so they dissociate GEM creation from the backing storage
> allocation/reservation + mapping of the BO in GPU VM space.

I gave this a try, and I think it'd work fine for lima (I have patches
if you want) and panfrost (I have patches for panfrost too, basically
addressing my comments on patch 15). It's a bit trickier for v3d, but
still possible: we'd have to add a gpu_map_count to v3d_bo, increment
it for each BO referenced by a job when the job is created, decrement
it for each BO referenced by a job when the job is destroyed, and call
_put_pages() when gpu_map_count reaches zero. The main problem is
virtio, where the BO backing storage lifetime is really tied to the BO
object lifetime, and my virtio knowledge is too limited to come up
with a solution to dissociate those two things.

TL;DR: let's just keep this hack for now. That being said, there are two
things that I think should be addressed:

1/ we want this change in patch 18 ("drm/shmem-helper: Change sgt
allocation policy") to keep this series bisectable; otherwise you'll
hit the WARN_ON(shmem->pages_use_count != 0) in drm_gem_shmem_free().

2/ we should do:

	if (!kref_read(&shmem->base.refcount)) {
		if (refcount_dec_and_test(&shmem->pages_use_count))
			drm_gem_shmem_free_pages(shmem);

		return;
	}

if we don't want to try to acquire the resv lock when the user leaked
a pages ref (the core will complain about this leak in
drm_gem_shmem_free() anyway).

Regards,

Boris