Re: [PATCH 1/3] drm/panthor: Don't use the racy drm_gem_lru_remove() helper

From: Liviu Dudau

Date: Thu May 07 2026 - 10:44:57 EST


On Thu, May 07, 2026 at 02:10:27PM +0200, Boris Brezillon wrote:
> On Thu, 7 May 2026 11:01:25 +0100
> Liviu Dudau <liviu.dudau@xxxxxxx> wrote:
>
> > On Wed, May 06, 2026 at 02:16:26PM +0200, Boris Brezillon wrote:
> > > drm_gem_lru_remove() dereference stores drm_gem_object::lru in a local
> > > variable that's then dereferenced to acquire the LRU lock. Because this
> > > assignment in done without the LRU lock held, it can race with
> > > drm_gem_lru_scan() where drm_gem_object::lru is temporarily assigned
> > > a stack-allcated LRU that goes away when leaving the function. By
> > > the time we dereference this local lru variable, the object might already
> > > be gone.
> > >
> > > It feels like drm_gem_lru_move_tail() was never meant to be used this
> > > way, because there's no easy way we can avoid this race unless we defer
> > > the locking to the caller. Let's add an explicit LRU for unreclaimable
> > > BOs instead, and have all BOs added to this LRU at creation time.
> >
> > I would argue that drm_gem_lru_scan() is broken by design. If you're going
> > to release the LRU lock in the middle of a loop you can expect that someone
> > will get hold of your stack-allocated LRU and end up picking the pieces.
>
> I think it's fine as long as you always use the drm_gem_lru helpers to
> manipulate the lru field, which is true of a lot of kernel constructs.

I think drm_gem_lru_scan() should never set an object's lru field to still_in_lru.
It should set it to NULL when the object's node is removed from its lru and add
it into still_in_lru without making the drm_gem_object->lru to point back to it.
At the very end when we splice back the still_in_lru list back into lru's list we
can then update obj->lru.

>
> > This patch is fine in itself by trying to avoid stepping into the fight,
> > but I think we should also add a warning in drm_gem_lru_scan() for future
> > users to be aware of the dangers.
>
> Warning the user about what? There's nothing they can do about it, and
> I don't even think it's unsafe per-se, unless someone goes off and
> stores the drm_gem_object::lru value somewhere else while their shrink()
> callback is called, and accesses it later, outside the shrinker path.
> Given drm_gem_lru is not refcounted, there's no way one could safely
> hold on the LRU they saw in the shrink() callback anyway, so I don't
> think that's fair to blame the drm_gem_lru API for this kind of misuse.

Yeah, that would be the warning: don't store the object's lru as you might
get a temporary one that will become invalid after the shrinker has run.

Best regards,
Liviu

--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯