Re: [PATCH v1 1/2] drm/ttm: don't leave bulk_move cursor dangling for unevictable resources

From: Christian König

Date: Tue Jun 16 2026 - 03:16:39 EST


On 6/16/26 01:49, Samuel Ainsworth wrote:
> ttm_resource_add_bulk_move() and ttm_resource_del_bulk_move() both act
> only when the resource is evictable (!ttm_resource_unevictable()). A
> resource is added to its bo's bulk_move cursor (pos->first / pos->last)
> while evictable, but it can become unevictable -- pinned or swapped --
> after it has been added.
>
> ttm_resource_del_bulk_move() is reached both when the resource is freed
> (ttm_resource_free()) and when the bo's bulk_move is cleared on teardown
> (ttm_bo_set_bulk_move()). If the resource has become unevictable by then,
> the del is skipped, so pos->first / pos->last are left pointing at it.
> Once the resource is freed the cursor dangles, and the next
> ttm_resource_add_bulk_move() / ttm_resource_move_to_lru_tail() on that
> bulk_move dereferences it: a use-after-free read of
> pos->first->bo->base.resv (the WARN_ON in ttm_lru_bulk_move_add())
> followed by a list_move() through freed memory that corrupts the LRU
> list. With CONFIG_DEBUG_LIST this manifests as a fatal "list_del
> corruption" BUG.
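Below is a minimal user-space model of the asymmetry described above. It makes heavily simplified assumptions. `struct model_res`, `struct model_bulk_pos` and the `model_*` helpers are hypothetical stand-ins, not TTM types or functions. `pinned` stands in for "unevictable" (pinned or swapped), and the cursor holds only a single element. Add and del both re-check evictability, so a transition between them leaves the cursor pointing at the resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a resource; not the real struct ttm_resource. */
struct model_res {
	bool pinned; /* stands in for "unevictable" (pinned or swapped) */
};

/* Hypothetical one-element stand-in for pos->first / pos->last. */
struct model_bulk_pos {
	struct model_res *first;
	struct model_res *last;
};

static bool model_unevictable(const struct model_res *res)
{
	return res->pinned;
}

/* Mirrors the pre-patch guard: only touch the cursor while evictable. */
static void model_add(struct model_bulk_pos *pos, struct model_res *res)
{
	if (model_unevictable(res))
		return;
	if (!pos->first)
		pos->first = res;
	pos->last = res;
}

/* Same guard on removal; evictability is re-evaluated at del time. */
static void model_del(struct model_bulk_pos *pos, struct model_res *res)
{
	if (model_unevictable(res))
		return;
	/* Only the one-element case is modelled. */
	if (pos->first == res && pos->last == res)
		pos->first = pos->last = NULL;
}

/* True if the cursor still references @res after add, optional pin, del. */
static bool model_cursor_left_pointing(bool transition_in_between)
{
	struct model_bulk_pos pos = { NULL, NULL };
	struct model_res res = { .pinned = false };

	model_add(&pos, &res);       /* evictable: goes onto the cursor */
	if (transition_in_between)
		res.pinned = true;   /* becomes unevictable after the add */
	model_del(&pos, &res);       /* skipped when pinned: cursor keeps &res */
	return pos.first == &res || pos.last == &res;
}
```

In the kernel, the stale pointer is then dereferenced by the next add or LRU-tail move. The model only checks that the pointer is left behind.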
>
> On a Framework 13 (AMD Ryzen 7040, gfx1103) this is hit via hibernation:
> a buffer object swapped out during hibernate (its resource becomes
> unevictable) is later closed after resume (amdgpu_gem_object_close ->
> amdgpu_vm_bo_del -> ttm_bo_set_bulk_move()), which skips removing its
> resource from the VM's bulk_move cursor; a later GEM allocation on that
> cursor then faults. KASAN reports a slab-use-after-free in
> ttm_resource_add_bulk_move().
>
> Track whether a resource is actually on the bulk_move cursor with a new
> ttm_resource::bulk_move flag, set when it is added, and remove based on
> that flag rather than on the resource's current evictability. The del
> then always undoes what the add did, regardless of any pin/swap
> transition in between.
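The flag-based approach proposed here can be sketched in a simplified user-space model. `struct flag_res`, `struct flag_bulk_pos`, `on_bulk` and the `flag_*` helpers are hypothetical names, and `on_bulk` plays the role of the proposed `ttm_resource::bulk_move` field. Add remains gated on evictability, but del is keyed on what add recorded, so a later pin or swap no longer affects removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a resource; not the real struct ttm_resource. */
struct flag_res {
	bool pinned;  /* stands in for "unevictable" (pinned or swapped) */
	bool on_bulk; /* models the proposed ttm_resource::bulk_move flag */
};

/* Hypothetical one-element stand-in for pos->first / pos->last. */
struct flag_bulk_pos {
	struct flag_res *first;
	struct flag_res *last;
};

static void flag_add(struct flag_bulk_pos *pos, struct flag_res *res)
{
	if (res->pinned)
		return;              /* add is still gated on evictability */
	assert(!res->on_bulk);       /* no double add in this model */
	if (!pos->first)
		pos->first = res;
	pos->last = res;
	res->on_bulk = true;         /* record that the add happened */
}

static void flag_del(struct flag_bulk_pos *pos, struct flag_res *res)
{
	if (!res->on_bulk)
		return;              /* undo only what add did */
	/* Only the one-element case is modelled. */
	if (pos->first == res && pos->last == res)
		pos->first = pos->last = NULL;
	res->on_bulk = false;
}

/* True if the cursor still references @res after add, optional pin, del. */
static bool flag_cursor_left_pointing(bool transition_in_between)
{
	struct flag_bulk_pos pos = { NULL, NULL };
	struct flag_res res = { .pinned = false, .on_bulk = false };

	flag_add(&pos, &res);
	if (transition_in_between)
		res.pinned = true;   /* no longer matters for del */
	flag_del(&pos, &res);
	return pos.first == &res || pos.last == &res;
}
```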

Good catch, but the fix looks incorrect to me.

Please don't add any flags to ttm_resource. The bulk move is per BO, not per resource.

Why are we not removing unevictable resources from the bulk move? That sounds broken to me in the first place.
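One possible reading of that question can be sketched in a simplified user-space model. In this reading, the resource is taken off the bulk move at the moment it becomes unevictable and put back when it becomes evictable again. "On the cursor" then always matches "evictable", so the evictability-gated del stays correct without any per-resource flag. `tr_set_unevictable()` and the other `tr_*` names are hypothetical. This does not describe how the TTM pin or swap paths are actually structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real TTM structures. */
struct tr_res {
	bool unevictable; /* pinned or swapped */
};

struct tr_bulk_pos {
	struct tr_res *first;
	struct tr_res *last;
};

/* Both helpers keep the original evictability guard. */
static void tr_add(struct tr_bulk_pos *pos, struct tr_res *res)
{
	if (res->unevictable)
		return;
	if (!pos->first)
		pos->first = res;
	pos->last = res;
}

static void tr_del(struct tr_bulk_pos *pos, struct tr_res *res)
{
	if (res->unevictable)
		return;
	/* Only the one-element case is modelled. */
	if (pos->first == res && pos->last == res)
		pos->first = pos->last = NULL;
}

/*
 * Hypothetical transition helper: remove while still evictable, then flip;
 * on the way back, flip first, then re-add.
 */
static void tr_set_unevictable(struct tr_bulk_pos *pos, struct tr_res *res,
			       bool unevictable)
{
	if (unevictable == res->unevictable)
		return;
	if (unevictable) {
		tr_del(pos, res);
		res->unevictable = true;
	} else {
		res->unevictable = false;
		tr_add(pos, res);
	}
}

/* Walks add -> pin -> optional unpin -> free; true if the cursor ends empty. */
static bool tr_cursor_empty_after(bool unpin_before_free)
{
	struct tr_bulk_pos pos = { NULL, NULL };
	struct tr_res res = { .unevictable = false };

	tr_add(&pos, &res);
	tr_set_unevictable(&pos, &res, true);      /* removed here, while evictable */
	if (unpin_before_free)
		tr_set_unevictable(&pos, &res, false); /* re-added to the cursor */
	tr_del(&pos, &res);                        /* free path */
	return pos.first == NULL && pos.last == NULL;
}
```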

Regards,
Christian.

>
> Fixes: fc5d96670eb2 ("drm/ttm: Move swapped objects off the manager's LRU list")
> Cc: stable@xxxxxxxxxxxxxxx
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5387
> Signed-off-by: Samuel Ainsworth <skainsworth@xxxxxxxxx>
> ---
> drivers/gpu/drm/ttm/ttm_resource.c | 18 +++++++++++++++---
> include/drm/ttm/ttm_resource.h | 9 +++++++++
> 2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 192fca24f37e..1a031ef151a7 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -280,16 +280,27 @@ static bool ttm_resource_unevictable(struct ttm_resource *res, struct ttm_buffer
> void ttm_resource_add_bulk_move(struct ttm_resource *res,
> struct ttm_buffer_object *bo)
> {
> - if (bo->bulk_move && !ttm_resource_unevictable(res, bo))
> + if (bo->bulk_move && !ttm_resource_unevictable(res, bo)) {
> ttm_lru_bulk_move_add(bo->bulk_move, res);
> + res->bulk_move = true;
> + }
> }
>
> /* Remove the resource from a bulk move if the BO is configured for it */
> void ttm_resource_del_bulk_move(struct ttm_resource *res,
> struct ttm_buffer_object *bo)
> {
> - if (bo->bulk_move && !ttm_resource_unevictable(res, bo))
> + /*
> + * Remove based on whether the resource was actually added, not on its
> + * current evictability: a resource can become unevictable (pinned or
> + * swapped) after being added, and must still be taken off the bulk_move
> + * cursor before it is freed -- otherwise pos->first/last are left
> + * dangling at freed memory.
> + */
> + if (res->bulk_move) {
> ttm_lru_bulk_move_del(bo->bulk_move, res);
> + res->bulk_move = false;
> + }
> }
>
> /* Move a resource to the LRU or bulk tail */
> @@ -303,7 +314,7 @@ void ttm_resource_move_to_lru_tail(struct ttm_resource *res)
> if (ttm_resource_unevictable(res, bo)) {
> list_move_tail(&res->lru.link, &bdev->unevictable);
>
> - } else if (bo->bulk_move) {
> + } else if (res->bulk_move) {
> struct ttm_lru_bulk_move_pos *pos =
> ttm_lru_bulk_move_pos(bo->bulk_move, res);
>
> @@ -339,6 +350,7 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
> res->bus.is_iomem = false;
> res->bus.caching = ttm_cached;
> res->bo = bo;
> + res->bulk_move = false;
>
> man = ttm_manager_type(bo->bdev, place->mem_type);
> spin_lock(&bo->bdev->lru_lock);
> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
> index 33e80f30b8b8..1fedf75bab96 100644
> --- a/include/drm/ttm/ttm_resource.h
> +++ b/include/drm/ttm/ttm_resource.h
> @@ -274,6 +274,15 @@ struct ttm_resource {
> * @lru: Least recently used list, see &ttm_resource_manager.lru
> */
> struct ttm_lru_item lru;
> +
> + /**
> + * @bulk_move: Whether this resource is currently tracked by its bo's
> + * &ttm_buffer_object.bulk_move cursor. Recorded when the resource is
> + * added so the matching del removes it even if the resource has since
> + * become unevictable (pinned or swapped) -- otherwise the cursor would
> + * be left pointing at this resource after it is freed.
> + */
> + bool bulk_move;
> };
>
> /**
> --
> 2.54.0
>