Re: [PATCH v4] drm/imagination: Fix double call to drm_sched_entity_fini()
From: Alessio Belle
Date: Mon Jun 29 2026 - 10:06:57 EST
Hi Brajesh,
On Fri, 2026-06-19 at 12:59 +0530, Brajesh Gupta wrote:
> Call sequence of double call:
> pvr_context_destroy
> pvr_context_kill_queues
> pvr_queue_kill
> drm_sched_entity_destroy
> drm_sched_entity_fini // here
> pvr_context_put
> kref_put(..., pvr_context_release)
> pvr_context_destroy_queues
> pvr_queue_destroy
> drm_sched_entity_fini // here
>
> Call to drm_sched_entity_destroy() from pvr_context_kill_queues() calls
> drm_sched_entity_flush() + drm_sched_entity_fini().
> drm_sched_entity_flush() ensures all pending jobs are completed and
> drm_sched_entity_fini() ensures no further submission is allowed as
> per expectation from pvr_context_kill_queues(). Keep the
> drm_sched_entity_fini() only for pvr_context_create() failure path.
Could you point out somewhere around here that this double call was a misuse of
the API? (to convey that this was a bug)
>
> Stack trace for issue with addition of refcounting for DRM entity
> stats in commit fd177135f0e6 ("drm/sched: Account entity GPU time"):
> [ 789.490527] ------------[ cut here ]------------
> [ 789.490559] refcount_t: underflow; use-after-free.
> [ 789.490657] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xf4/0x144, CPU#0: kworker/u16:1/440
> [ 789.490695] Modules linked in: powervr drm_gpuvm drm_exec gpu_sched drm_shmem_helper xhci_plat_hcd xhci_hcd dwc3 usbcore usb_common snd_soc_simple_card snd_soc_simple_card_utils sa2ul sha512 sha256 dwc3_am62 sha1 authenc rti_wdt libsha512 at24 sch_fq_codel fuse dm_mod ipv6
> [ 789.490798] CPU: 0 UID: 0 PID: 440 Comm: kworker/u16:1 Not tainted 7.0.0-rc7-02049-g5e2c0700091b #22 PREEMPT
> [ 789.490809] Hardware name: Texas Instruments AM625 SK (DT)
> [ 789.490815] Workqueue: powervr-sched pvr_queue_fence_release_work [powervr]
> [ 789.490868] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 789.490876] pc : refcount_warn_saturate+0xf4/0x144
> [ 789.490884] lr : refcount_warn_saturate+0xf4/0x144
> [ 789.490892] sp : ffff8000822cbcc0
> [ 789.490895] x29: ffff8000822cbcc0 x28: 0000000000000000 x27: 0000000000000000
> [ 789.490909] x26: 0000000000000000 x25: ffff800081b1e338 x24: ffff000004541405
> [ 789.490922] x23: ffff000004bea950 x22: ffff00000042e400 x21: ffff000007123e30
> [ 789.490935] x20: ffff000007123000 x19: ffff000007a80d50 x18: fffffffffffe7768
> [ 789.490948] x17: 74736574202c6e6f x16: 697461746e656d65 x15: ffff800081b269f0
> [ 789.490962] x14: 0000000000000030 x13: ffff800081b26a70 x12: 0000000000000211
> [ 789.490975] x11: 00000000000000c0 x10: 0000000000000b50 x9 : ffff8000822cbb30
> [ 789.490988] x8 : ffff0000014e7bb0 x7 : ffff00007725e780 x6 : 0000000372a05f49
> [ 789.491001] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000010
> [ 789.491013] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000014e7000
> [ 789.491027] Call trace:
> [ 789.491032] refcount_warn_saturate+0xf4/0x144 (P)
> [ 789.491043] drm_sched_entity_fini+0x164/0x18c [gpu_sched]
> [ 789.491081] pvr_queue_destroy+0x64/0x134 [powervr]
> [ 789.491110] pvr_context_destroy_queues+0x34/0x64 [powervr]
> [ 789.491138] pvr_context_release+0x70/0xac [powervr]
> [ 789.491166] pvr_context_put.part.0+0x5c/0x7c [powervr]
> [ 789.491193] pvr_context_put+0x14/0x24 [powervr]
> [ 789.491221] pvr_queue_fence_release_work+0x20/0x38 [powervr]
> [ 789.491249] process_one_work+0x160/0x4c4
> [ 789.491264] worker_thread+0x188/0x310
> [ 789.491276] kthread+0x130/0x13c
> [ 789.491287] ret_from_fork+0x10/0x20
> [ 789.491300] ---[ end trace 0000000000000000 ]---
>
> Signed-off-by: Brajesh Gupta <brajesh.gupta@xxxxxxxxxx>
> Fixes: eaf01ee5ba28 ("drm/imagination: Implement job submission and scheduling")
> References: fd177135f0e6 ("drm/sched: Account entity GPU time")
It is probably best to backport this patch (or at least get the description in
the correct shape), but to avoid any issues with tooling that might detect the
commit from this trailer as missing (too new), could you drop this tag, then add
the stable tag? Also the Signed-off-by should come last.
> ---
> Changes in v4:
> - Simplify logic in v3 by pushing new flag to pvr_queue_destroy().
> - Link to v3: https://lore.kernel.org/r/20260611-b4-sched_fix-v3-1-693beb50ea01@xxxxxxxxxx
>
> Changes in v3:
> - Fixed a typo.
> - Handled missing memory leak for RENDER_CONTEXT.
> - Link to v2: https://lore.kernel.org/r/20260611-b4-sched_fix-v2-1-17a93be86fcd@xxxxxxxxxx
>
> Changes in v2:
> - Fixed memory leak identified in following error path handling of pvr_context_create():
> - pvr_context_create()
> - ...
> - err_destroy_queues:
> - pvr_context_destroy_queues()
> - pvr_queue_destroy()
> - Link to v1: https://lore.kernel.org/r/20260610-b4-sched_fix-v1-1-c5977a6e0b4c@xxxxxxxxxx
> ---
> drivers/gpu/drm/imagination/pvr_context.c | 18 ++++++++++--------
> drivers/gpu/drm/imagination/pvr_queue.c | 7 +++++--
> drivers/gpu/drm/imagination/pvr_queue.h | 2 +-
> 3 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/imagination/pvr_context.c b/drivers/gpu/drm/imagination/pvr_context.c
> index eba4694400b5..74c8aad7fe64 100644
> --- a/drivers/gpu/drm/imagination/pvr_context.c
> +++ b/drivers/gpu/drm/imagination/pvr_context.c
> @@ -161,22 +161,24 @@ ctx_fw_data_init(void *cpu_ptr, void *priv)
> /**
> * pvr_context_destroy_queues() - Destroy all queues attached to a context.
> * @ctx: Context to destroy queues on.
> + * @queue_entity_fini: Corresponding queues of context to be released for
> + * failure path in context creation.
nit: maybe this could be a bit more generic given the latest version of the code
- something like "Whether to cleanup the queue entity e.g. in failure path of
context creation"? Similar in pvr_queue_destroy() below, minus the "e.g. [...]"
part.
Thanks,
Alessio
> *
> * Should be called when the last reference to a context object is dropped.
> * It releases all resources attached to the queues bound to this context.
> */
> -static void pvr_context_destroy_queues(struct pvr_context *ctx)
> +static void pvr_context_destroy_queues(struct pvr_context *ctx, bool queue_entity_fini)
> {
> switch (ctx->type) {
> case DRM_PVR_CTX_TYPE_RENDER:
> - pvr_queue_destroy(ctx->queues.fragment);
> - pvr_queue_destroy(ctx->queues.geometry);
> + pvr_queue_destroy(ctx->queues.fragment, queue_entity_fini);
> + pvr_queue_destroy(ctx->queues.geometry, queue_entity_fini);
> break;
> case DRM_PVR_CTX_TYPE_COMPUTE:
> - pvr_queue_destroy(ctx->queues.compute);
> + pvr_queue_destroy(ctx->queues.compute, queue_entity_fini);
> break;
> case DRM_PVR_CTX_TYPE_TRANSFER_FRAG:
> - pvr_queue_destroy(ctx->queues.transfer);
> + pvr_queue_destroy(ctx->queues.transfer, queue_entity_fini);
> break;
> }
> }
> @@ -240,7 +242,7 @@ static int pvr_context_create_queues(struct pvr_context *ctx,
> return -EINVAL;
>
> err_destroy_queues:
> - pvr_context_destroy_queues(ctx);
> + pvr_context_destroy_queues(ctx, true);
> return err;
> }
>
> @@ -349,7 +351,7 @@ int pvr_context_create(struct pvr_file *pvr_file, struct drm_pvr_ioctl_create_co
> pvr_fw_object_destroy(ctx->fw_obj);
>
> err_destroy_queues:
> - pvr_context_destroy_queues(ctx);
> + pvr_context_destroy_queues(ctx, true);
>
> err_free_ctx_id:
> /*
> @@ -384,7 +386,7 @@ pvr_context_release(struct kref *ref_count)
> spin_unlock(&pvr_dev->ctx_list_lock);
>
> xa_erase(&pvr_dev->ctx_ids, ctx->ctx_id);
> - pvr_context_destroy_queues(ctx);
> + pvr_context_destroy_queues(ctx, false);
> pvr_fw_object_destroy(ctx->fw_obj);
> kfree(ctx->data);
> pvr_vm_context_put(ctx->vm_ctx);
> diff --git a/drivers/gpu/drm/imagination/pvr_queue.c b/drivers/gpu/drm/imagination/pvr_queue.c
> index 7ed60e1c1a86..324bdadc3d71 100644
> --- a/drivers/gpu/drm/imagination/pvr_queue.c
> +++ b/drivers/gpu/drm/imagination/pvr_queue.c
> @@ -1439,11 +1439,13 @@ void pvr_queue_kill(struct pvr_queue *queue)
> /**
> * pvr_queue_destroy() - Destroy a queue.
> * @queue: The queue to destroy.
> + * @queue_entity_fini: Corresponding queues of context to be released for
> + * failure path in context creation.
> *
> * Cleanup the queue and free the resources attached to it. Should be
> * called from the context release function.
> */
> -void pvr_queue_destroy(struct pvr_queue *queue)
> +void pvr_queue_destroy(struct pvr_queue *queue, bool queue_entity_fini)
> {
> if (!queue)
> return;
> @@ -1453,7 +1455,8 @@ void pvr_queue_destroy(struct pvr_queue *queue)
> mutex_unlock(&queue->ctx->pvr_dev->queues.lock);
>
> drm_sched_fini(&queue->scheduler);
> - drm_sched_entity_fini(&queue->entity);
> + if (queue_entity_fini)
> + drm_sched_entity_fini(&queue->entity);
>
> if (WARN_ON(queue->last_queued_job_scheduled_fence))
> dma_fence_put(queue->last_queued_job_scheduled_fence);
> diff --git a/drivers/gpu/drm/imagination/pvr_queue.h b/drivers/gpu/drm/imagination/pvr_queue.h
> index 4aa72665ce25..5b07d7edf1c1 100644
> --- a/drivers/gpu/drm/imagination/pvr_queue.h
> +++ b/drivers/gpu/drm/imagination/pvr_queue.h
> @@ -158,7 +158,7 @@ struct pvr_queue *pvr_queue_create(struct pvr_context *ctx,
>
> void pvr_queue_kill(struct pvr_queue *queue);
>
> -void pvr_queue_destroy(struct pvr_queue *queue);
> +void pvr_queue_destroy(struct pvr_queue *queue, bool queue_entity_fini);
>
> void pvr_queue_process(struct pvr_queue *queue);
>
>
> ---
> base-commit: 61de054a772a1feda6364931ab1baf9038abf1c8
> change-id: 20260610-b4-sched_fix-ac3b920f475b
>
> Best regards,