Re: [PATCH v2] drm/sched: Protect entity->last_scheduled with spinlock

From: Tvrtko Ursulin

Date: Tue Jun 30 2026 - 05:37:39 EST

On 26/06/2026 09:19, Philipp Stanner wrote:

The entity->last_scheduled field has always been set and read with
special RCU functions in addition to memory barriers. There is no
obvious reason for that, since the entity lock is available and taken at
all places that evaluate the last_scheduled field. The only exception is
drm_sched_entity_error(), which is not performance critical in any way.

I agree this looks odd since all call sites apart from drm_sched_entity_error() use "rcu_dereference_check(entity->last_scheduled, true);" ie. "ignore" the RCU.

Btw this was added in:

commit 70102d77ff22dd88a0111b1c3bac5099ac5d0425
Author: Christian König <christian.koenig@xxxxxxx>
Date: Mon Apr 17 17:32:11 2023 +0200

drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled

You may want to add this as a reference in the commit message.

I guess it relied on dma-fence RCU destruction to enable lockless lookups from the AMD submit path. Given how many other locks we have in those paths it is probably noise to have one more so maybe it is a win to remove some barriers and those rcu_dereference_check-true lines. I think Christian will need to comment.

Improve robustness, readability and maintainability by replacing RCU and
barriers with the lock.

As a preparational step, while at it, also guard spsc_queue_pop() with
the lock, since spsc_queue is deprecated and supposed to be replaced
with a locked list.

You would have said to split the logical changes into separate patches.

Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx>
---
Changes since v1:
- Add a helper variable to drop the last_scheduled reference without
the entity lock being held; just to be more robust.
- Write additional comment to detail the WRITE_ONCE().
---
drivers/gpu/drm/scheduler/sched_entity.c | 58 +++++++++++++-----------
include/drm/gpu_scheduler.h | 9 ++--
2 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index c51101ec70c1..12fd695c6d46 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -135,7 +135,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->num_sched_list = num_sched_list;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->rq = &sched_list[0]->rq;
- RCU_INIT_POINTER(entity->last_scheduled, NULL);
RB_CLEAR_NODE(&entity->rb_tree_node);
init_completion(&entity->entity_idle);
@@ -201,10 +200,10 @@ int drm_sched_entity_error(struct drm_sched_entity *entity)
struct dma_fence *fence;
int r;
- rcu_read_lock();
- fence = rcu_dereference(entity->last_scheduled);
+ spin_lock(&entity->lock);
+ fence = entity->last_scheduled;
r = fence ? fence->error : 0;
- rcu_read_unlock();
+ spin_unlock(&entity->lock);
return r;
}
@@ -288,8 +287,10 @@ void drm_sched_entity_kill(struct drm_sched_entity *entity)
wait_for_completion(&entity->entity_idle);
/* The entity is guaranteed to not be used by the scheduler */
- prev = rcu_dereference_check(entity->last_scheduled, true);
+ spin_lock(&entity->lock);
+ prev = entity->last_scheduled;
dma_fence_get(prev);
+ spin_unlock(&entity->lock);
while ((job = drm_sched_entity_queue_pop(entity))) {
struct drm_sched_fence *s_fence = job->s_fence;
@@ -381,8 +382,12 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
entity->dependency = NULL;
}
- dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
- RCU_INIT_POINTER(entity->last_scheduled, NULL);
+ dma_fence_put(entity->last_scheduled);
+ /*
+ * Normally all users should be gone now, but since drm_sched has
+ * experienced many layering violations in the past, better be safe.
+ */
+ WRITE_ONCE(entity->last_scheduled, NULL);
drm_sched_entity_stats_put(entity->stats);
}
EXPORT_SYMBOL(drm_sched_entity_fini);
@@ -507,6 +512,10 @@ drm_sched_job_dependency(struct drm_sched_job *job,
struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
{
+ /* Helper to avoid dropping the reference while the entity lock is held,
+ * just to have some more robustness.
+ */
+ struct dma_fence *prev_last_scheduled;
struct drm_sched_job *sched_job;
sched_job = drm_sched_entity_queue_peek(entity);
@@ -523,19 +532,20 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
if (entity->guilty && atomic_read(entity->guilty))
dma_fence_set_error(&sched_job->s_fence->finished, -ECANCELED);
- dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
- rcu_assign_pointer(entity->last_scheduled,
- dma_fence_get(&sched_job->s_fence->finished));
+ spin_lock(&entity->lock);
+ prev_last_scheduled = entity->last_scheduled;
+ entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);
- /*
- * If the queue is empty we allow drm_sched_entity_select_rq() to
- * locklessly access ->last_scheduled. This only works if we set the
- * pointer before we dequeue and if we a write barrier here.
+ /* A recent rework required taking the spinlock above. Since spsc_queue
+ * is scheduled for removal as per the DRM-TODO-list, we access it here
+ * locked already to prepare for that cleanup.
+ *
+ * TODO: Fully replace spsc_queue with a locked (h)list.
*/
- smp_wmb();
-
spsc_queue_pop(&entity->job_queue);
+ spin_unlock(&entity->lock);
+ dma_fence_put(prev_last_scheduled);
drm_sched_rq_pop_entity(entity);

Notice the entity->lock ends up cycled twice for no good reason (second is in drm_sched_rq_pop_entity()). So I would suggest you somehow reduce that to once. Probably just pull out entity->lock out of the drm_sched_rq_pop_entity() to drm_sched_entity_pop_job()?

I guess if you do that then the "while at it" part of the commit message can be "upgraded" to "spsc_queue_pop() being under the lock as a consequence of the rework" and then no need to split it.

/* Jobs and entities might have different lifecycles. Since we're
@@ -561,21 +571,15 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
if (spsc_queue_count(&entity->job_queue))
return;
- /*
- * Only when the queue is empty are we guaranteed that
- * drm_sched_run_job_work() cannot change entity->last_scheduled. To
- * enforce ordering we need a read barrier here. See
- * drm_sched_entity_pop_job() for the other side.
- */
- smp_rmb();
-
- fence = rcu_dereference_check(entity->last_scheduled, true);
+ spin_lock(&entity->lock);
+ fence = entity->last_scheduled;
/* stay on the same engine if the previous job hasn't finished */
- if (fence && !dma_fence_is_signaled(fence))
+ if (fence && !dma_fence_is_signaled(fence)) {
+ spin_unlock(&entity->lock);

Have you tried with lockdep to see if there are any hidden lock inversions with this?

I also wonder if we could demote this to a flag check only and remove any doubt. I don't think opportunistic signalling matter in this code path.

Regards,

Tvrtko

return;
+ }
- spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
rq = sched ? &sched->rq : NULL;
if (rq != entity->rq) {
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d61c19e78182..176ff1f936cd 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -100,7 +100,8 @@ struct drm_sched_entity {
* @lock:
*
* Lock protecting the run-queue (@rq) to which this entity belongs,
- * @priority and the list of schedulers (@sched_list, @num_sched_list).
+ * @priority, @last_scheduled and the list of schedulers (@sched_list,
+ * @num_sched_list).
*/
spinlock_t lock;
@@ -202,11 +203,9 @@ struct drm_sched_entity {
/**
* @last_scheduled:
*
- * Points to the finished fence of the last scheduled job. Only written
- * by drm_sched_entity_pop_job(). Can be accessed locklessly from
- * drm_sched_job_arm() if the queue is empty.
+ * Points to the finished fence of the last scheduled job.
*/
- struct dma_fence __rcu *last_scheduled;
+ struct dma_fence *last_scheduled;
/**
* @last_user: last group leader pushing a job into the entity.