[PATCH 2/2] drm/sched: Add FIXME detailing potential hang

From: Philipp Stanner

Date: Tue Oct 28 2025 - 09:46:20 EST


If a job from a ready entity needs more credits than are currently
available, drm_sched_run_job_work() (a work item) simply returns and
doesn't reschedule itself. The scheduler is only woken up again when the
next job gets pushed with drm_sched_entity_push_job().

If someone submits a job that needs too many credits and doesn't submit
more jobs afterwards, the scheduler never pulls the too-expensive job
and effectively hangs forever.

Document this problem as a FIXME.

Signed-off-by: Philipp Stanner <phasta@xxxxxxxxxx>
---
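Note: the stall described in the commit message can be reproduced in a
small userspace model. This is only a sketch of the behaviour as the
commit message describes it, not kernel code. All model_* names are
hypothetical, the model holds at most one pending job for brevity, and
it assumes that returning credits does not kick the work item, so the
push path is the only wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_job {
	unsigned int credits;
};

struct model_sched {
	unsigned int credit_limit;       /* total capacity */
	unsigned int credits_in_use;     /* held by in-flight jobs */
	const struct model_job *pending; /* at most one queued job */
	unsigned int jobs_run;
};

/* Models the work item: run the pending job if it fits, else just return. */
void model_run_job_work(struct model_sched *s)
{
	const struct model_job *job = s->pending;

	if (!job)
		return;
	if (job->credits > s->credit_limit - s->credits_in_use)
		return; /* no capacity: return without re-queueing ourselves */

	s->credits_in_use += job->credits;
	s->pending = NULL;
	s->jobs_run++;
}

/* Models the push path: the only place in this model that kicks the work. */
void model_push_job(struct model_sched *s, const struct model_job *job)
{
	assert(!s->pending);
	s->pending = job;
	model_run_job_work(s);
}

/* Assumption: returning credits does NOT kick the work item. */
void model_return_credits(struct model_sched *s, unsigned int credits)
{
	assert(credits <= s->credits_in_use);
	s->credits_in_use -= credits;
}

/* Returns true if job b is still stuck after enough credits came back. */
bool model_demo_stalls(void)
{
	static const struct model_job a = { .credits = 3 };
	static const struct model_job b = { .credits = 2 };
	struct model_sched s = { .credit_limit = 4 };

	model_push_job(&s, &a);              /* fits, runs immediately */
	model_push_job(&s, &b);              /* needs 2, only 1 available */
	model_return_credits(&s, a.credits); /* capacity is back ... */
	return s.pending == &b;              /* ... but nobody retries b */
}
```

In this model, b only runs once something calls model_run_job_work()
again, which happens solely through another model_push_job(). Whether
the real scheduler has further wakeup paths is outside the scope of the
sketch.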
drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 492e8af639db..eaf8d17b2a66 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1237,6 +1237,16 @@ static void drm_sched_run_job_work(struct work_struct *w)

 	/* Find entity with a ready job */
 	entity = drm_sched_select_entity(sched);
+	/*
+	 * FIXME:
+	 * The entity can be NULL when the scheduler currently has no capacity
+	 * (credits) for more jobs. If that happens, the work item terminates
+	 * itself here, without rescheduling itself.
+	 *
+	 * It only gets started again in drm_sched_entity_push_job(). IOW, the
+	 * scheduler might hang forever if a job that needs too many credits
+	 * gets submitted to an entity and no other, subsequent jobs are.
+	 */
 	if (!entity) {
 		/*
 		 * Either no more work to do, or the next ready job needs more
--
2.49.0