Re: [RFC PATCH 3/3] drm/sched: Prevent adding dependencies to an armed job
From: Philipp Stanner
Date: Mon Oct 27 2025 - 07:15:41 EST
I've got a kernel.org addr by now by the way
On Tue, 2025-10-21 at 14:39 -0700, Matthew Brost wrote:
> According to the DMA scheduler documentation, once a job is armed, it
> must be pushed. Drivers should avoid calling the failing code path that
> attempts to add dependencies after a job has been armed.
>
Why is that a "failing code path"?
The issue with adding callbacks is that adding them to an already
signaled fence is a bad idea. I'm not sure if it's illegal, though.
dma_fence_add_cb() merely returns an error then, but the driver could
in priniciple then execute its cb code itself.
And even if we agree that this is a hard rule that must be followed,
then drm_sched_job_arm() *might* not be the right place, because just
because a job is armed doesn't mean that its fence is about to get
signaled. drm_sched_entity_push_job() would be the critical place.
> This change
> enforces that rule.
>
> Cc: Christian König <christian.koenig@xxxxxxx>
> Cc: Danilo Krummrich <dakr@xxxxxxxxxx>
> Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
> Cc: Philipp Stanner <phasta@xxxxxxxxxx>
> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 676484dd3ea3..436cb2844161 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -873,7 +873,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> * @job: scheduler job to add the dependencies to
> * @fence: the dma_fence to add to the list of dependencies.
> *
> - * Note that @fence is consumed in both the success and error cases.
> + * Note that @fence is consumed in both the success and error cases. This
> + * function cannot be called if the job is armed.
> *
> * Returns:
> * 0 on success, or an error on failing to expand the array.
> @@ -886,6 +887,10 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> u32 id = 0;
> int ret;
>
> + /* Do not allow additional dependencies when job is armed */
> + if (WARN_ON_ONCE(job->sched))
One would probably want an 'armed' boolean for that. At the very least
one wants to document in the struct's docstring that job->sched has
this semantic meaning. Otherwise it's only obvious for people who have
been hacking on the scheduler for years.
By the way I think that we use WARN_ON*() too much in DRM. It generates
difficult to read, non-descriptive error messages compared to
dev_warn() and similar helpers, and it's often a bit overkill. I would
only use it when there is no other choice, such as in an interrupt-
handler or widely used void func() where you cannot simply add a return
code.
P.
> + return -EINVAL;
> +
> if (!fence)
> return 0;
>