Re: [PATCH v1] drm/scheduler: Don't kill jobs in interrupt context

From: Dmitry Osipenko
Date: Tue Apr 12 2022 - 14:20:41 EST


On 4/12/22 19:51, Andrey Grodzovsky wrote:
>
> On 2022-04-11 18:15, Dmitry Osipenko wrote:
>> Interrupt context can't sleep. Drivers like Panfrost and MSM are taking
>> mutex when job is released, and thus, that code can sleep. This results
>> into "BUG: scheduling while atomic" if locks are contented while job is
>> freed. There is no good reason for releasing scheduler's jobs in IRQ
>> context, hence use normal context to fix the trouble.
>
>
> I am not sure this is the beast Idea to leave job's sw fence signalling
> to be
> executed in system_wq context which is prone to delays of executing
> various work items from around the system. Seems better to me to leave the
> fence signaling within the IRQ context and offload only the job freeing or,
> maybe handle rescheduling to thread context within drivers implemention
> of .free_job cb. Not really sure which is the better.

We're talking here about killing jobs when driver destroys context,
which doesn't feel like it needs to be a fast path. I could move the
signalling into drm_sched_entity_kill_jobs_cb() and use unbound wq, but
do we really need this for a slow path?