Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()

From: Grodzovsky, Andrey
Date: Thu Sep 26 2019 - 10:31:29 EST



On 9/26/19 5:41 AM, Steven Price wrote:
> On 25/09/2019 21:09, Grodzovsky, Andrey wrote:
>> On 9/25/19 12:07 PM, Andrey Grodzovsky wrote:
>>> On 9/25/19 12:00 PM, Steven Price wrote:
>>>
>>>> On 25/09/2019 16:56, Grodzovsky, Andrey wrote:
>>>>> On 9/25/19 11:14 AM, Steven Price wrote:
>>>>>
>>>>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however
>>>>>> because
>>>>>> it is called as the condition of wait_event_interruptible() it must
>>>>>> not
>>>>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do
>>>>>> sleep.
>>>>>>
>>>>>> Instead let's rename drm_sched_cleanup_jobs() to
>>>>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>>>>> there is one. The caller can then call the free_job() callback outside
>>>>>> the wait_event_interruptible() where sleeping is possible before
>>>>>> re-checking and returning to sleep if necessary.
>>>>>>
>>>>>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>>>>>> ---
>>>>>>   drivers/gpu/drm/scheduler/sched_main.c | 44 ++++++++++++++------------
>>>>>>   1 file changed, 24 insertions(+), 20 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index 9a0ee74d82dc..0ed4aaa4e6d1 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>>>  }
>>>>>>  
>>>>>>  /**
>>>>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>>>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>>>>   *
>>>>>>   * @sched: scheduler instance
>>>>>>   *
>>>>>> - * Remove all finished jobs from the mirror list and destroy them.
>>>>>> + * Returns the next finished job from the mirror list (if there is one)
>>>>>> + * ready for it to be destroyed.
>>>>>>   */
>>>>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>>>> +static struct drm_sched_job *
>>>>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>>>>  {
>>>>>> +	struct drm_sched_job *job = NULL;
>>>>>>  	unsigned long flags;
>>>>>>  
>>>>>>  	/* Don't destroy jobs while the timeout worker is running */
>>>>>>  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>>>  	    !cancel_delayed_work(&sched->work_tdr))
>>>>>> -		return;
>>>>>> -
>>>>>> -
>>>>>> -	while (!list_empty(&sched->ring_mirror_list)) {
>>>>>> -		struct drm_sched_job *job;
>>>>>> +		return NULL;
>>>>>>  
>>>>>> -		job = list_first_entry(&sched->ring_mirror_list,
>>>>>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>  				       struct drm_sched_job, node);
>>>>>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>>>> -			break;
>>>>>>  
>>>>>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> +
>>>>>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>>>>  		/* remove job from ring_mirror_list */
>>>>>>  		list_del_init(&job->node);
>>>>>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> -
>>>>>> -		sched->ops->free_job(job);
>>>>>> +	} else {
>>>>>> +		job = NULL;
>>>>>> +		/* queue timeout for next job */
>>>>>> +		drm_sched_start_timeout(sched);
>>>>>>  	}
>>>>>>  
>>>>>> -	/* queue timeout for next job */
>>>>>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> -	drm_sched_start_timeout(sched);
>>>>>>  	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>  
>>>>>> +	return job;
>>>>>>  }
>>>>>>  
>>>>>>  /**
>>>>>> @@ -698,12 +696,18 @@ static int drm_sched_main(void *param)
>>>>>>  		struct drm_sched_fence *s_fence;
>>>>>>  		struct drm_sched_job *sched_job;
>>>>>>  		struct dma_fence *fence;
>>>>>> +		struct drm_sched_job *cleanup_job = NULL;
>>>>>>  
>>>>>>  		wait_event_interruptible(sched->wake_up_worker,
>>>>>> -					 (drm_sched_cleanup_jobs(sched),
>>>>>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>>>>>  					 (!drm_sched_blocked(sched) &&
>>>>>>  					  (entity = drm_sched_select_entity(sched))) ||
>>>>>> -					 kthread_should_stop()));
>>>>>> +					 kthread_should_stop());
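For what it's worth, the shape of this refactor - pop at most one finished job under a non-sleeping lock, then free it outside the lock where sleeping is fine - can be sketched in user space with pthreads. All names below are made up for illustration; this is an analogue, not the kernel code:

```c
#include <stdlib.h>
#include <pthread.h>

struct fake_job {
	struct fake_job *next;
	int finished;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fake_job *mirror_list;

/* Push a job onto the head of the (hypothetical) mirror list. */
static void add_job(int finished)
{
	struct fake_job *job = malloc(sizeof(*job));

	job->finished = finished;
	pthread_mutex_lock(&list_lock);
	job->next = mirror_list;
	mirror_list = job;
	pthread_mutex_unlock(&list_lock);
}

/* Analogue of drm_sched_get_cleanup_job(): unlink at most one finished
 * job from the head of the list.  The lock is held only briefly and
 * nothing inside it can sleep. */
static struct fake_job *get_cleanup_job(void)
{
	struct fake_job *job;

	pthread_mutex_lock(&list_lock);
	job = mirror_list;
	if (job && job->finished)
		mirror_list = job->next;	/* unlink it */
	else
		job = NULL;			/* nothing ready yet */
	pthread_mutex_unlock(&list_lock);
	return job;
}

/* Caller side: the actual free - which in the kernel case may sleep,
 * e.g. Panfrost's free_job() - happens with no lock held.  Returns the
 * number of jobs freed. */
int drain_finished_jobs(void)
{
	struct fake_job *job;
	int freed = 0;

	while ((job = get_cleanup_job()) != NULL) {
		free(job);	/* safe to sleep here */
		freed++;
	}
	return freed;
}
```

The point of the split is visible in the caller: the fetch is cheap and atomic, while the potentially sleeping free is hoisted out of any context that must not sleep.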
>>>>> Can't we just call drm_sched_cleanup_jobs right here, remove all the
>>>>> conditions in wait_event_interruptible (make it always true), and after
>>>>> drm_sched_cleanup_jobs is called test for all those conditions and
>>>>> return to sleep if they evaluate to false? drm_sched_cleanup_jobs is
>>>>> called unconditionally inside wait_event_interruptible anyway... This is
>>>>> more of a question for Christian.
>>>> Christian may know better than me, but I think those conditions need to
>>>> be in wait_event_interruptible() to avoid race conditions. If we simply
>>>> replace all the conditions with a literal "true" then
>>>> wait_event_interruptible() will never actually sleep.
>>>>
>>>> Steve
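The lost-wakeup hazard behind this can be shown with a small user-space pthread analogue (hypothetical names; not the kernel implementation): the waiter checks the condition under the lock before blocking, so a wakeup that races with the check cannot be lost, and if the condition already holds it never sleeps - which is also why replacing the condition with a literal "true" would make the loop spin instead of sleep.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool job_ready;

/* Wakes the waiter: set the condition, then signal, both under the
 * lock, mirroring wake_up() after making the condition true. */
static void *producer(void *arg)
{
	pthread_mutex_lock(&lock);
	job_ready = true;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Analogue of wait_event(): evaluate the condition first; only if it
 * is false do we atomically drop the lock and sleep.  Returns 1 once
 * the condition has been observed. */
static int consumer_wait(void)
{
	pthread_mutex_lock(&lock);
	while (!job_ready)			/* check before sleeping */
		pthread_cond_wait(&cond, &lock);/* atomic unlock + sleep */
	pthread_mutex_unlock(&lock);
	return 1;
}

/* Run producer and consumer concurrently; no wakeup can be lost
 * regardless of which thread runs first. */
int run_wakeup_demo(void)
{
	pthread_t tid;
	int ret;

	pthread_create(&tid, NULL, producer, NULL);
	ret = consumer_wait();
	pthread_join(tid, NULL);
	return ret;
}
```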
>>> Yes, you're right, it won't work - I missed that the condition is
>>> evaluated as the first step in wait_event_interruptible() before it
>>> sleeps.
>>>
>>> Andrey
>> Another idea - what about still just relocating drm_sched_cleanup_jobs
>> to after wait_event_interruptible and also calling it in drm_sched_fini,
>> so that in the case when it is not called from drm_sched_main (because
>> the conditions don't evaluate to true) it is eventually called one last
>> time from drm_sched_fini? I mean - the refactor looks OK to me, but the
>> code becomes somewhat confusing to grasp this way.
>>
>> Andrey
> That sounds similar to my first stab at this[1]. However Christian
> pointed out that it is necessary to also free jobs even if there isn't a
> new one to be scheduled. Otherwise it ends up with the jobs lying around
> until something kicks it.


But if there is no new job to be scheduled then no one will wake up
sched->wake_up_worker, and the condition in wait_event_interruptible()
is re-evaluated only when the wait queue head is woken up.

>
> There is also the aspect of queueing the timeout for the next job - this
> is the part that I don't actually understand, but removing it from the
> wait_event_interruptible() invariably seems to cause problems. Hence
> this approach which avoids changing this behaviour. But I welcome input
> from anyone who understands this timeout mechanism!


I'm not familiar with this problem: before this was done from
wait_event_interruptible() it was done from scheduled work fired from the
job's completion interrupt, and I don't remember any issues with it.

In either case I think both solutions are legitimate, so Reviewed-by: Andrey
Grodzovsky <andrey.grodzovsky@xxxxxxx>

Andrey


>
> Steve
>
> [1]
> https://lists.freedesktop.org/archives/dri-devel/2019-September/235346.html