Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()

From: Steven Price
Date: Thu Sep 26 2019 - 05:41:40 EST


On 25/09/2019 21:09, Grodzovsky, Andrey wrote:
>
> On 9/25/19 12:07 PM, Andrey Grodzovsky wrote:
>> On 9/25/19 12:00 PM, Steven Price wrote:
>>
>>> On 25/09/2019 16:56, Grodzovsky, Andrey wrote:
>>>> On 9/25/19 11:14 AM, Steven Price wrote:
>>>>
>>>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however
>>>>> because
>>>>> it is called as the condition of wait_event_interruptible() it must
>>>>> not
>>>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do
>>>>> sleep.
>>>>>
>>>>> Instead let's rename drm_sched_cleanup_jobs() to
>>>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>>>> there is one. The caller can then call the free_job() callback outside
>>>>> the wait_event_interruptible() where sleeping is possible before
>>>>> re-checking and returning to sleep if necessary.
>>>>>
>>>>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>>>>> ---
>>>>> ÂÂ drivers/gpu/drm/scheduler/sched_main.c | 44
>>>>> ++++++++++++++------------
>>>>> ÂÂ 1 file changed, 24 insertions(+), 20 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 9a0ee74d82dc..0ed4aaa4e6d1 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct
>>>>> dma_fence *f, struct dma_fence_cb *cb)
>>>>> ÂÂ }
>>>>> ÂÂ ÂÂ /**
>>>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be
>>>>> destroyed
>>>>> ÂÂÂ *
>>>>> ÂÂÂ * @sched: scheduler instance
>>>>> ÂÂÂ *
>>>>> - * Remove all finished jobs from the mirror list and destroy them.
>>>>> + * Returns the next finished job from the mirror list (if there is
>>>>> one)
>>>>> + * ready for it to be destroyed.
>>>>> ÂÂÂ */
>>>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>>> +static struct drm_sched_job *
>>>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>>> ÂÂ {
>>>>> +ÂÂÂ struct drm_sched_job *job = NULL;
>>>>> ÂÂÂÂÂÂ unsigned long flags;
>>>>> ÂÂ ÂÂÂÂÂÂ /* Don't destroy jobs while the timeout worker is running */
>>>>> ÂÂÂÂÂÂ if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>> ÂÂÂÂÂÂÂÂÂÂ !cancel_delayed_work(&sched->work_tdr))
>>>>> -ÂÂÂÂÂÂÂ return;
>>>>> -
>>>>> -
>>>>> -ÂÂÂ while (!list_empty(&sched->ring_mirror_list)) {
>>>>> -ÂÂÂÂÂÂÂ struct drm_sched_job *job;
>>>>> +ÂÂÂÂÂÂÂ return NULL;
>>>>> ÂÂ -ÂÂÂÂÂÂÂ job = list_first_entry(&sched->ring_mirror_list,
>>>>> +ÂÂÂ job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ struct drm_sched_job, node);
>>>>> -ÂÂÂÂÂÂÂ if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>>> -ÂÂÂÂÂÂÂÂÂÂÂ break;
>>>>> ÂÂ -ÂÂÂÂÂÂÂ spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> +ÂÂÂ spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> +
>>>>> +ÂÂÂ if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>>> ÂÂÂÂÂÂÂÂÂÂ /* remove job from ring_mirror_list */
>>>>> ÂÂÂÂÂÂÂÂÂÂ list_del_init(&job->node);
>>>>> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> -
>>>>> -ÂÂÂÂÂÂÂ sched->ops->free_job(job);
>>>>> +ÂÂÂ } else {
>>>>> +ÂÂÂÂÂÂÂ job = NULL;
>>>>> +ÂÂÂÂÂÂÂ /* queue timeout for next job */
>>>>> +ÂÂÂÂÂÂÂ drm_sched_start_timeout(sched);
>>>>> ÂÂÂÂÂÂ }
>>>>> ÂÂ -ÂÂÂ /* queue timeout for next job */
>>>>> -ÂÂÂ spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> -ÂÂÂ drm_sched_start_timeout(sched);
>>>>> ÂÂÂÂÂÂ spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> ÂÂ +ÂÂÂ return job;
>>>>> ÂÂ }
>>>>> ÂÂ ÂÂ /**
>>>>> @@ -698,12 +696,18 @@ static int drm_sched_main(void *param)
>>>>> ÂÂÂÂÂÂÂÂÂÂ struct drm_sched_fence *s_fence;
>>>>> ÂÂÂÂÂÂÂÂÂÂ struct drm_sched_job *sched_job;
>>>>> ÂÂÂÂÂÂÂÂÂÂ struct dma_fence *fence;
>>>>> +ÂÂÂÂÂÂÂ struct drm_sched_job *cleanup_job = NULL;
>>>>> wait_event_interruptible(sched->wake_up_worker,
>>>>> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ (drm_sched_cleanup_jobs(sched),
>>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ (cleanup_job =
>>>>> drm_sched_get_cleanup_job(sched)) ||
>>>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ (!drm_sched_blocked(sched) &&
>>>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ (entity = drm_sched_select_entity(sched))) ||
>>>>> -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kthread_should_stop()));
>>>>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ kthread_should_stop());
>>>>
>>>> Can't we just call drm_sched_cleanup_jobs right here, remove all the
>>>> conditions in wait_event_interruptible (make it always true) and after
>>>> drm_sched_cleanup_jobs is called test for all those conditions and
>>>> return to sleep if they evaluate to false ? drm_sched_cleanup_jobs is
>>>> called unconditionally inside wait_event_interruptible anyway...
>>>> This is
>>>> more of a question to Christian.
>>> Christian may know better than me, but I think those conditions need to
>>> be in wait_event_interruptible() to avoid race conditions. If we simply
>>> replace all the conditions with a literal "true" then
>>> wait_event_interruptible() will never actually sleep.
>>>
>>> Steve
>>
>> Yes you right, it won't work as I missed that condition is evaluated
>> as first step in wait_event_interruptible before it sleeps.
>>
>> Andrey
>
> Another idea - what about still just relocating drm_sched_cleanup_jobs
> to after wait_event_interruptible and also call it in drm_sched_fini soÂ
> for the case when it will not be called from drm_sched_main due to
> conditions not evaluating to true it eventually be called last time
> from drm_sched_fini. I mean - the refactor looks ok to me but the code
> becomes somewhat confusing this way to grasp.
>
> Andrey

That sounds similar to my first stab at this[1]. However Christian
pointed out that it is necessary to also free jobs even if there isn't a
new one to be scheduled. Otherwise it ends up with the jobs lying around
until something kicks it.

There is also the aspect of queueing the timeout for the next job - this
is the part that I don't actually understand, but removing it from the
wait_event_interruptible() invariable seems to cause problems. Hence
this approach which avoids changing this behaviour. But I welcome input
from anyone who understands this timeout mechanism!

Steve

[1]
https://lists.freedesktop.org/archives/dri-devel/2019-September/235346.html