Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()

From: Koenig, Christian
Date: Thu Sep 26 2019 - 07:17:17 EST


Am 26.09.19 um 11:47 schrieb Steven Price:
> On 26/09/2019 08:07, Koenig, Christian wrote:
>> Am 25.09.19 um 17:14 schrieb Steven Price:
>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>>> it is called as the condition of wait_event_interruptible() it must not
>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>>
>>> Instead let's rename drm_sched_cleanup_jobs() to
>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>> there is one. The caller can then call the free_job() callback outside
>>> the wait_event_interruptible() where sleeping is possible before
>>> re-checking and returning to sleep if necessary.
>>>
>>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_main.c | 44 ++++++++++++++------------
>>> 1 file changed, 24 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74d82dc..0ed4aaa4e6d1 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>  }
>>>
>>>  /**
>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>   *
>>>   * @sched: scheduler instance
>>>   *
>>> - * Remove all finished jobs from the mirror list and destroy them.
>>> + * Returns the next finished job from the mirror list (if there is one)
>>> + * ready for it to be destroyed.
>>>   */
>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>> +static struct drm_sched_job *
>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>  {
>>> +        struct drm_sched_job *job = NULL;
>>>          unsigned long flags;
>>>
>>>          /* Don't destroy jobs while the timeout worker is running */
>>>          if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>              !cancel_delayed_work(&sched->work_tdr))
>>> -                return;
>>> -
>>> -
>>> -        while (!list_empty(&sched->ring_mirror_list)) {
>>> -                struct drm_sched_job *job;
>>> +                return NULL;
>>>
>>> -                job = list_first_entry(&sched->ring_mirror_list,
>>> +        job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>                                         struct drm_sched_job, node);
>> This is probably better done after taking the lock; apart from that it
>> looks good to me.
> It wasn't previously protected by the lock - but I don't see any harm so
> I'll post a v2.

Calling list_empty() without the lock is harmless, but if you return the
entry with list_first_entry_or_null() you really want the memory barrier
the lock provides.
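
E.g. roughly like the following, completely untested and just to sketch
what I mean. The names are the ones sched_main.c already uses; whether
the signaled check and the timeout re-arm end up looking exactly like
this in v2 is of course up to Steven:

static struct drm_sched_job *
drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
        struct drm_sched_job *job;
        unsigned long flags;

        /* Don't destroy jobs while the timeout worker is running */
        if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
            !cancel_delayed_work(&sched->work_tdr))
                return NULL;

        spin_lock_irqsave(&sched->job_list_lock, flags);

        job = list_first_entry_or_null(&sched->ring_mirror_list,
                                       struct drm_sched_job, node);

        if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
                /* Remove the job from the list while still holding the lock */
                list_del_init(&job->node);
        } else {
                job = NULL;
                /* Nothing to clean up, re-arm the timeout we canceled above */
                drm_sched_start_timeout(sched);
        }

        spin_unlock_irqrestore(&sched->job_list_lock, flags);

        return job;
}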

In this particular case we could actually get away without the lock,
because the only other actor removing entries is the timeout handler,
and that was canceled a few lines above.

But better safe than sorry should anybody decide to modify the code again.
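
For completeness, drm_sched_main() would then do the free_job() outside
of the wait condition, something like this (again just an untested
sketch of the idea from the commit message, the cleanup_job local is
only for illustration):

        struct drm_sched_job *cleanup_job = NULL;
        struct drm_sched_entity *entity = NULL;

        /* somewhere inside the while (!kthread_should_stop()) loop */
        wait_event_interruptible(sched->wake_up_worker,
                                 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
                                 (!drm_sched_blocked(sched) &&
                                  (entity = drm_sched_select_entity(sched))) ||
                                 kthread_should_stop());

        if (cleanup_job) {
                /* Sleeping is fine here, we are outside the wait condition */
                sched->ops->free_job(cleanup_job);
                /* Queue the timeout for the next job again */
                drm_sched_start_timeout(sched);
        }

        if (!entity)
                continue;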

Regards,
Christian.

>
> Steve