Re: [PATCH] drm/scheduler: set current_entity to next when remove from rq

From: Luben Tuikov
Date: Thu Oct 27 2022 - 04:24:47 EST


On 2022-10-27 04:19, Christian König wrote:
> Am 27.10.22 um 10:07 schrieb Luben Tuikov:
>> On 2022-10-27 03:01, Luben Tuikov wrote:
>>> On 2022-10-25 13:50, Luben Tuikov wrote:
>>>> Looking...
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>> On 2022-10-25 09:35, Alex Deucher wrote:
>>>>> + Luben
>>>>>
>>>>> On Tue, Oct 25, 2022 at 2:55 AM brolerliew <brolerliew@xxxxxxxxx> wrote:
>>>>>> When entity move from one rq to another, current_entity will be set to NULL
>>>>>> if it is the moving entity. This make entities close to rq head got
>>>>>> selected more frequently, especially when doing load balance between
>>>>>> multiple drm_gpu_scheduler.
>>>>>>
>>>>>> Make current_entity to next when removing from rq.
>>>>>>
>>>>>> Signed-off-by: brolerliew <brolerliew@xxxxxxxxx>
>>>>>> ---
>>>>>> drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index 2fab218d7082..00b22cc50f08 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -168,10 +168,11 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>>>>> spin_lock(&rq->lock);
>>>>>>
>>>>>> atomic_dec(rq->sched->score);
>>>>>> - list_del_init(&entity->list);
>>>>>>
>>>>>> if (rq->current_entity == entity)
>>>>>> - rq->current_entity = NULL;
>>>>>> + rq->current_entity = list_next_entry(entity, list);
>>>>>> +
>>>>>> + list_del_init(&entity->list);
>>>>>>
>>>>>> if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
>>>>>> drm_sched_rq_remove_fifo_locked(entity);
>>>>>> --
>>>>>> 2.34.1
>>>>>>
>>> Looks good. I'll pick it up into some other changes I've in tow, and repost
>>> along with my changes, as they're somewhat related.
>> Actually, the more I look at it, the more I think that we do want to set
>> rq->current_entity to NULL in that function, in order to pick the next best entity
>> (or scheduler for that matter), the next time around. See sched_entity.c,
>> and drm_sched_rq_select_entity() where we start evaluating from the _next_
>> entity.
>>
>> So, it is best to leave it to set it to NULL, for now.
>
> Apart from that this patch here could cause a crash when the entity is
> the last one in the list.
>
> In this case current current_entity would be set to an incorrect upcast
> of the head of the list.

Absolutely. I saw that, but in rejecting the patch, I didn't feel the need to mention it.

Thanks for looking into this.

Regards,
Luben