Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

From: Dietmar Eggemann
Date: Thu Aug 06 2020 - 13:22:38 EST


On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>
>
>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 01/08/2020 04:32, Jiang Biao wrote:
>>> From: Jiang Biao <benbjiang@xxxxxxxxxxx>
>>>
>>> No need to preempt when there are only one runable CFS task with
>>> other IDLE tasks on runqueue. The only one CFS task would always
>>> be picked in that case.
>>>
>>> Signed-off-by: Jiang Biao <benbjiang@xxxxxxxxxxx>
>>> ---
>>> kernel/sched/fair.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 04fa8dbcfa4d..8fb80636b010 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> return;
>>> #endif
>>>
>>> - if (cfs_rq->nr_running > 1)
>>> + if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)
>>
>> cfs_rq is a pointer.
> It is. Sorry about that. :)
>
>>
>>> check_preempt_tick(cfs_rq, curr);
>>> }
>>
>> You can't compare cfs_rq->nr_running with cfs_rq->idle_h_nr_running!
>>
>> There is a difference between cfs_rq->h_nr_running and
>> cfs_rq->nr_running. The '_h_' stands for hierarchical.
>>
>> The former gives you hierarchical task accounting whereas the latter is
>> the number of sched entities (representing tasks or taskgroups) enqueued
>> in cfs_rq.
>>
>> In entity_tick(), cfs_rq->nr_running has to be used for the condition to
>> call check_preempt_tick(). We want to check if curr has to be preempted
>> by __pick_first_entity(cfs_rq) on this cfs_rq.
>>
>> entity_tick() is called for each sched entity (and so for each
>> cfs_rq_of(se)) of the task group hierarchy (e.g. task p running in
>> taskgroup /A/B : se(p) -> se(A/B) -> se(A)).
> That’s true. I was thinking adding a new cfs_rq->idle_nr_running member to
> track the per cfs_rq's IDLE task number, and reducing preemption here based
> on that.

How would you deal with se's representing taskgroups which contain
SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?

> I’m not sure if it’s ok to do that, because the IDLE class seems not to be so
> pure that could tolerate starving.

Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
(policy)?

> We need an absolutely low priority class that could tolerate starving, which
> could be used to co-locate offline tasks. But IDLE class seems to be not
> *low* enough, if considering the fairness of CFS, and IDLE class still has a
> weight.

[...]