Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)
From: Dietmar Eggemann
Date: Thu Aug 06 2020 - 13:22:38 EST
On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>> On 01/08/2020 04:32, Jiang Biao wrote:
>>> From: Jiang Biao <benbjiang@xxxxxxxxxxx>
>>> No need to preempt when there are only one runable CFS task with
>>> other IDLE tasks on runqueue. The only one CFS task would always
>>> be picked in that case.
>>> Signed-off-by: Jiang Biao <benbjiang@xxxxxxxxxxx>
>>> kernel/sched/fair.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 04fa8dbcfa4d..8fb80636b010 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> - if (cfs_rq->nr_running > 1)
>>> + if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)
>> cfs_rq is a pointer.
> It is. Sorry about that. :)
>>> check_preempt_tick(cfs_rq, curr);
>> You can't compare cfs_rq->nr_running with cfs_rq->idle_h_nr_running!
>> There is a difference between cfs_rq->h_nr_running and
>> cfs_rq->nr_running. The '_h_' stands for hierarchical.
>> The former gives you hierarchical task accounting whereas the latter is
>> the number of sched entities (representing tasks or taskgroups) enqueued
>> in cfs_rq.
>> In entity_tick(), cfs_rq->nr_running has to be used for the condition to
>> call check_preempt_tick(). We want to check if curr has to be preempted
>> by __pick_first_entity(cfs_rq) on this cfs_rq.
>> entity_tick() is called for each sched entity (and so for each
>> cfs_rq_of(se)) of the task group hierarchy (e.g. task p running in
>> taskgroup /A/B : se(p) -> se(A/B) -> se(A)).
> That’s true. I was thinking adding a new cfs_rq->idle_nr_running member to
> track the per cfs_rq's IDLE task number, and reducing preemption here based
> on that.
How would you deal with se's representing taskgroups which contain
SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
> I’m not sure if it’s ok to do that, because the IDLE class seems not to be so
> pure that could tolerate starving.
Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
> We need an absolutely low priority class that could tolerate starving, which
> could be used to co-locate offline tasks. But IDLE class seems to be not
> *low* enough, if considering the fairness of CFS, and IDLE class still has a