Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

From: benbjiang(蒋彪)
Date: Mon Aug 10 2020 - 20:41:48 EST


Hi,

> On Aug 10, 2020, at 9:24 PM, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
>> Hi,
>>
>>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>>
>>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>>
>>>>
>>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>>>>
>>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>>> From: Jiang Biao <benbjiang@xxxxxxxxxxx>
>
> [...]
>
>>> How would you deal with se's representing taskgroups which contain
>>> SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
>> I’m not sure I get the point. :) How about the following patch,
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 04fa8dbcfa4d..8715f03ed6d7 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2994,6 +2994,9 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> list_add(&se->group_node, &rq->cfs_tasks);
>> }
>> #endif
>> + if (task_has_idle_policy(task_of(se)))
>> + cfs_rq->idle_nr_running++;
>> +
>> cfs_rq->nr_running++;
>> }
>>
>> @@ -3007,6 +3010,9 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> list_del_init(&se->group_node);
>> }
>> #endif
>> + if (task_has_idle_policy(task_of(se)))
>> + cfs_rq->idle_nr_running--;
>> +
>> cfs_rq->nr_running--;
>> }
>>
>> @@ -4527,7 +4533,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>> return;
>> #endif
>>
>> - if (cfs_rq->nr_running > 1)
>> + if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1 &&
>> + cfs_rq->h_nr_running - cfs_rq->idle_h_nr_running > cfs_rq->idle_nr_running + 1)
>> check_preempt_tick(cfs_rq, curr);
>> }
>>
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 877fb08eb1b0..401090393e09 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -500,6 +500,7 @@ struct cfs_bandwidth { };
>> struct cfs_rq {
>> struct load_weight load;
>> unsigned int nr_running;
>> + unsigned int idle_nr_running;
>> unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */
>> unsigned int idle_h_nr_running; /* SCHED_IDLE */
>
> /
> / | \
> A n0 i0
> / \
> n1 i1
>
> I don't think this will work. E.g. the patch would prevent tick
> preemption between 'A' and 'n0' on '/' as well
>
> (3 > 1 + 1) && (4 - 2 > 1 + 1)
>
> You also have to make sure that a SCHED_IDLE task can tick preempt
> another SCHED_IDLE task.

That’s right. :)

>
>>>> I’m not sure if it’s ok to do that, because the IDLE class seems not to be so
>>>> pure that could tolerate starving.
>>>
>>> Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
>>> (policy)?
>> The case is that we need tasks(low priority, called offline tasks) to utilize the
>> spare cpu left by CFS SCHED_NORMAL tasks(called online tasks) without
>> interfering the online tasks.
>> Offline tasks only run when there’s no runnable online tasks, and offline tasks
>> never preempt online tasks.
>> The SCHED_IDLE policy seems not to be abled to be qualified for that requirement,
>> because it has a weight(3), even though it’s small, but it can still preempt online
>> tasks considering the fairness. In that way, offline tasks of SCHED_IDLE policy
>> could interfere the online tasks.
>
> Because of this very small weight (weight=3), compared to a SCHED_NORMAL
> nice 0 task (weight=1024), a SCHED_IDLE task is penalized by a huge
> se->vruntime value (1024/3 higher than for a SCHED_NORMAL nice 0 task).
> This should make sure it doesn't tick preempt a SCHED_NORMAL nice 0 task.
Could you please explain how the huge penalization of vruntime(1024/3) could
make sure SCHED_IDLE not tick preempting SCHED_NORMAL nice 0 task?

Thanks a lot.

Regards,
Jiang

>
> It's different when the SCHED_NORMAL task has nice 19 (weight=15) but
> that's part of the CFS design.
>
>> On the other hand, idle_sched_class seems not to be qualified either. It’s too
>> simple and only used for per-cpu idle task currently.
>
> Yeah, leave this for the rq->idle task (swapper/X).
Got it.

>
>>>> We need an absolutely low priority class that could tolerate starving, which
>>>> could be used to co-locate offline tasks. But IDLE class seems to be not
>>>> *low* enough, if considering the fairness of CFS, and IDLE class still has a
>>>> weight.
>
> Understood. But this (tick) preemption should happen extremely rarely,
> especially if you have SCHED_NORMAL nice 0 tasks, right?