Re: [PATCH] sched/fair: reduce preemption with IDLE tasks runable(Internet mail)

From: Dietmar Eggemann
Date: Mon Aug 10 2020 - 09:24:41 EST


On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
> Hi,
>
>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>
>>>
>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>>>
>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>> From: Jiang Biao <benbjiang@xxxxxxxxxxx>

[...]

>> How would you deal with se's representing taskgroups which contain
>> SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
> I’m not sure I get the point. :) How about the following patch,
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fa8dbcfa4d..8715f03ed6d7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2994,6 +2994,9 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  		list_add(&se->group_node, &rq->cfs_tasks);
>  	}
>  #endif
> +	if (task_has_idle_policy(task_of(se)))
> +		cfs_rq->idle_nr_running++;
> +
>  	cfs_rq->nr_running++;
>  }
>
> @@ -3007,6 +3010,9 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  		list_del_init(&se->group_node);
>  	}
>  #endif
> +	if (task_has_idle_policy(task_of(se)))
> +		cfs_rq->idle_nr_running--;
> +
>  	cfs_rq->nr_running--;
>  }
>
> @@ -4527,7 +4533,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>  		return;
>  #endif
>
> -	if (cfs_rq->nr_running > 1)
> +	if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1 &&
> +	    cfs_rq->h_nr_running - cfs_rq->idle_h_nr_running > cfs_rq->idle_nr_running + 1)
>  		check_preempt_tick(cfs_rq, curr);
>  }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 877fb08eb1b0..401090393e09 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -500,6 +500,7 @@ struct cfs_bandwidth { };
>  struct cfs_rq {
>  	struct load_weight load;
>  	unsigned int nr_running;
> +	unsigned int idle_nr_running;
>  	unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */
>  	unsigned int idle_h_nr_running; /* SCHED_IDLE */

      /
    / | \
   A  n0 i0
  / \
 n1  i1

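I don't think this will work. E.g. the patch would prevent tick
preemption between 'A' and 'n0' on '/' as well. On '/' we have
nr_running=3, idle_nr_running=1, h_nr_running=4 and idle_h_nr_running=2,
so the new condition evaluates to false:

(3 > 1 + 1) && (4 - 2 > 1 + 1)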

You also have to make sure that a SCHED_IDLE task can tick preempt
another SCHED_IDLE task.
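
To make both objections concrete, here is a minimal userspace sketch that
evaluates the mainline check and the condition from the quoted patch on plain
counters. The struct, the function names and the two example instances are
hypothetical and exist only for illustration; the counter values for '/' are
the ones used in the expression above. This is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace mirror of the four cfs_rq counters involved. */
struct cfs_rq_counts {
	unsigned int nr_running;        /* entities queued directly on this cfs_rq */
	unsigned int idle_nr_running;   /* of those, SCHED_IDLE (counter added by the patch) */
	unsigned int h_nr_running;      /* tasks in the whole hierarchy below this cfs_rq */
	unsigned int idle_h_nr_running; /* of those, SCHED_IDLE tasks */
};

/* Mainline check in entity_tick(): more than one entity is queued. */
bool tick_check_mainline(const struct cfs_rq_counts *c)
{
	return c->nr_running > 1;
}

/* Condition proposed in the quoted patch. */
bool tick_check_patched(const struct cfs_rq_counts *c)
{
	/* Sanity checks so the unsigned subtraction below cannot wrap. */
	assert(c->idle_nr_running <= c->nr_running);
	assert(c->idle_h_nr_running <= c->h_nr_running);

	return c->nr_running > c->idle_nr_running + 1 &&
	       c->h_nr_running - c->idle_h_nr_running > c->idle_nr_running + 1;
}

/* Root cfs_rq '/': A, n0 and i0 are queued; the tasks below are n0, i0, n1, i1. */
const struct cfs_rq_counts root_example = { 3, 1, 4, 2 };

/* A cfs_rq that holds nothing but two SCHED_IDLE tasks. */
const struct cfs_rq_counts two_idle_example = { 2, 2, 2, 2 };
```

With these inputs the mainline check is true in both cases, while the patched
condition is false in both: check_preempt_tick() would be skipped between 'A'
and 'n0' on '/', and also between two SCHED_IDLE tasks sharing a cfs_rq.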

>>> I’m not sure if it’s OK to do that, because the IDLE class does not seem to be
>>> pure enough to tolerate starvation.
>>
>> Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
>> (policy)?
> The case is that we need low-priority tasks (called offline tasks) to utilize the
> spare CPU left by CFS SCHED_NORMAL tasks (called online tasks) without
> interfering with the online tasks.
> Offline tasks only run when there are no runnable online tasks, and offline tasks
> never preempt online tasks.
> The SCHED_IDLE policy does not seem to meet that requirement,
> because it has a weight (3). Even though that weight is small, it can still preempt
> online tasks for the sake of fairness. In that way, offline tasks with the SCHED_IDLE
> policy could interfere with the online tasks.

Because of this very small weight (weight=3), compared to a SCHED_NORMAL
nice 0 task (weight=1024), a SCHED_IDLE task is penalized by a huge
se->vruntime value: its vruntime advances 1024/3 (~341) times faster than
that of a SCHED_NORMAL nice 0 task. This should make sure it doesn't tick
preempt a SCHED_NORMAL nice 0 task.

It's different when the SCHED_NORMAL task has nice 19 (weight=15) but
that's part of the CFS design.
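
The arithmetic behind these ratios can be sketched as follows. This is a
simplified model that assumes vruntime advances by delta_exec * 1024 / weight;
the kernel computes the same ratio with fixed-point inverse weights, and the
macro and function names below are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values are the weights quoted in this thread. */
#define NICE_0_WEIGHT  1024u /* SCHED_NORMAL, nice 0 */
#define NICE_19_WEIGHT 15u   /* SCHED_NORMAL, nice 19 */
#define IDLE_WEIGHT    3u    /* SCHED_IDLE */

/*
 * Simplified model: virtual runtime charged for delta_exec_ns of real
 * runtime. A lower weight means vruntime grows faster, so the task is
 * picked less often.
 */
uint64_t vruntime_delta_ns(uint64_t delta_exec_ns, unsigned int weight)
{
	assert(weight > 0);
	return delta_exec_ns * NICE_0_WEIGHT / weight;
}
```

For 3 ms of real runtime this model charges 3 ms to a nice 0 task, 204.8 ms
to a nice 19 task and 1024 ms to a SCHED_IDLE task. Against nice 19 the
SCHED_IDLE penalty is therefore only 15/3 = 5 times, which is why the nice 19
case behaves differently.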

> On the other hand, idle_sched_class does not seem to qualify either. It’s too
> simple and is currently only used for the per-CPU idle task.

Yeah, leave this for the rq->idle task (swapper/X).

>>> We need an absolutely low-priority class that can tolerate starvation and
>>> that could be used to co-locate offline tasks. But the IDLE class does not seem
>>> to be *low* enough, considering the fairness of CFS, because the IDLE class
>>> still has a weight.

Understood. But this (tick) preemption should happen extremely rarely,
especially if you have SCHED_NORMAL nice 0 tasks, right?