Re: [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case

From: benbjiang(蒋彪)
Date: Wed Jul 22 2020 - 03:20:20 EST

> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai <vpillai@xxxxxxxxxxxxxxxx> wrote:
>
> From: vpillai <vpillai@xxxxxxxxxxxxxxxx>
>
> If there is only one long running local task and the sibling is
> forced idle, it might not get a chance to run until a schedule
> event happens on any cpu in the core.
>
> So we check for this condition during a tick to see if a sibling
> is starved and then give it a chance to schedule.
Hi,

There may be other, similar starvation cases that this patch cannot cover.
For example, if one sibling runs a long-running RT task and the other sibling is forced idle, then tasks with different cookies queued on all siblings could starve indefinitely.
The current load balancer does not seem able to pull the starved tasks away.
Could load balancing be made more aware of core-scheduling to handle this better? :)
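
Just to illustrate the idea, here is a rough, purely hypothetical sketch (the helper name is made up, and it assumes the p->core_cookie and rq->core->core_cookie fields introduced by this series): the balancer could refuse to pull a task onto a core whose currently selected cookie does not match, so a migration does not simply recreate the forced-idle situation on the destination core.

#ifdef CONFIG_SCHED_CORE
/*
 * Hypothetical helper for can_migrate_task(): treat @p as a good
 * candidate for @dst_cpu only if the destination core is either idle
 * (no core-wide cookie selected) or already running the same cookie.
 */
static bool core_cookie_fits_dst(struct task_struct *p, int dst_cpu)
{
	struct rq *dst_rq = cpu_rq(dst_cpu);

	if (!sched_core_enabled(dst_rq))
		return true;

	/* An idle core (no cookie selected) can take any task. */
	if (!dst_rq->core->core_cookie)
		return true;

	return p->core_cookie == dst_rq->core->core_cookie;
}
#endif

A symmetric check on the source side (preferring to pull tasks whose cookie differs from the one the long-running RT task holds) might then let the balancer rescue the starved tasks. Just thinking aloud, of course.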

Thx.
Regards,
Jiang

>
> Signed-off-by: Vineeth Remanan Pillai <vpillai@xxxxxxxxxxxxxxxx>
> Signed-off-by: Julien Desfossez <jdesfossez@xxxxxxxxxxxxxxxx>
> ---
> kernel/sched/fair.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ae17507533a0..49fb93296e35 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10613,6 +10613,40 @@ static void rq_offline_fair(struct rq *rq)
>
> #endif /* CONFIG_SMP */
>
> +#ifdef CONFIG_SCHED_CORE
> +static inline bool
> +__entity_slice_used(struct sched_entity *se)
> +{
> + return (se->sum_exec_runtime - se->prev_sum_exec_runtime) >
> + sched_slice(cfs_rq_of(se), se);
> +}
> +
> +/*
> + * If runqueue has only one task which used up its slice and if the sibling
> + * is forced idle, then trigger schedule to give forced idle task a chance.
> + */
> +static void resched_forceidle_sibling(struct rq *rq, struct sched_entity *se)
> +{
> + int cpu = cpu_of(rq), sibling_cpu;
> +
> + if (rq->cfs.nr_running > 1 || !__entity_slice_used(se))
> + return;
> +
> + for_each_cpu(sibling_cpu, cpu_smt_mask(cpu)) {
> + struct rq *sibling_rq;
> + if (sibling_cpu == cpu)
> + continue;
> + if (cpu_is_offline(sibling_cpu))
> + continue;
> +
> + sibling_rq = cpu_rq(sibling_cpu);
> + if (sibling_rq->core_forceidle) {
> + resched_curr(sibling_rq);
> + }
> + }
> +}
> +#endif
> +
> /*
> * scheduler tick hitting a task of our scheduling class.
> *
> @@ -10636,6 +10670,11 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>
> update_misfit_status(curr, rq);
> update_overutilized_status(task_rq(curr));
> +
> +#ifdef CONFIG_SCHED_CORE
> + if (sched_core_enabled(rq))
> + resched_forceidle_sibling(rq, &curr->se);
> +#endif
> }
>
> /*
> --
> 2.17.1
>
>