Re: [PATCH updated v2] sched/fair: core wide cfs task priority comparison

From: Vineeth Remanan Pillai
Date: Thu May 14 2020 - 18:51:43 EST


Hi Peter,

On Thu, May 14, 2020 at 9:02 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> A little something like so, this syncs min_vruntime when we switch to
> single queue mode. This is very much SMT2 only, I got my head in a twist
> when thinking about more siblings, I'll have to try again later.
>
Thanks for the quick patch! :-)

For SMT-n, would it work if we sync the vruntime whenever at least one
sibling is forced idle? Since force_idle applies to all the rqs, I think
it would be correct to sync the vruntime if at least one cpu is forced idle.

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> -		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
> -			rq_i->core_forceidle = true;
> +		if (is_idle_task(rq_i->core_pick)) {
> +			if (rq_i->nr_running)
> +				rq_i->core_forceidle = true;
> +		} else {
> +			new_active++;
I think we need to reset new_active on restarting the selection.
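
To illustrate the concern, here is a minimal user-space sketch (not kernel
code). It assumes that a restart walks every sibling again from the start;
the quoted hunk does not show the restart itself, so count_active(), passes
and reset_on_restart are hypothetical names used only for this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model, not kernel code. One pass walks every sibling and
 * counts the non-idle picks. `passes` is the total number of walks
 * (1 means no restart happened).
 */
static unsigned int count_active(const bool *pick_is_idle, size_t nr_siblings,
				 unsigned int passes, bool reset_on_restart)
{
	unsigned int new_active = 0;

	assert(passes >= 1);

	for (unsigned int pass = 0; pass < passes; pass++) {
		if (reset_on_restart)
			new_active = 0;	/* the reset suggested above */

		for (size_t i = 0; i < nr_siblings; i++) {
			if (!pick_is_idle[i])
				new_active++;
		}
	}

	return new_active;
}
```

With one idle and one non-idle pick, a single restart without the reset
reports two active siblings instead of one, which would hide the transition
the sync check is looking for.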

> +		}
>
>  		if (i == cpu)
>  			continue;
> @@ -4476,6 +4473,16 @@ next_class:;
>  		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));
>  	}
>
> +	/* XXX SMT2 only */
> +	if (new_active == 1 && old_active > 1) {
As I mentioned above, would it be correct to check if at least one sibling is
forced idle? Something like:
if (cpumask_weight(cpu_smt_mask(cpu)) == old_active && new_active < old_active)
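
A minimal user-space sketch of how the two conditions differ (not kernel
code). Here smt_width stands in for cpumask_weight(cpu_smt_mask(cpu)), and
the function names are hypothetical. Note that the counts alone cannot tell
a forced-idle sibling from one that simply has nothing to run, so this only
models the arithmetic of the check.

```c
#include <assert.h>
#include <stdbool.h>

/* The check from the quoted patch: only fires on a drop to one active rq. */
static bool smt2_only_sync(unsigned int old_active, unsigned int new_active)
{
	return new_active == 1 && old_active > 1;
}

/*
 * The suggested SMT-n check: every sibling was active before the pick
 * and at least one of them is not active after it.
 */
static bool should_sync_vruntime(unsigned int smt_width,
				 unsigned int old_active,
				 unsigned int new_active)
{
	assert(smt_width > 0);
	assert(old_active <= smt_width && new_active <= smt_width);

	return old_active == smt_width && new_active < old_active;
}
```

On SMT2 both checks agree for a 2 -> 1 transition. On SMT4 a 4 -> 3
transition triggers only the suggested check, and a 3 -> 2 transition
triggers neither.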

> +		/*
> +		 * We just dropped into single-rq mode, increment the sequence
> +		 * count to trigger the vruntime sync.
> +		 */
> +		rq->core->core_sync_seq++;
> +	}
> +	rq->core->core_active = new_active;
core_active seems to be unused.

> +bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
> +{
> +	struct sched_entity *se_a = &a->se, *se_b = &b->se;
> +	struct cfs_rq *cfs_rq_a, *cfs_rq_b;
> +	u64 vruntime_a, vruntime_b;
> +
> +	while (!is_same_tg(se_a, se_b)) {
> +		int se_a_depth = se_a->depth;
> +		int se_b_depth = se_b->depth;
> +
> +		if (se_a_depth <= se_b_depth)
> +			se_b = parent_entity(se_b);
> +		if (se_a_depth >= se_b_depth)
> +			se_a = parent_entity(se_a);
> +	}
> +
> +	cfs_rq_a = cfs_rq_of(se_a);
> +	cfs_rq_b = cfs_rq_of(se_b);
> +
> +	vruntime_a = se_a->vruntime - cfs_rq_a->core_vruntime;
> +	vruntime_b = se_b->vruntime - cfs_rq_b->core_vruntime;
Should we be using core_vruntime conditionally? Should it be min_vruntime for
default comparisons and core_vruntime during force_idle?
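
To make the question concrete, a minimal user-space sketch of a conditional
baseline (not kernel code). struct cfs_rq_model, normalized_vruntime() and
prio_less_model() are hypothetical names. The sketch assumes core_vruntime
is the baseline recorded at the last sync, and that "lower priority" means a
larger normalized vruntime, compared with the usual wraparound-safe signed
difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model: only the two fields discussed here. */
struct cfs_rq_model {
	uint64_t min_vruntime;	/* per-rq baseline */
	uint64_t core_vruntime;	/* assumed: baseline recorded at the last sync */
};

/* Pick the baseline by mode, then normalize the entity's vruntime. */
static uint64_t normalized_vruntime(uint64_t se_vruntime,
				    const struct cfs_rq_model *cfs_rq,
				    bool force_idle)
{
	uint64_t base;

	assert(cfs_rq != NULL);

	base = force_idle ? cfs_rq->core_vruntime : cfs_rq->min_vruntime;
	return se_vruntime - base;
}

/* True if a should be treated as lower priority than b. */
static bool prio_less_model(uint64_t va, const struct cfs_rq_model *rq_a,
			    uint64_t vb, const struct cfs_rq_model *rq_b,
			    bool force_idle)
{
	uint64_t na = normalized_vruntime(va, rq_a, force_idle);
	uint64_t nb = normalized_vruntime(vb, rq_b, force_idle);

	/* Signed difference keeps the comparison correct across wraparound. */
	return (int64_t)(na - nb) > 0;
}
```

With the same two entities, the result can flip depending on which baseline
is used, which is why the choice between min_vruntime and core_vruntime
matters for the comparison.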

Thanks,
Vineeth