Re: [RFC PATCH V2] sched: Improve scalability of select_idle_sibling using SMT balance

From: Peter Zijlstra
Date: Mon Jan 08 2018 - 17:19:01 EST


On Mon, Jan 08, 2018 at 02:12:37PM -0800, subhra mazumdar wrote:
> @@ -2751,6 +2763,31 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  	       struct task_struct *next, struct rq_flags *rf)
>  {
>  	struct mm_struct *mm, *oldmm;
> +	int this_cpu = rq->cpu;
> +	struct sched_domain *sd;
> +	int prev_busy, next_busy;
> +
> +	if (rq->curr_util == UTIL_UNINITIALIZED)
> +		prev_busy = 0;
> +	else
> +		prev_busy = (prev != rq->idle);
> +	next_busy = (next != rq->idle);
> +
> +	/*
> +	 * From sd_llc downward update the SMT utilization.
> +	 * Skip the lowest level 0.
> +	 */
> +	sd = rcu_dereference_sched(per_cpu(sd_llc, this_cpu));
> +	if (next_busy != prev_busy) {
> +		for_each_lower_domain(sd) {
> +			if (sd->level == 0)
> +				break;
> +			sd_context_switch(sd, rq, next_busy - prev_busy);
> +		}
> +	}
> +

No, we're not going to be adding atomic ops here. We've been arguing
over adding a single memory barrier to this path; atomics are just not
going to happen.

Also this is entirely the wrong way to do this, we already have code
paths that _know_ if they're going into or coming out of idle.
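
A minimal userspace sketch of that idea, for illustration only. It is
not kernel code and not something proposed in this thread. The hooks
idle_enter_hook() and idle_exit_hook(), the cpu_busy[] array, and the
constants NR_CPUS and SMT_WIDTH are all hypothetical names. It assumes
SMT siblings are numbered contiguously and that each CPU writes only
its own slot, so the update is a plain store rather than an atomic
read-modify-write, and it happens only on idle transitions rather than
on every context switch:

```c
/* Userspace sketch only; every name below is hypothetical. */
#include <stdbool.h>

#define NR_CPUS   8
#define SMT_WIDTH 2	/* assumed: two hardware threads per core */

/* One flag per CPU. Only the owning CPU ever writes its own slot. */
static volatile bool cpu_busy[NR_CPUS];

/* Hypothetical hook: the CPU is leaving its idle loop. */
static void idle_exit_hook(int cpu)
{
	cpu_busy[cpu] = true;	/* plain store, no read-modify-write */
}

/* Hypothetical hook: the CPU is about to enter its idle loop. */
static void idle_enter_hook(int cpu)
{
	cpu_busy[cpu] = false;
}

/*
 * Reader side: count the busy threads of the core containing @cpu.
 * Assumes siblings occupy a contiguous, SMT_WIDTH-aligned CPU range.
 * The result is a racy snapshot, which is acceptable for a heuristic.
 */
static int core_busy_count(int cpu)
{
	int first = cpu - (cpu % SMT_WIDTH);
	int i, busy = 0;

	for (i = first; i < first + SMT_WIDTH; i++)
		busy += cpu_busy[i];

	return busy;
}
```

The cost moves to the reader, which has to sum the sibling flags, while
the switch path itself stays untouched. Real kernel code would also
need to pick the appropriate accessors and per-CPU placement, which
this sketch does not attempt.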