Re: [RFC 1/2] sched/fair: Fix load_balance() affinity redo path

From: Peter Zijlstra
Date: Fri May 12 2017 - 16:48:17 EST


On Fri, May 12, 2017 at 11:01:37AM -0600, Jeffrey Hugo wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d711093..8f783ba 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8219,8 +8219,19 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>
> /* All tasks on this runqueue were pinned by CPU affinity */
> if (unlikely(env.flags & LBF_ALL_PINNED)) {
> + struct cpumask tmp;

You cannot put a struct cpumask on the stack; with a large NR_CPUS it is too big.
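
To give a rough sense of the sizes involved, here is a minimal userspace sketch, not kernel code. It assumes the mask is a bitmap of one bit per possible CPU rounded up to whole longs, and model_cpumask_bytes() is a hypothetical helper named only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical userspace model: bytes needed for a bitmap holding
 * one bit per CPU, rounded up to a whole number of unsigned longs.
 */
static size_t model_cpumask_bytes(unsigned int nr_cpus)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs;

	assert(nr_cpus > 0);

	/* Round up so a partial final word still gets a full long. */
	longs = (nr_cpus + bits_per_long - 1) / bits_per_long;
	return longs * sizeof(unsigned long);
}
```

Under that assumption, model_cpumask_bytes(8192) comes to 1024 bytes for a single local variable, which is a lot to take from a kernel stack.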

> +
> + /* Cpumask of all initially possible busiest cpus. */
> + cpumask_copy(&tmp, sched_domain_span(env.sd));
> + cpumask_clear_cpu(env.dst_cpu, &tmp);

You forgot to mask with cpu_active_mask.

> +
> cpumask_clear_cpu(cpu_of(busiest), cpus);
> - if (!cpumask_empty(cpus)) {
> + /*
> + * Go back to "redo" iff the load-balance cpumask
> + * contains other potential busiest cpus for the
> + * current sched domain.
> + */
> + if (cpumask_intersects(cpus, &tmp)) {
> env.loop = 0;
> env.loop_break = sched_nr_migrate_break;
> goto redo;
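
For reference, the redo condition in the quoted hunk, with the cpu_active_mask masking requested above folded in, can be modelled in userspace roughly as follows. This is only a sketch, not kernel code: mask_t, busiest_candidates() and should_redo() are hypothetical names, and a single 64-bit word (one bit per CPU, at most 64 CPUs) stands in for struct cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t mask_t;

/*
 * CPUs that could still be picked as busiest: inside the sched domain
 * span, currently active, and not the destination CPU itself.
 */
static mask_t busiest_candidates(mask_t span, mask_t active,
				 unsigned int dst_cpu)
{
	assert(dst_cpu < 64);
	return span & active & ~((mask_t)1 << dst_cpu);
}

/*
 * Redo only if the remaining load-balance mask (cpus, with the pinned
 * busiest CPU already cleared) still contains one of those candidates.
 */
static bool should_redo(mask_t cpus, mask_t span, mask_t active,
			unsigned int dst_cpu)
{
	return (cpus & busiest_candidates(span, active, dst_cpu)) != 0;
}
```

With span = 0xF (CPUs 0-3), dst_cpu = 0 and every CPU active, cpus = 0x2 yields true, while cpus = 0x1 (only dst_cpu left) yields false, so the redo loop is not re-entered with nothing useful left to try.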