[PATCH] sched: Fix numabalancing to work with isolated cpus

From: Srikar Dronamraju
Date: Tue Apr 04 2017 - 13:27:53 EST


When performing load balancing, numabalancing only checks
task->cpus_allowed to decide whether the task can run on the target cpu.
If the isolcpus kernel parameter is set, the isolated cpus are not part
of the task->cpus_allowed mask.

For example, on a POWER8 box running in SMT 1 mode:

isolcpus=56,64,72,80,88

Cpus_allowed_list: 0-55,57-63,65-71,73-79,81-87,89-175
/proc/20996/task/20996/status:Cpus_allowed_list: 0-55,57-63,65-71,73-79,81-87,89-175
/proc/20996/task/20997/status:Cpus_allowed_list: 0-55,57-63,65-71,73-79,81-87,89-175
/proc/20996/task/20998/status:Cpus_allowed_list: 0-55,57-63,65-71,73-79,81-87,89-175

Note: offline cpus are excluded from Cpus_allowed_list.
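The Cpus_allowed_list lines above and the isolcpus= value use the kernel's cpu-list format. As a rough userspace sketch (the `struct cpubits` type, the helper names and the 256-cpu limit are hypothetical, chosen only for this illustration), the following parses such lists and checks whether any isolated cpu appears in an allowed mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CPUS  256	/* hypothetical limit; the example box has cpus 0-175 */
#define WORD_BITS 64

/* Fixed-size cpu bitmap for userspace; not the kernel's struct cpumask. */
struct cpubits {
	uint64_t w[MAX_CPUS / WORD_BITS];
};

static void cpubits_set(struct cpubits *m, unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	m->w[cpu / WORD_BITS] |= UINT64_C(1) << (cpu % WORD_BITS);
}

static bool cpubits_test(const struct cpubits *m, unsigned int cpu)
{
	return cpu < MAX_CPUS &&
	       ((m->w[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1);
}

/* True if at least one cpu is set in both masks. */
static bool cpubits_intersects(const struct cpubits *a, const struct cpubits *b)
{
	for (size_t i = 0; i < MAX_CPUS / WORD_BITS; i++)
		if (a->w[i] & b->w[i])
			return true;
	return false;
}

/*
 * Parse a list such as "0-55,57-63,89-175" into @m. Returns 0 on
 * success, -1 on malformed input or a cpu number >= MAX_CPUS.
 */
static int parse_cpulist(const char *s, struct cpubits *m)
{
	*m = (struct cpubits){ { 0 } };
	while (*s != '\0') {
		char *end;
		unsigned long lo = strtoul(s, &end, 10), hi = lo;

		if (end == s)
			return -1;		/* no number where one was expected */
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}
		if (hi >= MAX_CPUS)
			return -1;
		for (unsigned long c = lo; c <= hi; c++)
			cpubits_set(m, (unsigned int)c);
		if (*end == ',')
			s = end + 1;		/* next entry */
		else if (*end == '\0' || *end == '\n')
			break;			/* end of list */
		else
			return -1;
	}
	return 0;
}
```

Parsing "0-55,57-63,65-71,73-79,81-87,89-175" and "56,64,72,80,88" this way and calling `cpubits_intersects()` on the two masks gives false, which matches the default masks shown above.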

However, a task might call sched_setaffinity() with a mask that includes
all possible cpus in the system, including the isolated cpus.

For example, running
perf bench numa mem --no-data_rand_walk -p 4 -t $THREADS -G 0 -P 3072 -T 0 -l 50 -c -s 1000
calls sched_setaffinity(), which resets the cpus_allowed mask:

Cpus_allowed_list: 0-55,57-63,65-71,73-79,81-87,89-175
Cpus_allowed_list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
Cpus_allowed_list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
Cpus_allowed_list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
Cpus_allowed_list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
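A task widens its own mask through the sched_setaffinity(2) system call. The sketch below is not perf's code; the helper names `build_stride_mask()` and `apply_mask()` are hypothetical. It builds a stride-8 mask like the one in the listing above using the glibc CPU_SET macros. Nothing in the mask excludes the isolated cpus, so 56, 64, 72, 80 and 88 are all set:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: set every @stride-th cpu in [first, last].
 * With (0, 175, 8) this yields 0,8,16,...,168, matching the listing
 * above, including the isolated cpus 56, 64, 72, 80 and 88.
 */
static void build_stride_mask(cpu_set_t *set, int first, int last, int stride)
{
	assert(first >= 0 && stride > 0);
	CPU_ZERO(set);
	for (int cpu = first; cpu <= last && cpu < CPU_SETSIZE; cpu += stride)
		CPU_SET(cpu, set);
}

/* Apply @set to @pid (0 = calling thread). Returns 0, or -1 with errno set. */
static int apply_mask(pid_t pid, const cpu_set_t *set)
{
	return sched_setaffinity(pid, sizeof(*set), set);
}
```

A thread would call `build_stride_mask(&set, 0, 175, 8)` followed by `apply_mask(0, &set)` to end up with the mask shown in the listing.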

The isolated cpus (56, 64, 72, 80 and 88) are now part of the
cpus_allowed list. In this case, numabalancing ends up scheduling some
of these tasks on isolated cpus.

To avoid this, check whether a cpu is isolated before choosing it as the
target cpu.
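The effect of the extra check can be modelled in userspace. In this hypothetical sketch, `numa_candidate_cpus()` and its plain 64-bit masks (cpus 0-63 only) are illustrative assumptions, not kernel API. A cpu on the destination node stays a candidate only if it is in the task's cpus_allowed mask and not in the isolated map, mirroring the two `continue` tests in the patched loop:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the candidate filter in task_numa_find_cpu()
 * after this patch. The kernel walks a struct cpumask and calls
 * task_numa_compare() for each survivor; here the survivors are
 * simply collected into a mask.
 */
static uint64_t numa_candidate_cpus(uint64_t node_cpus, uint64_t cpus_allowed,
				    uint64_t isolated)
{
	uint64_t candidates = 0;

	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(node_cpus & bit))
			continue;	/* not on the destination node */
		if (!(cpus_allowed & bit))
			continue;	/* task is not allowed to run here */
		if (isolated & bit)
			continue;	/* new check: skip isolated cpus */
		candidates |= bit;
	}
	/* Invariant the patch establishes: no isolated cpu is ever a target. */
	assert((candidates & isolated) == 0);
	return candidates;
}
```

Without the isolated test, a task whose mask was widened by sched_setaffinity() would have every allowed cpu on the node, isolated or not, considered as a target.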

Signed-off-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
---
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f045a35..f853dc0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1666,6 +1666,10 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 		if (!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
 			continue;
 
+		/* Skip isolated cpus */
+		if (cpumask_test_cpu(cpu, cpu_isolated_map))
+			continue;
+
 		env->dst_cpu = cpu;
 		task_numa_compare(env, taskimp, groupimp);
 	}
--
1.8.3.1