[PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
From: Srikar Dronamraju
Date: Tue Oct 23 2018 - 23:03:04 EST
The load balancer and NUMA balancer are not supposed to operate on isolcpus.
Currently, when setting sched affinity, there are no checks to see if the
requested cpumask has CPUs from both isolcpus and housekeeping CPUs.
If a user passes a mix of isolcpus and housekeeping CPUs, then the
NUMA balancer can pick an isolated CPU to schedule on.
With this change, if a combination of isolcpus and housekeeping CPUs is
provided, then we restrict ourselves to housekeeping CPUs.
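The rule described above can be modelled in userspace as a rough sketch. It assumes at most 32 CPUs held in a plain uint32_t; the names cpu_bits, bits_subset, bits_intersects and restrict_mixed_mask are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 32 CPUs. */
typedef uint32_t cpu_bits;

/* True if every CPU in a is also in b. */
static bool bits_subset(cpu_bits a, cpu_bits b)
{
	return (a & ~b) == 0;
}

/* True if a and b share at least one CPU. */
static bool bits_intersects(cpu_bits a, cpu_bits b)
{
	return (a & b) != 0;
}

/*
 * When the request mixes isolated and housekeeping CPUs, keep only the
 * housekeeping ones. Requests that are purely housekeeping or purely
 * isolated pass through unchanged.
 */
static cpu_bits restrict_mixed_mask(cpu_bits requested, cpu_bits housekeeping)
{
	cpu_bits result = requested;

	if (!bits_subset(requested, housekeeping) &&
	    bits_intersects(requested, housekeeping))
		result &= housekeeping;

	/* The rule only ever removes CPUs; it never adds any. */
	assert(bits_subset(result, requested));
	return result;
}
```

In this model a request made up only of isolated CPUs is left alone, so a task can still be pinned to an isolated CPU on purpose.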
For example: System with 32 CPUs
$ grep -o "isolcpus=[,,1-9]*" /proc/cmdline
isolcpus=1,5,9,13
$ grep -i cpus_allowed /proc/$$/status
Cpus_allowed: ffffdddd
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
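As a cross-check of the values above, the hex mask can be derived from the isolcpus list by clearing one bit per isolated CPU. The helper below is a hypothetical sketch (housekeeping_bits is not a kernel function) and assumes a system with at most 32 CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a bitmask of housekeeping CPUs for a system
 * with nr_cpus CPUs (at most 32 in this model), clearing each isolated CPU.
 */
static uint32_t housekeeping_bits(unsigned int nr_cpus,
				  const unsigned int *isolated, size_t n)
{
	uint32_t mask;
	size_t i;

	assert(nr_cpus >= 1 && nr_cpus <= 32);
	/* All-ones for nr_cpus bits; shifting by 32 would be undefined. */
	mask = (nr_cpus == 32) ? UINT32_MAX : ((UINT32_C(1) << nr_cpus) - 1);

	for (i = 0; i < n; i++) {
		assert(isolated[i] < nr_cpus);
		mask &= ~(UINT32_C(1) << isolated[i]);
	}
	return mask;
}
```

With 32 CPUs and isolated CPUs 1, 5, 9 and 13, each low nibble loses bit 1 and becomes 0xd, which gives the ffffdddd reported in Cpus_allowed.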
The test below runs "perf bench numa mem --no-data_rand_walk -p 4 -t 8
-G 0 -P 3072 -T 0 -l 50 -c -s 1000", which calls sched_setaffinity() with
a mask covering all CPUs in the system.
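For readers who want to reproduce the request without perf, the following is a minimal userspace sketch of asking for every CPU and reading back what was granted. The helper names fill_all_cpus and request_all_cpus are hypothetical; sched_setaffinity(), sched_getaffinity() and the CPU_* macros are the glibc interfaces, which need _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Fill set with CPUs 0..nr_cpus-1, i.e. a request for "all CPUs". */
static void fill_all_cpus(cpu_set_t *set, int nr_cpus)
{
	int cpu;

	assert(nr_cpus > 0 && nr_cpus <= CPU_SETSIZE);
	CPU_ZERO(set);
	for (cpu = 0; cpu < nr_cpus; cpu++)
		CPU_SET(cpu, set);
}

/*
 * Request affinity to all nr_cpus CPUs for the calling thread, then read
 * back the mask that is actually in effect. Returns 0 on success, -1 on
 * error (errno is set by the failing call).
 */
static int request_all_cpus(int nr_cpus, cpu_set_t *granted)
{
	cpu_set_t want;

	fill_all_cpus(&want, nr_cpus);
	if (sched_setaffinity(0, sizeof(want), &want) == -1)
		return -1;
	return sched_getaffinity(0, sizeof(*granted), granted);
}
```

Calling request_all_cpus(32, &granted) on the example system and testing CPU_ISSET(1, &granted) would show whether an isolated CPU ended up in the mask. Going by the listings in this mail, that would be expected without the patch and not with it.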
Without patch
------------
$ for i in $(pgrep -f perf); do grep -i cpus_allowed_list /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/2107/task/2107/status:Cpus_allowed_list: 0-31
/proc/2107/task/2196/status:Cpus_allowed_list: 0-31
/proc/2107/task/2197/status:Cpus_allowed_list: 0-31
/proc/2107/task/2198/status:Cpus_allowed_list: 0-31
/proc/2107/task/2199/status:Cpus_allowed_list: 0-31
/proc/2107/task/2200/status:Cpus_allowed_list: 0-31
/proc/2107/task/2201/status:Cpus_allowed_list: 0-31
/proc/2107/task/2202/status:Cpus_allowed_list: 0-31
/proc/2107/task/2203/status:Cpus_allowed_list: 0-31
With patch
----------
$ for i in $(pgrep -f perf); do grep -i cpus_allowed_list /proc/$i/task/*/status ; done | head -n 10
Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18591/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18603/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18604/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18605/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18606/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18607/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18608/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18609/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
/proc/18591/task/18610/status:Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
Signed-off-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
---
Changelog v1->v2:
constification of hk_mask (reported by kbuild test robot)
kernel/sched/core.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad97f3b..54e7207 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4734,6 +4734,7 @@ static int sched_read_attr(struct sched_attr __user *uattr,
 long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 {
 	cpumask_var_t cpus_allowed, new_mask;
+	const struct cpumask *hk_mask;
 	struct task_struct *p;
 	int retval;
@@ -4778,6 +4779,19 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	cpuset_cpus_allowed(p, cpus_allowed);
 	cpumask_and(new_mask, in_mask, cpus_allowed);
+	hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);
+
+	/*
+	 * If the cpumask provided has CPUs that are part of isolated and
+	 * housekeeping_cpumask, then restrict it to just the CPUs that
+	 * are part of the housekeeping_cpumask.
+	 */
+	if (!cpumask_subset(new_mask, hk_mask) &&
+	    cpumask_intersects(new_mask, hk_mask)) {
+		pr_info("pid %d: Mix of isolcpus and non-isolcpus provided\n",
+			p->pid);
+		cpumask_and(new_mask, new_mask, hk_mask);
+	}
 	/*
 	 * Since bandwidth control happens on root_domain basis,
--
1.8.3.1