Re: [PATCH v5] nohz: set isolcpus when nohz_full is set

From: Chris Metcalf
Date: Fri Apr 10 2015 - 11:33:35 EST


On 04/09/2015 09:05 PM, Mike Galbraith wrote:
On Thu, 2015-04-09 at 19:12 +0200, Peter Zijlstra wrote:
On Thu, Apr 09, 2015 at 12:59:39PM -0400, Chris Metcalf wrote:
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432e14ff..18a961b9beba 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -323,6 +323,7 @@ struct task_struct;
extern int lockdep_tasklist_lock_is_held(void);
#endif /* #ifdef CONFIG_PROVE_RCU */
+extern void sched_isolated_map_add(const struct cpumask *);
extern void sched_init(void);
extern void sched_init_smp(void);
extern asmlinkage void schedule_tail(struct task_struct *prev);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..b055c5e0e65c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5824,6 +5824,11 @@ static int __init isolated_cpu_setup(char *str)
__setup("isolcpus=", isolated_cpu_setup);
+void sched_isolated_map_add(const struct cpumask *cpumask)
+{
+ cpumask_or(cpu_isolated_map, cpu_isolated_map, cpumask);
+}
+
struct s_data {
struct sched_domain ** __percpu sd;
struct root_domain *rd;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a4c4edac4528..b0092d02ca3f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -385,6 +385,9 @@ void __init tick_nohz_init(void)
for_each_cpu(cpu, tick_nohz_full_mask)
context_tracking_cpu_set(cpu);
+ /* It's not meaningful to be nohz without disabling the scheduler. */
+ sched_isolated_map_add(tick_nohz_full_mask);
+
cpu_notifier(tick_nohz_cpu_down_callback, 0);
pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
cpumask_pr_args(tick_nohz_full_mask));
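The effect of the patch can be modelled in userspace. The sketch below is a hypothetical stand-in, not kernel code: a 64-bit word replaces struct cpumask (bit N set means CPU N), and the names cpu_bits, cpu_bit(), isolated_map, isolated_map_add() and nohz_init_model() are invented for illustration. It only shows the set arithmetic the patch relies on: every nohz_full CPU is OR-ed into the isolated map, on top of whatever "isolcpus=" already put there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef uint64_t cpu_bits;

/* Hypothetical helper: the mask containing only @cpu. */
static cpu_bits cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (cpu_bits)1 << cpu;
}

/* Models cpu_isolated_map after "isolcpus=" has been parsed. */
static cpu_bits isolated_map;

/*
 * Same set operation as the patch's
 * cpumask_or(cpu_isolated_map, cpu_isolated_map, cpumask).
 */
static void isolated_map_add(cpu_bits mask)
{
	isolated_map |= mask;
}

/* Models the call the patch adds to tick_nohz_init(). */
static void nohz_init_model(cpu_bits nohz_full_mask)
{
	isolated_map_add(nohz_full_mask);
}
```

For example, starting from isolcpus=1 and then merging nohz_full=2-3 leaves bits 1, 2 and 3 set in the modelled map. Because the operation is a union, CPUs named in both parameters are simply isolated once.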
Right, this could work, although I would suggest adding a comment
somewhere saying that we should be careful with init order. I checked,
and this appears to be ordered right, but...
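The init-order concern can be stated as two constraints: the isolated map has to exist before tick_nohz_init() merges into it, and the merge has to happen before the boot-time scheduler domains are built from that map. The sketch below encodes those constraints as assertions in a toy boot sequence. The step functions and flags are hypothetical and do not correspond one-to-one to real kernel call sites; the same 64-bit-word stand-in for struct cpumask is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef uint64_t cpu_bits;

static bool isolated_map_ready;	/* models cpu_isolated_map being allocated */
static bool domains_built;	/* models boot-time domain construction done */
static cpu_bits isolated_map;
static cpu_bits domain_span;	/* CPUs placed in the boot-time domain */

/* Hypothetical step: the isolated map becomes usable. */
static void alloc_isolated_map(void)
{
	isolated_map_ready = true;
}

/* Hypothetical step: what the patch does from tick_nohz_init(). */
static void merge_nohz_full(cpu_bits nohz_full)
{
	/* The map must already exist... */
	assert(isolated_map_ready);
	/* ...and merging after domains are built would have no effect. */
	assert(!domains_built);
	isolated_map |= nohz_full;
}

/* Hypothetical step: default domains span online CPUs minus isolated ones. */
static void build_boot_domains(cpu_bits online)
{
	assert(isolated_map_ready);
	domain_span = online & ~isolated_map;
	domains_built = true;
}
```

Calling the steps in the order alloc, merge, build succeeds; calling merge_nohz_full() after build_boot_domains() trips the second assertion, which is the ordering mistake a comment in the real code would warn against.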
I'd embed it in domain construction; that way it'd be ready for the
day nohz_full becomes dynamic, and people can start using cpusets to
set up/tear down isolated sets on the fly.

So, move it to the top of build_sched_domains(), and then for
every "const struct cpumask *cpu_map" argument, create a
temporary cpu_map so we can mask out the nohz_full cores?
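A rough model of the alternative being asked about: instead of changing the isolated map once at boot, derive a temporary map at the top of domain construction that masks out the nohz_full CPUs. build_domains_model() and nohz_full_mask are hypothetical names; the real build_sched_domains() takes more arguments and does far more work. The sketch only shows that the mask is applied on every rebuild, so a later change to nohz_full would be picked up automatically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef uint64_t cpu_bits;

/* Hypothetical: would be updated at runtime if nohz_full became dynamic. */
static cpu_bits nohz_full_mask;

/*
 * Hypothetical stand-in for the domain builder. It works on a temporary
 * copy, so the caller's cpu_map is left untouched, and returns the span
 * it would actually build domains over.
 */
static cpu_bits build_domains_model(cpu_bits cpu_map)
{
	cpu_bits effective = cpu_map & ~nohz_full_mask;

	/* Sanity check: masking never adds CPUs the caller did not pass. */
	assert((effective & ~cpu_map) == 0);
	return effective;
}
```

With nohz_full covering CPUs 2-3, a rebuild over CPUs 0-3 spans only CPUs 0-1; if nohz_full later shrank to CPU 3 alone, the next rebuild would span CPUs 0-2 without any separate update to an isolated map.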

The problem is, we already allow partition_sched_domains()
to override "isolcpus=", so it seems appropriate that you should
be able to override "nohz_full=" in the same way, which my
current patch (v6) does.
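The override described here can be sketched the same way: the boot-time default domain leaves out isolated CPUs, but an explicit partition, such as cpusets requests through partition_sched_domains(), is used as given. Assuming nohz_full CPUs have been folded into the isolated map as in the patch above, they become overridable by exactly the same path as "isolcpus=" CPUs. default_domain() and partition_domain() are hypothetical simplifications, not the kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef uint64_t cpu_bits;

/* Models cpu_isolated_map with nohz_full CPUs already merged in. */
static cpu_bits isolated_map;

/* Hypothetical: boot-time default leaves isolated CPUs out of the domain. */
static cpu_bits default_domain(cpu_bits online)
{
	return online & ~isolated_map;
}

/*
 * Hypothetical: an explicit partition, as a cpusets-style caller might
 * request. The requested span is used as given, so it may include CPUs
 * that were isolated at boot.
 */
static cpu_bits partition_domain(cpu_bits online, cpu_bits requested)
{
	/* Only online CPUs can be placed in a domain. */
	assert((requested & ~online) == 0);
	return requested;
}
```

In this model, isolating CPUs 2-3 (via either parameter) drops them from the default domain over CPUs 0-3, while an explicit partition over CPUs 0-3 puts them back; nothing special is needed for nohz_full once it shares the isolated map.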

So I think the proposed solution is certainly no worse than what
we have now in terms of a future migration to cpusets.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
