[RFC PATCH 5/6] sched,perf: Prepare migration and perf CPU hotplug callbacks for reverse invocation

From: Srivatsa S. Bhat
Date: Wed Jul 25 2012 - 07:55:05 EST


While dealing with reverse invocation of callbacks during CPU offline, we
get an opportunity to revisit some of the reasons behind the existing callback
invocation orders and how they would fit into the new reverse invocation model
(which poses its own constraints and challenges).

It is documented that the perf and migration CPU hotplug callbacks must be
run pretty early (i.e., before the normal callbacks, which have priority 0
or lower), and also that the perf callbacks must run before the migration
one. At first glance, this looks like a "notifier A must always be followed
by notifier B, in both the CPU online and CPU offline paths" rule. However,
looking a bit closely at the code, it appears that this requirement really
holds for the CPU online path, and not for the CPU offline path.

In the CPU offline path, some of the perf callbacks deal with low-level
registers, whereas the migration callback deals with the scheduler runqueues;
the two look quite unrelated. Also, there are quite a few priority 0
callbacks in the CPU down path that deal with low-level, arch-specific
CPU-disable work.

All in all, it appears that the requirement can be restated as follows:

CPU online path:
Run the perf callbacks early, followed by the migration callback and later
run the priority 0 and other callbacks as usual.

CPU offline path:
Run the migration callback early, followed by the priority 0 callbacks and
later run the perf callbacks.

Keeping this in mind, adjust the perf and migration callbacks in preparation
for moving over to the reverse invocation model. That is, split up the
migration callback into CPU online and CPU offline components and assign
suitable priorities to each. This split gives us the necessary control over
both paths (online and offline) to handle the ordering requirements
correctly, which makes the move to the reverse invocation model easy.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
---

include/linux/cpu.h | 3 ++-
kernel/sched/core.c | 55 ++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 255b889..88de47d 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -70,7 +70,8 @@ enum {

/* migration should happen before other stuff but after perf */
CPU_PRI_PERF = 20,
- CPU_PRI_MIGRATION = 10,
+ CPU_PRI_MIGRATION_UP = 10,
+ CPU_PRI_MIGRATION_DOWN = 9,
/* bring up workqueues before normal notifiers and down after */
CPU_PRI_WORKQUEUE_UP = 5,
CPU_PRI_WORKQUEUE_DOWN = -5,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9ccebdd..6a511f2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5444,12 +5444,10 @@ static void set_rq_offline(struct rq *rq)
}
}

-/*
- * migration_call - callback that gets triggered when a CPU is added.
- * Here we can start up the necessary migration thread for the new CPU.
- */
+/* cpu_up_migration_call - callback that gets triggered when a CPU is added. */
static int __cpuinit
-migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
+cpu_up_migration_call(struct notifier_block *nfb, unsigned long action,
+ void *hcpu)
{
int cpu = (long)hcpu;
unsigned long flags;
@@ -5471,6 +5469,36 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
}
raw_spin_unlock_irqrestore(&rq->lock, flags);
break;
+ }
+
+ update_max_interval();
+
+ return NOTIFY_OK;
+}
+
+/*
+ * Register at high priority so that task migration (migrate_tasks)
+ * happens before everything else. This has to be lower priority than
+ * the notifier in the perf_event subsystem, though.
+ */
+static struct notifier_block __cpuinitdata cpu_up_migration_notifier = {
+ .notifier_call = cpu_up_migration_call,
+ .priority = CPU_PRI_MIGRATION_UP,
+};
+
+/*
+ * cpu_down_migration_call - callback that gets triggered when a CPU is
+ * removed.
+ */
+static int __cpuinit
+cpu_down_migration_call(struct notifier_block *nfb, unsigned long action,
+ void *hcpu)
+{
+ int cpu = (long)hcpu;
+ unsigned long flags;
+ struct rq *rq = cpu_rq(cpu);
+
+ switch (action & ~CPU_TASKS_FROZEN) {

#ifdef CONFIG_HOTPLUG_CPU
case CPU_DYING:
@@ -5497,13 +5525,13 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
}

/*
- * Register at high priority so that task migration (migrate_all_tasks)
+ * Register at high priority so that task migration (migrate_tasks)
* happens before everything else. This has to be lower priority than
* the notifier in the perf_event subsystem, though.
*/
-static struct notifier_block __cpuinitdata migration_notifier = {
- .notifier_call = migration_call,
- .priority = CPU_PRI_MIGRATION,
+static struct notifier_block __cpuinitdata cpu_down_migration_notifier = {
+ .notifier_call = cpu_down_migration_call,
+ .priority = CPU_PRI_MIGRATION_DOWN,
};

/*
@@ -5583,10 +5611,13 @@ static int __init migration_init(void)
int err;

/* Initialize migration for the boot CPU */
- err = migration_call(&migration_notifier, CPU_UP_PREPARE, cpu);
+ err = cpu_up_migration_call(&cpu_up_migration_notifier,
+ CPU_UP_PREPARE, cpu);
BUG_ON(err == NOTIFY_BAD);
- migration_call(&migration_notifier, CPU_ONLINE, cpu);
- register_cpu_notifier(&migration_notifier);
+ cpu_up_migration_call(&cpu_up_migration_notifier, CPU_ONLINE, cpu);
+ register_cpu_notifier(&cpu_up_migration_notifier);
+
+ register_cpu_notifier(&cpu_down_migration_notifier);

/* Register cpu active notifiers */
cpu_notifier(sched_cpu_active, CPU_PRI_SCHED_ACTIVE);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/