[PATCH] avoid cpu removal if busy revisited

From: KAMEZAWA Hiroyuki
Date: Thu Jun 22 2006 - 21:48:51 EST


Hi,
thank you for good responses. This is avoid_cpu_removal_if_busy patch.

I added all folks to CC who replied previous "stop_on_cpu_lost" discussion.

How about this ?
(my vocabulary is not rich, then please tell me if you know better name for
sysctl.)

Changes from old version
- added sysctl

-Kame
==

Now, cpu hot remove migrates all tasks on target cpu by force.

During cpu-hot-remove, if tsk->cpus_allowed contains the only target
cpu of removal, tsk->cpus_allowd is disposed and the kernel migrate it to
random cpu.It's obvious that user-land configuration before cpu hot removal
is bad. But this is not good in carefully scheduled environment.

In this case,
1. ignore bad configuration in user-land just do warnings.
2. cancel cpu hot removal and warn users to fix the problem and retry.
seems to be a realisitc workaround. Killing the problematic process may
cause some trouble in user-land (dead-lock etc..)

This patch adds sysctl moderate_cpu_removal.
If moderate_cpu_removal == 0, all tasks are migrated by force.
If moderate_cpu_removal == 1, cpu_hotremoval can fail because of not-migratable
tasks.


Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>



include/linux/sysctl.h | 1 +
kernel/sched.c | 42 ++++++++++++++++++++++++++++++++++++++++++
kernel/sysctl.c | 13 +++++++++++++
3 files changed, 56 insertions(+)

Index: linux-2.6.17.org/kernel/sched.c
===================================================================
--- linux-2.6.17.org.orig/kernel/sched.c
+++ linux-2.6.17.org/kernel/sched.c
@@ -4562,6 +4562,44 @@ wait_to_die:
}

#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * if moderate_cpu_removal==1 (sysctl), cpu-hot-remove will fail if cpu is busy.
+ * Default value is 0. all tasks are forced to migrate.
+ */
+int moderate_cpu_removal;
+
+/*
+ * test there are tasks tightly coupled to the target cpu.
+ */
+static int test_cpu_busy(int cpu)
+{
+ cpumask_t mask;
+ int ret = 0;
+ pid_t pid;
+ struct task_struct *p;
+ cpus_clear(mask);
+ cpu_set(cpu, mask);
+
+ read_lock(&tasklist_lock);
+ for_each_process(p) {
+ if (p == current)
+ continue;
+ if (p->mm && cpus_equal(mask, p->cpus_allowed)) {
+ ret = 1;
+ pid = p->pid;
+ break;
+ }
+ }
+ read_unlock(&tasklist_lock);
+ if (ret) {
+ printk(KERN_ERR "cpu(%d) is busy because of task(%d)\n",
+ cpu, pid);
+ printk(KERN_ERR "adjust task(%d) configuration or set "
+ "moderate_cpu_removal to 0 to remove cpu %d\n",
+ pid, cpu);
+ }
+ return ret;
+}
/* Figure out where task on dead CPU should go, use force if neccessary. */
static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *tsk)
{
@@ -4752,6 +4790,10 @@ static int migration_call(struct notifie
kthread_stop(cpu_rq(cpu)->migration_thread);
cpu_rq(cpu)->migration_thread = NULL;
break;
+ case CPU_DOWN_PREPARE:
+ if (moderate_cpu_removal && test_cpu_bust(cpu))
+ return NOTIFY_BAD;
+ break;
case CPU_DEAD:
migrate_live_tasks(cpu);
rq = cpu_rq(cpu);
Index: linux-2.6.17.org/kernel/sysctl.c
===================================================================
--- linux-2.6.17.org.orig/kernel/sysctl.c
+++ linux-2.6.17.org/kernel/sysctl.c
@@ -78,6 +78,9 @@ int unknown_nmi_panic;
extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
void __user *, size_t *, loff_t *);
#endif
+#ifdef CONFIG_HOTPLUG_CPU
+int moderate_cpu_removal;
+#endif

/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
@@ -683,6 +686,16 @@ static ctl_table kern_table[] = {
.proc_handler = &proc_dointvec,
},
#endif
+#ifdef CONFIG_HOTPLUG_CPU
+ {
+ .ctl_name = KERN_MODERATE_CPU_REMOVAL,
+ .procname = "moderate_cpu_removal",
+ .data = &moderate_cpu_removal,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ }
+#endif
{ .ctl_name = 0 }
};

Index: linux-2.6.17.org/include/linux/sysctl.h
===================================================================
--- linux-2.6.17.org.orig/include/linux/sysctl.h
+++ linux-2.6.17.org/include/linux/sysctl.h
@@ -148,6 +148,7 @@ enum
KERN_SPIN_RETRY=70, /* int: number of spinlock retries */
KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */
KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */
+ KERN_MODERATE_CPU_REMOVAL=73, /* int: disallow forced cpu removal */
};



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/