Re: [PATCH] cpusets+hotplug+preepmt broken

From: Dinakar Guniguntala
Date: Thu May 12 2005 - 10:01:39 EST


On Wed, May 11, 2005 at 01:42:35PM -0700, Paul Jackson wrote:
> So what we'd really like to do would be to first fallback to all the
> cpus allowed in the specified tasks cpuset (no walking the cpuset
> hierarchy), and see if any of those cpus are still online to receive
> this orphan task. Unless someone has botched the system configuration,
> and taken offline all the cpus in a cpuset, this should yield up a cpu
> that is still both allowed and online. If that fails, then to heck with
> honoring cpuset placement - just take the first online cpu we can find.
>

I tried your suggestion, but hit another oops. This has more to do with
hotplug+preempt looks like

Code: fc 89 ec 5d e9 8b 0e 00 00 c7 04 24 5c 49 41 c0 e8 7f 39 00 00 e8 da 87 fe ff eb c1 0f 0b 36 11 32 96 41 c0 eb 92 83 f8
<6>note: cpuoff.sh[25439] exited with preempt_count 1
scheduling while atomic: cpuoff.sh/0x10000001/25439
[<c04010d2>] schedule+0x4b2/0x4c0
[<c0152da0>] unmap_page_range+0xc0/0xe0
[<c0401988>] cond_resched+0x28/0x40
[<c0152f56>] unmap_vmas+0x196/0x290
[<c0157c93>] exit_mmap+0xa3/0x1a0
[<c011ce53>] mmput+0x43/0x100
[<c01222f0>] do_exit+0xf0/0x3b0
[<c01047da>] die+0x18a/0x190
[<c0104bb0>] do_invalid_op+0x0/0xd0
[<c0104c5e>] do_invalid_op+0xae/0xd0
[<c011bbe7>] migrate_dead+0xa7/0xc0
[<c0400f14>] schedule+0x2f4/0x4c0
[<c040112b>] preempt_schedule+0x4b/0x70
[<c040112b>] preempt_schedule+0x4b/0x70
[<c0103fd7>] error_code+0x4f/0x54
[<c040007b>] svc_proc_unregister+0x1b/0x20
[<c011007b>] __acpi_map_table+0x7b/0xc0
[<c011bbe7>] migrate_dead+0xa7/0xc0
[<c011fef9>] profile_handoff_task+0x39/0x50
[<c011bc62>] migrate_dead_tasks+0x62/0x90
[<c011bda6>] migration_call+0x116/0x2c0
[<c0142102>] __stop_machine_run+0x92/0xb0
[<c012ddad>] notifier_call_chain+0x2d/0x50
[<c013a3cb>] cpu_down+0x16b/0x2a0
[<c027db0b>] store_online+0x5b/0x80
[<c027ab25>] sysdev_store+0x35/0x40
[<c019f14e>] flush_write_buffer+0x3e/0x50
[<c019f1b8>] sysfs_write_file+0x58/0x80
[<c0163a36>] vfs_write+0xc6/0x180
[<c0163bc1>] sys_write+0x51/0x80
[<c01034e5>] syscall_call+0x7/0xb


On 2.6.12-rc4-mm1 it is even worse, it panics

Booting processor 2/1 eip 3000
Calibrating delay using timer specific routine.. 1800.08 BogoMIPS (lpj=900043)
CPU2: Intel Pentium III (Cascades) stepping 04
Unable to handle kernel NULL pointer dereference<7>CPU0 attaching sched-domain:
at virtual address 00000001
printing eip:
00000001
*pde = 000001e3
*pte = 00000001
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in:
CPU: 2
EIP: 0060:[<00000001>] Not tainted VLI
EFLAGS: 00010006 (2.6.12-rc4-mm1)
EIP is at 0x1
eax: c18c0000 ebx: c04758a0 ecx: 00000001 edx: e8c13eb4
esi: ffffffff edi: c185c030 ebp: c18c1f80 esp: c18c1f2c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c18c0000 task=c185c030)
Stack: c0114e0d e8c13eb4 c18c0000 c0103ecc c18c0000 00000008 00000002 ffffffff
c185c030 c18c1f80 00000001 0000007b 0000007b fffffffb c04048a0 00000060
00000206 c040511c 00000003 00000000 00000000 c18c0000 c01033ee 00000003
Call Trace:
[<c0114e0d>] smp_call_function_interrupt+0x3d/0x60
[<c0103ecc>] call_function_interrupt+0x1c/0x24
[<c04048a0>] schedule+0x0/0x7c0
[<c040511c>] preempt_schedule_irq+0x4c/0x80
[<c01033ee>] need_resched+0x1f/0x21
[<c011516f>] start_secondary+0x10f/0x1a0
Code: Bad EIP value.
<0>Kernel panic - not syncing: Fatal exception in interrupt
Booting processor 3/2 eip 3000
Initializing CPU#3

I really havent had a chance to investigate whats going on, should be able to
do that tomorrow. Here is the patch I tried, my .config and scripts

-Dinakar


diff -Naurp linux-2.6.12-rc2-mm3.orig/include/linux/cpuset.h linux-2.6.12-rc2-mm3-0/include/linux/cpuset.h
--- linux-2.6.12-rc2-mm3.orig/include/linux/cpuset.h 2005-05-11 19:44:42.000000000 +0530
+++ linux-2.6.12-rc2-mm3-0/include/linux/cpuset.h 2005-05-12 16:54:15.000000000 +0530
@@ -19,6 +19,7 @@ extern void cpuset_init_smp(void);
extern void cpuset_fork(struct task_struct *p);
extern void cpuset_exit(struct task_struct *p);
extern cpumask_t cpuset_cpus_allowed(const struct task_struct *p);
+extern const cpumask_t cpuset_task_cpus_allowed(const struct task_struct *p);
void cpuset_init_current_mems_allowed(void);
void cpuset_update_current_mems_allowed(void);
void cpuset_restrict_to_mems_allowed(unsigned long *nodes);
@@ -38,6 +39,10 @@ static inline cpumask_t cpuset_cpus_allo
{
return cpu_possible_map;
}
+static inline cpumask_t cpuset_task_cpus_allowed(struct task_struct *p)
+{
+ return cpu_possible_map;
+}

static inline void cpuset_init_current_mems_allowed(void) {}
static inline void cpuset_update_current_mems_allowed(void) {}
diff -Naurp linux-2.6.12-rc2-mm3.orig/kernel/cpuset.c linux-2.6.12-rc2-mm3-0/kernel/cpuset.c
--- linux-2.6.12-rc2-mm3.orig/kernel/cpuset.c 2005-05-11 19:44:42.000000000 +0530
+++ linux-2.6.12-rc2-mm3-0/kernel/cpuset.c 2005-05-12 17:19:53.000000000 +0530
@@ -1445,6 +1445,31 @@ cpumask_t cpuset_cpus_allowed(const stru
return mask;
}

+/**
+ * cpuset_task_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
+ * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
+ *
+ * Description: Returns the cpumask_t cpus_allowed of the cpuset
+ * attached to the specified @tsk. Unlike cpuset_cpus_allowed(),
+ * is not guaranteed to return a non-empty subset of cpu_online_map.
+ * Does not walk up the cpuset hierarchy, and does not attempt to
+ * acquire the cpuset_sem. If called on a task about to exit,
+ * where tsk->cpuset is already NULL, return cpu_online_map.
+ *
+ * Call with task locked.
+ **/
+
+const cpumask_t cpuset_task_cpus_allowed(const struct task_struct *tsk)
+{
+ struct cpuset *cs = tsk->cpuset;
+
+ if (!cs)
+ return cpu_online_map;
+ if (cpus_intersects(cs->cpus_allowed, cpu_online_map))
+ return cs->cpus_allowed;
+ return cpu_online_map;
+}
+
void cpuset_init_current_mems_allowed(void)
{
current->mems_allowed = NODE_MASK_ALL;
diff -Naurp linux-2.6.12-rc2-mm3.orig/kernel/sched.c linux-2.6.12-rc2-mm3-0/kernel/sched.c
--- linux-2.6.12-rc2-mm3.orig/kernel/sched.c 2005-05-11 19:44:42.000000000 +0530
+++ linux-2.6.12-rc2-mm3-0/kernel/sched.c 2005-05-12 16:50:04.000000000 +0530
@@ -4301,7 +4301,7 @@ static void move_task_off_dead_cpu(int d

/* No more Mr. Nice Guy. */
if (dest_cpu == NR_CPUS) {
- tsk->cpus_allowed = cpuset_cpus_allowed(tsk);
+ tsk->cpus_allowed = cpuset_task_cpus_allowed(tsk);
dest_cpu = any_online_cpu(tsk->cpus_allowed);

/*

Attachment: test-scripts.tar.gz
Description: application/tar-gz