Re: Oops in find_busiest_group(): 2.6.8-rc1-mm1

From: Nick Piggin
Date: Thu Jul 29 2004 - 03:35:04 EST




Paul Jackson wrote:

I just hit what might be the same oops.

I had not upgraded my working kernel for a month, and just now, when I
upgraded to 2.6.8-rc2-mm1, running sn2_defconfig on a small SN2 system,
it fails to boot everytime, ending with an Oops that starts out with:

======================================================
Freeing unused kernel memory: 320kB freed
Unable to handle kernel NULL pointer dereference (address 0000000000000008)
swapper[0]: Oops 8813272891392 [1]
Modules linked in:

Pid: 0, CPU 0, comm: swapper
psr : 0000101008022018 ifs : 8000000000000e20 ip : [<a0000001000bd710>] Not tainted
ip is at find_busiest_group+0xb0/0x640
======================================================

I added a conditional printk_ratelimit'ed print at the top of
find_busiest_group() whenever group is NULL, just before the first
dereference of group in the line:

local_group = cpu_isset(this_cpu, group->cpumask);

That print fires about 20,480 times each 5 second suppression window.

But it boots, if I also add code to break out of the "do { ... } while
(group != sd->groups)" loop, whenever group goes NULL.



OK, I still can't work out why this is happening. Can you try with
2.6.8-rc2-mm1? Does it happen continually after the system has booted?
If it happens in 2.6.8-rc2-mm1, comment out the call to cpu_attach_domain
in kernel/sched.c (so you'll only be using the dummy boot-up domain).
Does that fix it?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/