Fix for OpenSUSE kernel bug (was Re: [Opps] Invalid opcode)
From: Zachary Amsden
Date: Thu Nov 30 2006 - 01:45:17 EST
S.ÃaÄlar Onur wrote:
05 Kas 2006 Paz 18:40 tarihinde, Andi Kleen ÅunlarÄ yazmÄÅtÄ:
How do you know this?
Just guessing, if im not wrong panics occur after SMP alternative switching
code done its job.
And does it still happen in 2.6.19-rc4?
Will try
in VmWare and Microsoft Virtual
PC and in order to confirm this bug is not our distro specific i
downloaded and tried latest OpenSuse also [1] and [2] are screens
captured by vmware but exact same panic occurs in Virtual PC as reported
to us in [3].
Always the same BUG()?
Yes, same bug
There is just some rolling Turkish text there.
Ah im sorry here is the correct links :(
[1] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_opensuse.png
[2] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_pardus.png
Cheers
I'm proposing this as a fix for your bug. Having tasklets scheduled
before softirqd gets to run might be somewhat backwards, but there is
nothing I can find wrong about it from a correctness point of view.
Better to boot the kernel even when compiled with bug checking on, I think.
This bug started becoming apparent in 2.6.18 because of some rework with
the CPU hotplug code, but in theory, it exists at least all the way back
to 2.6.10, which is as far as I looked backwards in time.
Zach
It is possible to have tasklets get scheduled before softirqd has had
a chance to spawn on all CPUs. This is totally harmless; after success
during action CPU_UP_PREPARE, action CPU_ONLINE will be called, which
immediately wakes softirqd on the appropriate CPU to process the already
pending tasklets. So there is no danger of having a missed wakeup for
any tasklets that were already pending.
In particular, i386 is affected by this during startup, and is visible when
using a very large initrd; during the time it takes for the initrd to be
decompressed, a timer IRQ can come in and schedule RCU callbacks. It is also
possible that resending of a hardware IRQ via a softirq triggers the same bug.
Because of different timing conditions, this shows up in all emulators
and virtual machines tested, including Xen, VMware, Virtual PC, and Qemu.
It is also possible to trigger on native hardware with a large enough initrd,
although I don't have a reliable case demonstrating that.
Signed-off-by: Zachary Amsden <zach@xxxxxxxxxx>
Index: linux-2.6.18/kernel/softirq.c
===================================================================
--- linux-2.6.18.orig/kernel/softirq.c 2006-11-10 14:44:39.000000000 -0800
+++ linux-2.6.18/kernel/softirq.c 2006-11-29 22:19:36.000000000 -0800
@@ -574,8 +574,6 @@ static int __cpuinit cpu_callback(struct
switch (action) {
case CPU_UP_PREPARE:
- BUG_ON(per_cpu(tasklet_vec, hotcpu).list);
- BUG_ON(per_cpu(tasklet_hi_vec, hotcpu).list);
p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu);
if (IS_ERR(p)) {
printk("ksoftirqd for %i failed\n", hotcpu);