Re: Seeing "huh, entered softirq 8 ffffffff802682aa preempt_count00000100, exited with 00010100?" in tip.git
From: Jeremy Fitzhardinge
Date: Fri Jan 30 2009 - 19:48:29 EST
Ingo Molnar wrote:
* Ingo Molnar <mingo@xxxxxxx> wrote:
* Ingo Molnar <mingo@xxxxxxx> wrote:
Call Trace:
[<ffffffff80238c1f>] __schedule_bug+0x62/0x66
[<ffffffff80211d2d>] ? retint_restore_args+0x5/0x20
[<ffffffff80503921>] __schedule+0x95/0x792
[<ffffffff802093aa>] ? _stext+0x3aa/0x1000
[<ffffffff802093aa>] ? _stext+0x3aa/0x1000
[<ffffffff805040c2>] schedule+0xe/0x22
[<ffffffff8020ff04>] cpu_idle+0x70/0x72
[<ffffffff804fc3a0>] cpu_bringup_and_idle+0x13/0x15
Creating initial device nodes
Setting up hotplug.
From what I can see, softirq 8 is the RCU softirq. I don't know if
the "scheduling while atomic" is related or not, but its two new
schedulerish symptoms appearing at once, so I think its likely
they're related.
Hmmm... Mysterious, as you seem to be using classic RCU, which hasn't
changed in awhile. Which branch of the tip tree are you using?
tip/master. It looks like this appeared since -rc1. Mu current
suspicion is the percpu changes, since I'm seeing some other strange
symptoms.
Cc:-ed more folks - it's either the percpu changes or the APIC changes
(both occured at about the same time). Or maybe something from upstream.
managed to bisect one of the boot crashes i've been seeing:
a698c823e15149941b0f0281527d0c0d1daf2639 is first bad commit
commit a698c823e15149941b0f0281527d0c0d1daf2639
Author: Tejun Heo <tj@xxxxxxxxxx>
Date: Tue Jan 13 20:41:35 2009 +0900
x86: make vmlinux_32.lds.S use PERCPU() macro
Make vmlinux_32.lds.S use the generic PERCPU() macro instead of open
coding it. This will ease future changes.
Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
# bad: [b16884e8] Merge branch 'x86/urgent'
# good: [f2257b70] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
# good: [345fa66b] Merge branch 'core/locking'
# bad: [f534caca] Merge branch 'oprofile'
# bad: [3eb3963f] Merge branch 'cpus4096' into core/percpu
# bad: [1b437c8c] x86-64: Move irq stats from PDA to per-cpu and con
# good: [54da5b3d] x86: fix broken flush_tlb_others_ipi(), fix
# good: [c2c21745] x86: replacing mp_config_intsrc with mpc_intsrc
# bad: [c8f3329a] x86: use static _cpu_pda array
# good: [7de6883f] x86: fix pda_to_op()
# bad: [a698c823] x86: make vmlinux_32.lds.S use PERCPU() macro
# good: [c90aa894] x86: cleanup early setup_percpu references
testing the revert now.
This might be similar to the other 32-bit linker bug that was tracked
down yesterday and reverted - maybe that revert unearthed a problem with
this commit?
seems to do the trick.
Tejun, a detail, this config has:
CONFIG_RELOCATABLE=y
Have you considered 32-bit relocatable kernels too? Config attached.
I found my bug. Turns out all my CPUs were sharing the same kernel
stack (!), which means it was working surprisingly well, considering...
J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/