Re: 2.6.18-rc1-mm1 panic on boot x86_64 NMI watchdog detected LOCKUP

From: Andrew Morton
Date: Tue Jul 11 2006 - 16:19:19 EST


On Tue, 11 Jul 2006 11:13:00 -0700
"Keith Mannthey" <kmannth@xxxxxxxxx> wrote:

> Hello,
> I just tried booting 2.6.18-rc1-mm1 (I was booting 2.6.17-mm6 just
> fine) and got the following error on boot.
>
> CPU 15: synchronized TSC with CPU 0 (last diff 49 cycles, maxerr 4698 cycles)
> Brought up 16 CPUs
> testing NMI watchdog ... OK.
> time.c: Using 333.333333 MHz WALL PIT GTOD PIT/HPET timer.
> time.c: Detected 3002.570 MHz processor.
> migration_cost=9,1121,16845
> checking if image is initramfs... it is
> Freeing initrd memory: 2770k freed
> NMI Watchdog detected LOCKUP on CPU 8
> CPU 8
> Modules linked in:
> Pid: 51, comm: khelper Not tainted 2.6.18-rc1-mm1-smp #2
> RIP: 0010:[<ffffffff803dd6f5>] [<ffffffff803dd6f5>] .text.lock.spinlock+0x31/0x8a
> RSP: 0000:ffff81065f91be70 EFLAGS: 00000086
> RAX: 0000000000000000 RBX: ffff810476ce3380 RCX: 0000000000000000
> RDX: ffff81046fad4108 RSI: ffff81046fad4000 RDI: ffff810476ce3384
> RBP: ffff810476ce3380 R08: 0000000000000000 R09: 000000000036f849
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff81065f91bf04
> R13: ffff81065f91bef8 R14: ffff810476dcdd18 R15: ffffffff8023f7a8
> FS: 0000000000000000(0000) GS:ffff810476f79140(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
> Process khelper (pid: 51, threadinfo ffff81065f91a000, task ffff81046fedd080)
> Stack: ffffffff803dd040 ffff81047003f8c0 ffff81065f91bef8 ffff810476dcdd18
> 0000000000000246 ffff81046fad4108 ffff810476ce3380 ffff81046fad4108
> ffffffff8025b211 0000000000000000 0000000000000000 ffff81046fedd080
> Call Trace:
> [<ffffffff803dd040>] __down_read+0x12/0x9a
> [<ffffffff8025b211>] taskstats_exit_alloc+0x59/0x8a
> [<ffffffff80232e89>] do_exit+0x178/0x8f6
> [<ffffffff8023f940>] request_module+0x0/0x150
> [<ffffffff8020a05a>] child_rip+0x8/0x12
> [<ffffffff8023f7a8>] __call_usermodehelper+0x0/0x47
> [<ffffffff8023f866>] ____call_usermodehelper+0x0/0xda
> [<ffffffff8020a052>] child_rip+0x0/0x12
>
>
> Code: 7e f9 e9 d3 fe ff ff f3 90 83 3b 00 7e f9 e9 da fe ff ff e8
> console shuts up ...
>
>
> Any ideas, have we seen this? I can attach config and full dmesg if needed.
>

Thanks. Shailabh sent the below patch through yesterday. It looks awfully
similar.

From: Shailabh Nagar <nagar@xxxxxxxxxxxxxx>

Move the initialization of the semaphores taken on the exit() path to earlier
in the boot sequence. Without this fix, booting on machines with many CPUs
hangs in down_read() on one of the per-cpu semaphores declared in taskstats.

Signed-off-by: Shailabh Nagar <nagar@xxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

kernel/taskstats.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff -puN kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2 kernel/taskstats.c
--- a/kernel/taskstats.c~per-task-delay-accounting-taskstats-interface-control-exit-data-through-cpumasks-fix-2
+++ a/kernel/taskstats.c
@@ -501,15 +501,20 @@ static struct genl_ops taskstats_ops = {
 /* Needed early in initialization */
 void __init taskstats_init_early(void)
 {
+	unsigned int i;
+
 	taskstats_cache = kmem_cache_create("taskstats_cache",
 						sizeof(struct taskstats),
 						0, SLAB_PANIC, NULL, NULL);
+	for_each_possible_cpu(i) {
+		INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
+		init_rwsem(&(per_cpu(listener_array, i).sem));
+	}
 }

 static int __init taskstats_init(void)
 {
 	int rc;
-	unsigned int i;

 	rc = genl_register_family(&family);
 	if (rc)
@@ -519,11 +524,6 @@ static int __init taskstats_init(void)
 	if (rc < 0)
 		goto err;

-	for_each_possible_cpu(i) {
-		INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
-		init_rwsem(&(per_cpu(listener_array, i).sem));
-	}
-
 	family_registered = 1;
 	return 0;
 err:
_
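
For anyone wondering why an uninitialized per-cpu semaphore shows up as a
hard lockup rather than an oops, here is a minimal userspace sketch. It is
not the kernel code. It assumes a lock encoding in which 1 means "free" and
anything <= 0 means "held", so zero-filled storage looks like a lock that
somebody already owns (that encoding is an assumption made for the
illustration, not something stated in this thread). The names toy_lock,
toy_lock_init, toy_lock_bounded and toy_unlock are hypothetical, and the
spin is bounded only so that the demonstration terminates.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock, assumed encoding: 1 = free, 0 or less = held. */
#define TOY_UNLOCKED 1

struct toy_lock {
	atomic_int slock;
};

/* The step the patch moves earlier: put the lock into its free state. */
static void toy_lock_init(struct toy_lock *l)
{
	atomic_init(&l->slock, TOY_UNLOCKED);
}

/*
 * Try to take the lock, giving up after max_spins attempts.
 * A real spinlock has no such bound and would spin until the
 * NMI watchdog fires.
 */
static bool toy_lock_bounded(struct toy_lock *l, unsigned long max_spins)
{
	while (max_spins--) {
		int expected = TOY_UNLOCKED;

		/* Succeeds only if the lock currently reads as free. */
		if (atomic_compare_exchange_strong(&l->slock, &expected, 0))
			return true;
	}
	return false;
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_store(&l->slock, TOY_UNLOCKED);
}
```

With a zero-filled "static struct toy_lock", toy_lock_bounded() returns
false however many spins it is given, and it succeeds on the first attempt
once toy_lock_init() has run. That is the ordering problem the patch
addresses: khelper can reach do_exit() before taskstats_init() has
initialized the per-cpu semaphores, so the initialization goes into
taskstats_init_early() instead.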
