Re: 2.6.21-git8+ BUG: NMI Watchdog detected LOCKUP on CPU1

From: Michal Piotrowski
Date: Tue May 08 2007 - 07:06:29 EST


On 08/05/07, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
On Tue, 08 May 2007 10:35:14 +0200 Michal Piotrowski <michal.k.k.piotrowski@xxxxxxxxx> wrote:

> Hi,
>
> / filesystem was full
>
> [39525.460000] BUG: NMI Watchdog detected LOCKUP on CPU1, eip 08056990, registers:
> [39525.468000] Modules linked in: loop ipt_MASQUERADE iptable_nat nf_nat autofs4 af_packet nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 binfmt_misc thermal processor fan container nvram snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss evdev snd_pcm intel_agp snd_timer snd agpgart soundcore i2c_i801 snd_page_alloc ide_cd cdrom rtc unix
> [39525.518000] CPU: 1
> [39525.518000] EIP: 0073:[<08056990>] Not tainted VLI
> [39525.518000] EFLAGS: 00000202 (2.6.21-ga989705c #187)
> [39525.529000] EIP is at 0x8056990
> [39525.529000] eax: 6e560d60 ebx: 0000000b ecx: 00000000 edx: 000dd15e
> [39525.541000] esi: 00000000 edi: 6e560220 ebp: bfeb0a58 esp: bfeb0990
> [39525.547000] ds: 007b es: 007b fs: 0000 gs: 0033 ss: 007b
> [39525.553000] Process line (pid: 4277, ti=cf200000 task=f6f560b0 task.ti=cf200000)
> [39525.560000] Kernel panic - not syncing: Aiee, killing interrupt handler!
>
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-git8/git-console.log
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-git8/git-config
>

I don't know what caused the CPU to jump into hyperspace like that, but Patrick
tells me that this:

> [38773.921000] printk: 15909 messages suppressed.
> [38773.926000] ipt_hook: happy cracking.
> [38778.921000] printk: 16332 messages suppressed.
> [38778.925000] ipt_hook: happy cracking.
> [38783.921000] printk: 16175 messages suppressed.
> [38783.926000] ipt_hook: happy cracking.
> [38788.921000] printk: 16390 messages suppressed.
> [38788.925000] ipt_hook: happy cracking.
> [38793.921000] printk: 16289 messages suppressed.
> [38793.925000] ipt_hook: happy cracking.
> [38798.921000] printk: 16172 messages suppressed.
> [38798.926000] ipt_hook: happy cracking.
> [38803.921000] printk: 15738 messages suppressed.
> [38803.925000] ipt_hook: happy cracking.
> [38808.921000] printk: 14731 messages suppressed.

happens when a local process sends packets with invalid IP headers
through raw sockets.

Yes, it was an isic session.


[ 5225.195000] UDP: short packet: From 37.126.206.54:46544 39671/1182
to 127.0.0.1:40761

This seems to indicate something on the local machine (packets are not
routed to 127.0.0.1) is sending invalid packets, probably with
incorrectly set up skb pointers.

I'd suggest to add a WARN_ON(1) in ipt_local_hook().

So can you please add the appropriate WARN_ON?

Whatever happens, that printk should be toned down, shouldn't it? We
prefer to not let unprivileged apps spam the logs.



[39293.925000] ipt_hook: happy cracking.
[39429.024000] printk: 15828 messages suppressed.
[39429.028000] nf_conntrack: table full, dropping packet.
[39430.034000] nf_conntrack: table full, dropping packet.
[39431.039000] nf_conntrack: table full, dropping packet.
[39432.044000] nf_conntrack: table full, dropping packet.
[39444.056000] nf_conntrack: table full, dropping packet.
[39445.061000] nf_conntrack: table full, dropping packet.
[39525.460000] BUG: NMI Watchdog detected LOCKUP on CPU1, eip
08056990, registers:

This lockup occurred after an isic test. Hmmm... linus_stress?

FAIL aio_dio_bugs Command
<LD_LIBRARY_PATH=/usr/local/autotest/client/deps/libaio/lib/
/usr/local/autotest/client/tests/aio_dio_bugs/src/aio-dio-extend-stat
file> failed, rc=32512
GOOD aiostress completed successfully
GOOD bonnie completed successfully
GOOD cpu_hotplug completed successfully
GOOD cyclictest completed successfully
GOOD dbench completed successfully
FAIL disktest running test disktest
<--[random error]-->
FAIL fs_mark Command <./fs_mark -d /mnt -s 10240 -n 1000> failed, rc=256
GOOD fsfuzzer completed successfully
GOOD fsx completed successfully
FAIL interbench Command
</usr/local/autotest/client/tests/interbench/src/interbench -m 'run
#0' -c> failed, rc=256
GOOD iozone completed successfully
FAIL isic running test job
Traceback (most recent call last):
File "/usr/local/autotest/client/bin/job.py", line 179, in __runtest
test.runtest(self, url, tag, args, dargs)
File "/usr/local/autotest/client/bin/test.py", line 195, in runtest
fork_waitfor(job.resultdir, pid)
File "/usr/local/autotest/client/bin/parallel.py", line 40, in fork_waitfor
(pid, status) = os.waitpid(pid, 0)
KeyboardInterrupt
GOOD linus_stress completed successfully

I don't remember what was the next test.

I'll try to find out how to reproduce this lockup. Anyway, IMO it's
not a network related problem.

Regards,
Michal

--
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/