Repeated kernel oops from 2.2.{5,7} router

Roy Stogner (roystgnr@iname.com)
Fri, 7 May 1999 16:28:58 -0500 (CDT)


This is a serious, reproducible problem. In short, I'm working from
behind a 486 Linux router which is configured to do IP Masqing, kernel
port forwarding, and lots of other neat things I'm thankful for.
Unfortunately, it can't seem to do all of them right now without
crashing. Although IP Masqing for HTTP connections seems to be
working fine, I can't keep an ssh connection to the outside world open
for more than a few minutes before the router crashes with an oops on
the screen. This happens with both 2.2.5 and 2.2.7.

Here's what I've got out of (my first time using) ksymoops:

[root@app OOPS-2.2.7]# ../ksymoops -v vmlinux -o 2.2.7/ -K -L -M < oopsfile
Options used: -v vmlinux (specified)
-o 2.2.7/ (specified)
-K (specified)
-L (specified)
-M (specified)
-c 1 (default)

No modules in ksyms, skipping objects
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010: [<c016570b>]
EFLAGS: 00010282
eax: c02bde3c ebx: 00000001 ecx: c02bde30 edx: 00000000
esi: c02bddc0 edi: c1fda000 ebp: c0200008 esp: c020bf28
ds: 0018 es: 0018 ss: 0018
Stack:
c01654d3 c02bddc0 c1fda000 c02bddc0 c01fffdc c016557e c1fda000 00000000
c02008bc c1f58030 c0160c91 00000001 c0245fe4 00000000 c020bf8c 00008a38
c01182f9 c020a000 c0216000 000015e5 c01108c3 c020a000 c020a000 000015e5
Call trace: [<c01654d3>][<c016557e>][<c0160c91>][<c01182f9>]
[<c01108c3>][<c01071f7>][<c010724f>]
[<c0106000>][<c01072a4>][<c0108be4>][<c0106000>]
[<c0106087>][<c0106000>][<c0100176>]
Code: 89 4a 04 89 11 c7 00 00 00 00 00 c7 40 04 00 00 00 00 c7 40

>>EIP: c016570b <pfifo_fast_dequeue+1b/50>
Trace: c01654d3 <qdisc_restart+13/90>
Trace: c016557e <qdisc_run_queues+2e/60>
Trace: c0160c91 <net_bh+241/280>
Trace: c01182f9 <do_bottom_half+49/70>
Trace: c01108c3 <schedule+83/290>
Trace: c0106000 <get_options+0/80>
Trace: c0106087 <cpu_idle+7/20>
Code: c016570b <pfifo_fast_dequeue+1b/50> 00000000 <_EIP>: <===
Code: c016570b <pfifo_fast_dequeue+1b/50> 0: 89 4a 04 movl %ecx,0x4(%edx) <===
Code: c016570e <pfifo_fast_dequeue+1e/50> 3: 89 11 movl %edx,(%ecx)
Code: c0165710 <pfifo_fast_dequeue+20/50> 5: c7 00 00 00 00 movl $0x0,(%eax)
Code: c0165715 <pfifo_fast_dequeue+25/50> a: 00
Code: c0165716 <pfifo_fast_dequeue+26/50> b: c7 40 04 00 00 movl $0x0,0x4(%eax)
Code: c016571b <pfifo_fast_dequeue+2b/50> 10: 00 00
Code: c016571d <pfifo_fast_dequeue+2d/50> 12: c7 40 00 00 00 movl $0x0,0x0(%eax)
Code: c0165722 <pfifo_fast_dequeue+32/50> 17: 00 00

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing
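
For reference, the "Oops: 0002" line and the registers can be decoded by hand. The following is a minimal sketch, assuming the standard i386 page-fault error-code bits (bit 0 = page was present, bit 1 = write access, bit 2 = user mode) and assuming the fault address is simply %edx plus the 0x4 displacement of the marked instruction. The function names are made up for illustration and are not part of ksymoops.

```python
def decode_page_fault_error(code):
    """Split an i386 page-fault error code into its three low bits."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not present
        "write": bool(code & 0x2),         # 1 means the access was a write
        "user_mode": bool(code & 0x4),     # 0 means the fault hit in kernel mode
    }


def faulting_address(base_register, displacement):
    """Effective address of an operand like 0x4(%edx), truncated to 32 bits."""
    return (base_register + displacement) & 0xFFFFFFFF


# Values copied from the oops above.
info = decode_page_fault_error(0x0002)
addr = faulting_address(0x00000000, 0x4)  # edx = 00000000, operand 0x4(%edx)

print(info)
print(hex(addr))
```

If those assumptions hold, the oops is a kernel-mode write to a not-present page at address 0x4, which would be consistent with "*pde = 00000000" and with a NULL pointer in %edx.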

---

The call trace passes through schedule(), so it looks like it's dying in the scheduler, but the faulting EIP is in pfifo_fast_dequeue and it's not just a scheduler problem - the system stays up for days on end (under almost no load, but still) when there isn't an open IPMasqed connection. I also can't explain why web browsing out through the router works but open ssh connections cause a crash.
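
To illustrate what the instructions at the faulting EIP appear to be doing, here is a rough sketch of a dequeue from a circular doubly linked list. It is an assumption on my part that the decoded code is this kind of unlink, and the mapping of registers to variables in the comments is a guess from the disassembly. The Node class and dequeue function are hypothetical stand-ins, not the kernel's actual data structures.

```python
class Node:
    """Stand-in for a list element holding only next/prev links."""

    def __init__(self, name):
        self.name = name
        self.next = None
        self.prev = None


def dequeue(head):
    """Unlink and return the first node after head, or None if the list is empty."""
    result = head.next          # assumed to be held in %eax
    if result is head:
        return None
    nxt = result.next           # assumed to be held in %edx
    nxt.prev = head             # movl %ecx,0x4(%edx)  <- fails if nxt is null
    head.next = nxt             # movl %edx,(%ecx)
    result.next = None          # movl $0x0,(%eax)
    result.prev = None          # movl $0x0,0x4(%eax)
    return result


# A healthy one-element list: head <-> a <-> head.
head, a = Node("head"), Node("a")
head.next = head.prev = a
a.next = a.prev = head
print(dequeue(head).name)

# A corrupted list whose first element has a null next link.
head2, b = Node("head2"), Node("b")
head2.next = head2.prev = b
b.prev = head2                  # b.next is left as None on purpose
try:
    dequeue(head2)
except AttributeError as exc:
    print("unlink failed:", exc)
```

In this sketch the failure comes from writing through a null next link on the first pointer update, which would match a write to 0x4(%edx) with %edx equal to zero. How the queue would end up in that state is not something the oops alone shows.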

I hope this is enough info to be useful - I can provide info from route, ipchains, or anything else upon request... and like I said, this is repeatable.

---
Roy Stogner
roystgnr@rice.edu
