Kernel crash: networking and SMP

Ryurick M. Hristev (physrmh@phys.canterbury.ac.nz)
Fri, 08 Oct 1999 17:15:29 +1300 (NZDT)


-----BEGIN PGP SIGNED MESSAGE-----

Hello,

Sorry I can't be more specific about the problem.
We've experienced quite a number of crashes under SMP on Intel x86.
We suspect the problem has to do with networking and SMP,
triggered by heavy load. Below is the best bug report we could
manage. Any help is much appreciated.

TIA

- ----------------------------------------------------------------------
BUG REPORT

1. SUMMARY:
Kernel 2.2.12 crash under heavy load, NULL pointer dereference

2. DESCRIPTION:
Kernel 2.2.12 on a dual Pentium Pro/II dies
trying to dereference a NULL pointer, followed by:
"Kernel panic: Attempted to kill the idle task!"

The oops trace points to a possible networking connection.
This makes sense to us: of 3 heavily loaded dual Pentium Pro/II
machines, we've experienced the crashes on only two, both handling
heavy network traffic (the third one, used for heavy
number crunching, had no problems).

The problem seems to be related to SMP as well, as
the machines are very stable under a kernel compiled for uniprocessor.

3. KEYWORDS:
networking, SMP

4. KERNEL VERSION:
2.2.12

5. OUTPUT OF OOPS:

first version:
- ---------------------------------------------------------------------
ksymoops 0.7c on i686 2.2.5-22. Options used
-v /usr/src/linux/vmlinux (specified)
-K (specified)
-L (specified)
-o /lib/modules/2.2.12 (specified)
-m /usr/src/linux/System.map (specified)
-e

No modules in ksyms, skipping objects
Unable to handle kernel NULL pointer dereference at virtual address 0000001f
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01327>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010212
eax: 00005f7f ebx: 0000001f ecx: c6c3200 edx: 00000000
esi: 015b572 edi: 0000000 ebp: 015b575 esp: c0235edc
ds: 001 es: 001 ss: 001
Process swapper (pid: 0, process nr: 0, stackpage=c0235000)
Stack: ceba4c00 0000001d c0161663 ca44691c 00000002 c9f9f20 c0151ae3 cf9ab00
c0152476 c9f9f20 c64da30 0000000 d04bd c9f9f20 c692a320 04000001
00000004 0000000a c90b94a0 c6b65ba0 00000034 c0235f5c c90b94a0 00000020
Call Trace: [<c0161663>] [<c0151ae3>] [<c0152476>] [<d04bd>] [<c010c2fd>] [<
[<c010af4>] [<c0107d1>] [<c0106000>] [<c0106000>] [<c01001b1>]
Code: 1 3b 01 46 00 00 74 10 6 60 ad 1e c0 e be 45 fe ff 3 c4
Error (Oops_code_values): invalid value 0x1 in Code line, must be 2, 4, 8 or 16 digits, value ignored
Error (Oops_code_values): invalid value 0x6 in Code line, must be 2, 4, 8 or 16 digits, value ignored
Error (Oops_code_values): invalid value 0xe in Code line, must be 2, 4, 8 or 16 digits, value ignored
Error (Oops_code_values): invalid value 0x3 in Code line, must be 2, 4, 8 or 16 digits, value ignored

>>EIP; 00c01327 Before first symbol <=====
Trace; c0161663 <tcp_write_space+4f/54>
Trace; c0151ae3 <sock_wfree+17/1c>
Trace; c0152476 <__kfree_skb+36/a8>
Trace; 000d04bd Before first symbol
Trace; c010c2fd <handle_IRQ_event+55/88>
Trace; 0c010af4 Before first symbol
Trace; 0c0107d1 Before first symbol
Trace; c0106000 <get_options+0/74>
Trace; c0106000 <get_options+0/74>
Trace; c01001b1 <L6+0/2>
Code; 00c01327 Before first symbol
00000000 <_EIP>:
Code; 00c01327 Before first symbol <=====
0: 3b 01 cmpl (%ecx),%eax <=====
Code; 00c01329 Before first symbol
2: 46 incl %esi
Code; 00c0132a Before first symbol
3: 00 00 addb %al,(%eax)
Code; 00c0132c Before first symbol
5: 74 10 je 17 <_EIP+0x17> 00c0133e Before first symbol
Code; 00c0132e Before first symbol
7: 60 pushal
Code; 00c0132f Before first symbol
8: ad lodsl %ds:(%esi),%eax
Code; 00c01330 Before first symbol
9: 1e pushl %ds
Code; 00c01331 Before first symbol
a: c0 be 45 fe ff c4 00 sarb $0x0,0xc4fffe45(%esi)

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing

4 errors issued. Results may not be reliable.
- ---------------------------------------------------------------------
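Several bracketed addresses in the oops above are shorter than eight hex digits (for example the EIP and three call-trace entries). The following is a minimal sketch for spotting them, not part of the original report. It assumes that an i386 kernel prints these addresses as eight hex digits, so shorter ones probably lost digits when the oops was copied down, which would be consistent with ksymoops reporting "Before first symbol" for them. The function name `short_addresses` is hypothetical.

```python
import re

# Matches addresses printed as [<hex>] in an oops, e.g. [<c0161663>].
ADDR = re.compile(r"\[<([0-9a-fA-F]+)>\]")

def short_addresses(oops_text, width=8):
    """Return bracketed addresses that are not exactly `width` hex digits."""
    # Assumption: a complete i386 address has eight hex digits.
    return [a for a in ADDR.findall(oops_text) if len(a) != width]

if __name__ == "__main__":
    # Lines copied from the oops above.
    sample = (
        "EIP: 0010:[<c01327>]\n"
        "Call Trace: [<c0161663>] [<c0151ae3>] [<c0152476>] [<d04bd>] [<c010c2fd>] [<\n"
        "[<c010af4>] [<c0107d1>] [<c0106000>] [<c0106000>] [<c01001b1>]\n"
    )
    for addr in short_addresses(sample):
        print("suspicious address:", addr)
```

Any address flagged this way should be re-checked against the console output if it is still available; the script cannot recover the missing digits.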
second version: (the 4 single-digit values from "Code:" padded with leading zeros)
- ---------------------------------------------------------------------
ksymoops 0.7c on i686 2.2.5-22. Options used
-v /usr/src/linux/vmlinux (specified)
-K (specified)
-L (specified)
-o /lib/modules/2.2.12 (specified)
-m /usr/src/linux/System.map (specified)
-e

No modules in ksyms, skipping objects
Unable to handle kernel NULL pointer dereference at virtual address 0000001f
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01327>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010212
eax: 00005f7f ebx: 0000001f ecx: c6c3200 edx: 00000000
esi: 015b572 edi: 0000000 ebp: 015b575 esp: c0235edc
ds: 001 es: 001 ss: 001
Process swapper (pid: 0, process nr: 0, stackpage=c0235000)
Stack: ceba4c00 0000001d c0161663 ca44691c 00000002 c9f9f20 c0151ae3 cf9ab00
c0152476 c9f9f20 c64da30 0000000 d04bd c9f9f20 c692a320 04000001
00000004 0000000a c90b94a0 c6b65ba0 00000034 c0235f5c c90b94a0 00000020
Call Trace: [<c0161663>] [<c0151ae3>] [<c0152476>] [<d04bd>] [<c010c2fd>] [<
[<c010af4>] [<c0107d1>] [<c0106000>] [<c0106000>] [<c01001b1>]
Code: 01 3b 01 46 00 00 74 10 06 60 ad 1e c0 0e be 45 fe ff 03 c4

>>EIP; 00c01327 Before first symbol <=====
Trace; c0161663 <tcp_write_space+4f/54>
Trace; c0151ae3 <sock_wfree+17/1c>
Trace; c0152476 <__kfree_skb+36/a8>
Trace; 000d04bd Before first symbol
Trace; c010c2fd <handle_IRQ_event+55/88>
Trace; 0c010af4 Before first symbol
Trace; 0c0107d1 Before first symbol
Trace; c0106000 <get_options+0/74>
Trace; c0106000 <get_options+0/74>
Trace; c01001b1 <L6+0/2>
Code; 00c01327 Before first symbol
00000000 <_EIP>:
Code; 00c01327 Before first symbol <=====
0: 01 3b addl %edi,(%ebx) <=====
Code; 00c01329 Before first symbol
2: 01 46 00 addl %eax,0x0(%esi)
Code; 00c0132c Before first symbol
5: 00 74 10 06 addb %dh,0x6(%eax,%edx,1)
Code; 00c01330 Before first symbol
9: 60 pushal
Code; 00c01331 Before first symbol
a: ad lodsl %ds:(%esi),%eax
Code; 00c01332 Before first symbol
b: 1e pushl %ds
Code; 00c01333 Before first symbol
c: c0 0e be rorb $0xbe,(%esi)
Code; 00c01336 Before first symbol
f: 45 incl %ebp
Code; 00c01337 Before first symbol
10: fe (bad)
Code; 00c01338 Before first symbol
11: ff 03 incl (%ebx)
Code; 00c0133a Before first symbol
13: c4 00 lesl (%eax),%eax

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing
- ---------------------------------------------------------------------
Notes: we've switched to a uniprocessor machine after the last crash
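
The padding applied to the "Code:" line for the second ksymoops run can be sketched as below. This is only an illustration under the assumption that each single-digit value lost a leading zero when the oops was written down; if a different digit was lost, the decoded instructions will be wrong. The function name `pad_code_line` is hypothetical.

```python
import re

def pad_code_line(line):
    """Pad every one-digit hex value on an oops "Code:" line to two digits."""
    prefix, sep, rest = line.partition("Code:")
    if not sep:
        return line  # not a Code: line, leave it untouched
    padded = [
        tok.zfill(2) if re.fullmatch(r"[0-9a-fA-F]", tok) else tok
        for tok in rest.split()
    ]
    return prefix + "Code: " + " ".join(padded)

if __name__ == "__main__":
    # The Code: line from the first version of the oops.
    original = "Code: 1 3b 01 46 00 00 74 10 6 60 ad 1e c0 e be 45 fe ff 3 c4"
    print(pad_code_line(original))
```

Applied to the first version's "Code:" line, this yields the line fed to ksymoops in the second version.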

6. REPLICATION:

IMHO not reliably reproducible:
sometimes a machine runs for 7-10 days, sometimes it crashes several
times a day.

The code sequence is always the same.

7. ENVIRONMENT:

7.1 ver_linux: NOTE: this was run under 2.2.5-uniprocessor

- -- Versions installed: (if some fields are empty or looks
- -- unusual then possibly you have very old versions)
Linux yukawa 2.2.5-22 #1 Wed Jun 2 08:45:51 EDT 1999 i686 unknown
Kernel modules 2.1.121
Gnu C egcs-2.91.66
Binutils 2.9.1.0.23
Linux C Library 2.1.1
Dynamic linker ldd (GNU libc) 2.1.1
Procps 2.0.2
Mount 2.9o
Net-tools 1.52
Console-tools 1999.03.02
Sh-utils 1.16
Modules Loaded nfsd nfs lockd sunrpc 3c59x aic7xxx

7.2 CPU info
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 1
model name : Pentium Pro
stepping : 9
cpu MHz : 199.434997
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
bogomips : 199.07

It also happened on a machine with the following CPU
(but we do not have the oops; we just _suspect_ it was the same bug):

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 3
model name : Pentium II (Klamath)
stepping : 3
cpu MHz : 233.868786
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov mmx
bogomips : 233.47

7.3 Modules
nfsd
nfs
lockd
sunrpc
3c59x
aic7xxx

7.4 SCSI

Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: IBM Model: DORS-32160 Rev: WA6A
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: IBM OEM Model: DCHS04U Rev: 2626
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: IBM OEM Model: DCHS04U Rev: 2626
Type: Direct-Access ANSI SCSI revision: 02

Cheers,
______________________________________________________________________
Ryurick M. Hristev ()..()/^\/^\ -<:-)
R.Hristev@phys.canterbury.ac.nz \/ \#/\#/\) What opinions ?
______________________________________________________________________

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: noconv

iQEVAwUBN/1v8zmhfXaEzMQNAQEd4QgArZqy+1WSTk0LQKuEhn2TwY9gVOETnfQB
95rBF8iaFWRL3iAXnUIOq5k/5I0+iwrGLMlKH9AqvdYJsuASnaHsnHrFkDgO4GtJ
3l3FklFOiHY99cwrFmAzfn7gCGLjayZR6wGG4JUPgCsRzN31ENL9OBvA/IG6S++A
ddHzTqTgSX8R4d5r/XjMppJnUbt3cp5AQFYbE5+ppN0gzPHXJtz1PUEt+toNjZ1W
z6OS1JFxvOZyD6a4bPTqk9l89QxSAPyv8gdeYd9SYr3R4KCqnDIfTeDfHO+0Ee5Y
FhZRkaRMJ+GwqP5Pc4dcsrTgMz5R3zF++U2pjKi17mnWAmpmr9CSqA==
=LOMH
-----END PGP SIGNATURE-----
