I already reported problems with our ISP server machine running 1.99.14 to
vger. After examining the oops messages, I decided to put the following
in /usr/src/linux/include/linux/skbuff.h:
-
#define CONFIG_SKB_CHECK 1
#define PARANOID_BUGHUNT_MODE 1
-
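For anyone not familiar with these switches: as far as I understand it, the extra checking verifies a magic value stored in each buffer before the buffer is used, and complains with the file and line of the caller if the value is wrong. The following is only a user-space sketch of that idea. The struct layout, the names buf_check and BUF_CHECK, and the "freed" marker are made up for illustration; only the value 0xdec0ded1 is taken from the disassembly later in this report.

```c
#include <stdint.h>
#include <stdio.h>

/* Value compared against in the alloc_skb disassembly in this report. */
#define BUF_MAGIC_GOOD  0xdec0ded1u
/* Hypothetical marker for a released buffer; chosen for this sketch only. */
#define BUF_MAGIC_FREED 0x0badf00du

/* Hypothetical, simplified buffer header; NOT the kernel's struct sk_buff. */
struct buf {
    struct buf *next;
    struct buf *prev;
    uint32_t magic;   /* set on allocation, overwritten on free */
    uint32_t len;
};

/* Return 0 if b looks like a live buffer, -1 otherwise. */
static int buf_check(const struct buf *b, const char *file, int line)
{
    if (b == NULL) {
        fprintf(stderr, "File: %s Line %d, null buffer\n", file, line);
        return -1;
    }
    if (b->magic != BUF_MAGIC_GOOD) {
        fprintf(stderr, "File: %s Line %d, passed a non buffer! magic=%08x\n",
                file, line, (unsigned)b->magic);
        return -1;
    }
    return 0;
}

/* Convenience wrapper that records the call site. */
#define BUF_CHECK(b) buf_check((b), __FILE__, __LINE__)
```

A check like this can only report a bad buffer if the pointer refers to mapped memory. If the pointer itself is garbage, reading the magic field faults, which may be what the first oops in this report shows.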
Some hours after rebooting into this kernel, the machine crashed again with
the following messages. Please have a look at the skbuff messages:
--
Jun 11 03:40:53 lilly kernel: Unable to handle kernel paging request at virtual address c90000ec
Jun 11 03:40:53 lilly kernel: current->tss.cr3 = 03129000, Pr3 = 03129000
Jun 11 03:40:53 lilly kernel: *pde = 00000000
Jun 11 03:40:53 lilly kernel: Oops: 0000
Jun 11 03:40:53 lilly kernel: CPU: 0
Jun 11 03:40:53 lilly kernel: EIP: 0010:[<00136e10>]
Jun 11 03:40:53 lilly kernel: EFLAGS: 00010206
Jun 11 03:40:53 lilly kernel: eax: 00ff5ce4 ebx: 00ff5ce4 ecx: 00ff5cdc edx: 00000001
Jun 11 03:40:53 lilly kernel: esi: 090000e0 edi: 090000e0 ebp: 0000005c esp: 01f30e80
Jun 11 03:40:53 lilly kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jun 11 03:40:53 lilly kernel: Process gated (pid: 116, process nr: 9, stackpage=01f30000)
Jun 11 03:40:53 lilly kernel: Stack: 03119a18 090000e0 090000e0 01f30eac 001b5890 0014c44e 000000f4 00000001
Jun 11 03:40:53 lilly kernel: 03119a18 090000e0 001c04a0 090000e0 0014c892 001c04a0 090000e0 00000016
Jun 11 03:40:53 lilly kernel: 00000013 00ff5de0 001c04a0 0014ca98 001c04a0 090000e0 033e0018 00000000
Jun 11 03:40:53 lilly kernel: Call Trace: [<0014c44e>] [<0014c892>] [<0014ca98>] [<00140f04>] [<0014add4>] [<001353ff>] [<00135a88>]
Jun 11 03:40:53 lilly kernel: [<0010a3b2>]
Jun 11 03:40:53 lilly kernel: Code: 81 7e 0c d1 de c0 de 75 0e 56 68 f2 34 1a 00 e8 94 b2 fd ff
Jun 11 03:40:55 lilly kernel: fcntl_setlk() called by process 140 (sendmail) with broken flock() emulation
Jun 11 03:40:55 lilly last message repeated 2 times
Jun 11 03:40:59 lilly kernel: File: skbuff.c Line 398, passed a non skb!
Jun 11 03:40:59 lilly kernel: skb=0358ecb4, real size=0, free=0
Jun 11 03:40:59 lilly kernel: File: skbuff.c Line 400, passed a non skb!
Jun 11 03:40:59 lilly kernel: skb=0358ecb4, real size=0, free=0
Jun 11 03:41:02 lilly kernel: Adding Swap: 51196k swap-space
Jun 11 03:41:02 lilly kernel: File: skbuff.c Line 398, passed a non skb!
Jun 11 03:41:02 lilly kernel: skb=0358ecb4, real size=0, free=0
Jun 11 03:41:02 lilly kernel: File: skbuff.c Line 400, passed a non skb!
Jun 11 03:41:02 lilly kernel: skb=0358ecb4, real size=0, free=0
Jun 11 03:41:11 lilly kernel: File: skbuff.c Line 585, control overrun
Jun 11 03:41:11 lilly kernel: skb=00ff5be8, end=03f4f637
Jun 11 03:41:30 lilly kernel: File: skbuff.c Line 585, control overrun
Jun 11 03:41:30 lilly kernel: skb=00ff5be8, end=03e2ce2f
Jun 11 03:41:50 lilly kernel: File: skbuff.c Line 479, bad next skb member
Jun 11 03:41:50 lilly kernel: skb_unlink: not a linked element
Jun 11 03:41:50 lilly kernel: double lock on device queue, lock=97 caller=00137d83
Jun 11 03:41:50 lilly kernel: File: dev.c Line 346, bad next skb member
Jun 11 03:41:50 lilly kernel: general protection: 0000
Jun 11 03:41:50 lilly kernel: CPU: 0
Jun 11 03:41:50 lilly kernel: EIP: 0010:[<00000000>]
Jun 11 03:41:50 lilly kernel: EFLAGS: 00010202
Jun 11 03:41:50 lilly kernel: eax: ffffffff ebx: 00000001 ecx: 0004000d edx: 00000000
Jun 11 03:41:50 lilly kernel: esi: 00603b00 edi: 00603a68 ebp: 00603b04 esp: 033fcf30
Jun 11 03:41:50 lilly kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jun 11 03:41:50 lilly kernel: Process sendmail (pid: 140, process nr: 26, stackpage=033fc000)
Jun 11 03:41:50 lilly kernel: Stack: 00137b41 00603b00 00603a68 00603b00 00603b04 00000001 fffffffe 00000001
Jun 11 03:41:50 lilly kernel: 00000212 00137d94 00603b00 00603a68 fffffffe 00603b00 00603b00 00603b04
Jun 11 03:41:50 lilly kernel: 00603a68 ffffff80 00000080 001d9480 00000212 00137c47 00603a68 00000080
Jun 11 03:41:50 lilly kernel: Call Trace: [<00137b41>] [<00137d94>] [<00137c47>] [<00137c5d>] [<0011618b>] [<0010a33b>]
Jun 11 03:41:50 lilly kernel: Code: 01 00 00 00 6f ef 00 f0 c3 e2 00 f0 6f ef 00 f0 6f ef 00 f0
Jun 11 03:41:50 lilly kernel: Aiee, killing interrupt handler
Jun 11 03:41:51 lilly kernel: File: skbuff.c Line 479, bad next skb member
Jun 11 03:41:51 lilly kernel: skb_unlink: not a linked element
Jun 11 03:41:51 lilly kernel: double lock on device queue, lock=98 caller=00137d83
Jun 11 03:41:51 lilly kernel: File: dev.c Line 346, bad next skb member
Jun 11 03:41:51 lilly kernel: general protection: 0000
Jun 11 03:41:51 lilly kernel: CPU: 0
Jun 11 03:41:51 lilly kernel: EIP: 0010:[<00000000>]
Jun 11 03:41:51 lilly kernel: EFLAGS: 00010202
Jun 11 03:41:51 lilly kernel: eax: ffffffff ebx: 00000001 ecx: 0004000d edx: 00000000
Jun 11 03:41:51 lilly kernel: esi: 00603b00 edi: 00603a68 ebp: 00603b04 esp: 03f9df30
Jun 11 03:41:51 lilly kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jun 11 03:41:51 lilly kernel: Process uucico (pid: 198, process nr: 51, stackpage=03f9d000)
Jun 11 03:41:51 lilly kernel: Stack: 00137b41 00603b00 00603a68 00603b00 00603b04 00000001 fffffffe 00000001
Jun 11 03:41:51 lilly kernel: 00000212 00137d94 00603b00 00603a68 fffffffe 00603b00 00603b00 00603b04
Jun 11 03:41:51 lilly kernel: 00603a68 ffffff80 00000080 001d9480 00000212 00137c47 00603a68 00000080
Jun 11 03:41:51 lilly kernel: Call Trace: [<00137b41>] [<00137d94>] [<00137c47>] [<00137c5d>] [<0011618b>] [<0010a33b>]
Jun 11 03:41:51 lilly kernel: Code: 01 00 00 00 6f ef 00 f0 c3 e2 00 f0 6f ef 00 f0 6f ef 00 f0
Jun 11 03:41:51 lilly kernel: Aiee, killing interrupt handler
Jun 11 03:41:51 lilly kernel: File: skbuff.c Line 479, bad next skb member
Jun 11 03:41:51 lilly kernel: skb_unlink: not a linked element
Jun 11 03:41:51 lilly kernel: double lock on device queue, lock=99 caller=00137d83
Jun 11 03:41:51 lilly kernel: File: dev.c Line 346, bad next skb member
Jun 11 03:52:46 lilly kernel: klogd 1.3-0, log source = /proc/kmsg started.
--
The ksymoops output for the oopses, in order of appearance, is given below.
I guess the first one is the relevant one; all the others may result from
the first problem.
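For anyone who has not used ksymoops: it resolves each raw address to the nearest preceding symbol in System.map and prints the remaining distance as an offset. Here is a minimal sketch of that lookup, not ksymoops itself. The names sym_lookup and sym_format are made up, and the three start addresses are not copied from my System.map; I derived them from the decoded lines in this report by subtracting the printed offset from the address.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct sym {
    uint32_t start;     /* first address belonging to the symbol */
    const char *name;
};

/* Derived from the decoded trace (address minus offset); sorted by address. */
static const struct sym symtab[] = {
    { 0x136d90, "alloc_skb" },         /* 136e10 - 0x80 */
    { 0x14c43c, "igmp_send_report" },  /* 14c44e - 0x12 */
    { 0x14c80c, "ip_mc_inc_group" },   /* 14c892 - 0x86 */
};

/* Binary search for the last symbol whose start is <= addr.
 * Returns NULL if addr lies below the first symbol. */
static const struct sym *sym_lookup(const struct sym *tab, size_t n,
                                    uint32_t addr)
{
    const struct sym *best = NULL;
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].start <= addr) {
            best = &tab[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}

/* Write "name+offset" (offset in hex) into out.
 * Returns 0 on success, -1 if the address cannot be resolved. */
static int sym_format(uint32_t addr, char *out, size_t outlen)
{
    const struct sym *s =
        sym_lookup(symtab, sizeof symtab / sizeof symtab[0], addr);

    if (s == NULL)
        return -1;
    snprintf(out, outlen, "%s+%x", s->name, (unsigned)(addr - s->start));
    return 0;
}
```

With this table, sym_format(0x136e10, ...) produces "alloc_skb+80", matching the EIP line. A real System.map lists every symbol, so an address between two of the three entries here would be attributed to the wrong function; the sketch is only meant to show the principle.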
-
Using `./System.map' to map addresses to symbols.
>>EIP: 136e10 <alloc_skb+80/16c>
Trace: 14c44e <igmp_send_report+12/b0>
Trace: 14c892 <ip_mc_inc_group+86/90>
Trace: 14ca98 <ip_mc_join_group+f8/128>
Trace: 140f04 <ip_setsockopt+494/5b0>
Trace: 14add4 <inet_setsockopt+48/5c>
Trace: 1353ff <sys_setsockopt+63/78>
Trace: 135a88 <sys_socketcall+270/2dc>
Trace: 10a3b2 <system_call+52/80>
Code: 136e10 <alloc_skb+80/16c> cmpl $0xdec0ded1,0xc(%esi)
Code: 136e17 <alloc_skb+87/16c> jne 136e27 <alloc_skb+97/16c>
Code: 136e19 <alloc_skb+89/16c> pushl %esi
Code: 136e1a <alloc_skb+8a/16c> pushl $0x1a34f2
Code: 136e1f <alloc_skb+8f/16c> call fffdb2a8 <_EIP+fffdb2a8>
-
Using `./System.map' to map addresses to symbols.
Trace: 137b41 <do_dev_queue_xmit+169/190>
Trace: 137d94 <dev_tint+44/6c>
Trace: 137c47 <dev_transmit+1f/2c>
Trace: 137c5d <net_bh+9/fc>
Trace: 11618b <do_bottom_half+3b/60>
Trace: 10a33b <handle_bottom_half+b/20>
Code: addl %eax,(%eax)
Code: addb %al,(%eax)
Code: outsl %ds:(%esi),(%dx)
Code: outl %eax,(%dx)
Code: addb %dh,%al
Code: ret
Code: loop 0000000b <_EIP+b>
Code: lock outsl %ds:(%esi),(%dx)
Code: outl %eax,(%dx)
Code: addb %dh,%al
Code: outsl %ds:(%esi),(%dx)
Code: outl %eax,(%dx)
Code: addb %dh,%al
-
Using `./System.map' to map addresses to symbols.
Trace: 137b41 <do_dev_queue_xmit+169/190>
Trace: 137d94 <dev_tint+44/6c>
Trace: 137c47 <dev_transmit+1f/2c>
Trace: 137c5d <net_bh+9/fc>
Trace: 11618b <do_bottom_half+3b/60>
Trace: 10a33b <handle_bottom_half+b/20>
Code: addl %eax,(%eax)
Code: addb %al,(%eax)
Code: outsl %ds:(%esi),(%dx)
Code: outl %eax,(%dx)
Code: addb %dh,%al
Code: ret
Code: loop 0000000b <_EIP+b>
Code: lock outsl %ds:(%esi),(%dx)
Code: outl %eax,(%dx)
Code: addb %dh,%al
Code: outsl %ds:(%esi),(%dx)
Code: outl %eax,(%dx)
Code: addb %dh,%al
-
I hope this information helps you catch the bug.
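One observation on the first oops, offered as a guess rather than a diagnosis: the faulting instruction is cmpl $0xdec0ded1,0xc(%esi), and esi is 090000e0 in the register dump. If I assume the kernel data segment has base 0xC0000000 on this kernel (an assumption on my part; the log does not show it), the operand works out to exactly the reported fault address c90000ec. The small sketch below just does that arithmetic; the function name is made up.

```c
#include <stdint.h>

/* Assumption: base of the kernel data segment on this i386 kernel. */
#define KERNEL_SEG_BASE 0xC0000000u

/* Linear address touched by a kernel-mode operand of the form disp(%reg). */
static uint32_t kernel_linear_address(uint32_t reg, uint32_t disp)
{
    return KERNEL_SEG_BASE + reg + disp;
}
```

kernel_linear_address(0x090000e0, 0xc) gives 0xc90000ec. If that reading is right, the pointer in esi was already bad when alloc_skb tried to check it.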
Regards,
-Michael
-- x(f,s,c)char *s;{return f&1 ? *s ? *s-c ? x(f,++s,c) :7[s]:0:f&2 ? x(--f,"!/*,xq-ih9]c$=le&M t)r\nm@p31n%ag.8}Sdoy",c):f&4 ? *s ? x(f,s+1,putchar(x(f-2,"^&%!*)",*s))) : 0 : 0;}main(){return x(4, "]!x/mhicn$!iihle&!x/mhiM$agimr%p !r@p%he&!x/mhiM !r@p%he",65);}