Re: Openafs 1.3.78 and kernel 2.4.29 oopses , same for 2.4.30 andopenafs 1.3.82

From: Dimitris Zilaskos
Date: Mon May 09 2005 - 05:05:58 EST




Hello ,

I have just completed 44 hours of uptime with 2.4.30 and openafs 1.3.78 . Two oops occured at night , but the system did not freeze :

a)

ksymoops 2.4.11 on i686 2.4.30. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.30/ (default)
-m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Warning (compare_maps): libafs-2.4.30.mp symbol kallsyms_address_to_symbol not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol kallsyms_symbol_to_address not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_chdir not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_exit not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_ioctl not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_open not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_wait4 not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_write not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
May 8 04:03:23 system kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000039
May 8 04:03:23 system kernel: c014d763
May 8 04:03:23 system kernel: *pde = 00000000
May 8 04:03:23 system kernel: Oops: 0000
May 8 04:03:23 system kernel: CPU: 0
May 8 04:03:23 system kernel: EIP: 0010:[<c014d763>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
May 8 04:03:23 system kernel: EFLAGS: 00010213
May 8 04:03:23 system kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000039
May 8 04:03:23 system kernel: c014d2c0
May 8 04:03:23 system kernel: *pde = 00000000
May 8 04:03:23 system kernel: eax: 00000001 ebx: 00000001 ecx: f0ab6664 edx: ee91bb40
May 8 04:03:23 system kernel: esi: e5559009 edi: 00000001 ebp: c9403f40 esp: c9403f28
May 8 04:03:23 system kernel: ds: 0018 es: 0018 ss: 0018
May 8 04:03:23 system kernel: Process rsync (pid: 25555, stackpage=c9403000)
May 8 04:03:23 system kernel: Stack: d66a1c40 c9403f40 00000008 00000008 c9403f98 00000001 e5559000 00000009
May 8 04:03:23 system kernel: 0fbf25c9 bfff7950 c9403f98 e5559000 00000000 00000008 c014de69 e5559000
May 8 04:03:23 system kernel: e5559000 c9403f98 c014e1c9 bfff7950 c0152fa0 c9402000 c9403f98 bfff8960
May 8 04:03:23 system kernel: Call Trace: [<c014de69>] [<c014e1c9>] [<c0152fa0>] [<c014a06f>] [<c0108ebb>]
May 8 04:03:23 system kernel: Code: 8b 7b 38 85 ff 0f 84 8e 00 00 00 f0 fe 0d e0 c9 41 c0 0f 88


EIP; c014d763 <link_path_walk+5c3/ac0> <=====

edx; ee91bb40 <_end+2e4b3700/3042bc20>
esi; e5559009 <_end+250f0bc9/3042bc20>
ebp; c9403f40 <_end+8f9bb00/3042bc20>
esp; c9403f28 <_end+8f9bae8/3042bc20>

Trace; c014de69 <path_lookup+39/40>
Trace; c014e1c9 <__user_walk+49/60>
Trace; c0152fa0 <filldir64+0/130>
Trace; c014a06f <sys_lstat64+1f/90>
Trace; c0108ebb <system_call+33/38>

Code; c014d763 <link_path_walk+5c3/ac0>
00000000 <_EIP>:
Code; c014d763 <link_path_walk+5c3/ac0> <=====
0: 8b 7b 38 mov 0x38(%ebx),%edi <=====
Code; c014d766 <link_path_walk+5c6/ac0>
3: 85 ff test %edi,%edi
Code; c014d768 <link_path_walk+5c8/ac0>
5: 0f 84 8e 00 00 00 je 99 <_EIP+0x99>
Code; c014d76e <link_path_walk+5ce/ac0>
b: f0 fe 0d e0 c9 41 c0 lock decb 0xc041c9e0
Code; c014d775 <link_path_walk+5d5/ac0>
12: 0f 88 00 00 00 00 js 18 <_EIP+0x18>


9 warnings issued. Results may not be reliable.


b)

ksymoops 2.4.11 on i686 2.4.30. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.30/ (default)
-m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Warning (compare_maps): libafs-2.4.30.mp symbol kallsyms_address_to_symbol not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol kallsyms_symbol_to_address not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_chdir not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_exit not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_ioctl not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_open not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_wait4 not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
Warning (compare_maps): libafs-2.4.30.mp symbol sys_write not found in /usr/local/lib/openafs/libafs-2.4.30.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.30.mp.o entry
May 8 04:03:23 system kernel: Oops: 0000
May 8 04:03:23 system kernel: CPU: 1
May 8 04:03:23 system kernel: EIP: 0010:[<c014d2c0>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
May 8 04:03:23 system kernel: EFLAGS: 00010213
May 8 04:03:23 system kernel: eax: 00000001 ebx: 00000001 ecx: f0ad16d0 edx: ee91bb40
May 8 04:03:23 system kernel: esi: d06da04b edi: 00000001 ebp: c9ccdf40 esp: c9ccdf28
May 8 04:03:23 system kernel: ds: 0018 es: 0018 ss: 0018
May 8 04:03:23 system kernel: Process lftp (pid: 24957, stackpage=c9ccd000)
May 8 04:03:23 system kernel: Stack: d091c0e0 c9ccdf40 00000004 00000009 c9ccdf98 00000001 d06da047 00000003
May 8 04:03:23 system kernel: 0012238c 08390ea8 c9ccdf98 d06da000 00000000 00000009 c014de69 d06da000
May 8 04:03:23 system kernel: d06da000 c9ccdf98 c014e1c9 08390ea8 fffffffb c9ccc000 c9ccdf98 bffffb50
May 8 04:03:23 system kernel: Call Trace: [<c014de69>] [<c014e1c9>] [<c0149fdf>] [<c0108ebb>]
May 8 04:03:23 system kernel: Code: 8b 43 38 85 c0 0f 84 8e 00 00 00 f0 fe 0d e0 c9 41 c0 0f 88


EIP; c014d2c0 <link_path_walk+120/ac0> <=====

edx; ee91bb40 <_end+2e4b3700/3042bc20>
esi; d06da04b <_end+10271c0b/3042bc20>
ebp; c9ccdf40 <_end+9865b00/3042bc20>
esp; c9ccdf28 <_end+9865ae8/3042bc20>

Trace; c014de69 <path_lookup+39/40>
Trace; c014e1c9 <__user_walk+49/60>
Trace; c0149fdf <sys_stat64+1f/90>
Trace; c0108ebb <system_call+33/38>

Code; c014d2c0 <link_path_walk+120/ac0>
00000000 <_EIP>:
Code; c014d2c0 <link_path_walk+120/ac0> <=====
0: 8b 43 38 mov 0x38(%ebx),%eax <=====
Code; c014d2c3 <link_path_walk+123/ac0>
3: 85 c0 test %eax,%eax
Code; c014d2c5 <link_path_walk+125/ac0>
5: 0f 84 8e 00 00 00 je 99 <_EIP+0x99>
Code; c014d2cb <link_path_walk+12b/ac0>
b: f0 fe 0d e0 c9 41 c0 lock decb 0xc041c9e0
Code; c014d2d2 <link_path_walk+132/ac0>
12: 0f 88 00 00 00 00 js 18 <_EIP+0x18>


9 warnings issued. Results may not be reliable.


Again that happened at the time my afs server restarts . Just before the oops :

May 8 04:03:20 system kernel: afs: Lost contact with file server 1.2.3.4 in cell cell.gr (all multi-homed ip
addresses down for the server)
May 8 04:03:20 system kernel: afs: Lost contact with file server 1.2.3.4 in cell cell.gr (all multi-homed ip
addresses down for the server)
May 8 04:03:21 system kernel: afs: failed to store file (110)


Looks like identical behaviour as in 2.4.29 with 1.3.78. From what I have observed it seems that those oopses eventually will lead to all processes accesing AFS entering D state ,till the system freezes. Something in openafs 1.3.82 seems to accelarate this process. I am thinking of running a cron job that checks for D state processes and kills eveyrthing apart from the absolutetely essential processes if say more than 50 enter D state, in an attemp to prevent total system freeze , and give 2.4.30 and 1.3.82 another try. Unless you guys have a better suggestion:)


Regards ,

--
=============================================================================

Dimitris Zilaskos

Department of Physics @ Aristotle University of Thessaloniki , Greece
PGP key : http://tassadar.physics.auth.gr/~dzila/pgp_public_key.asc
http://egnatia.ee.auth.gr/~dzila/pgp_public_key.asc
MD5sum : de2bd8f73d545f0e4caf3096894ad83f pgp_public_key.asc
=============================================================================
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/