2.6.0-test1+bk: NFS client-side freezes

From: Alex Riesen (fork0@users.sourceforge.net)
Date: Sat Jul 19 2003 - 15:22:53 EST


Hi, all

I experienced BK freezing in D state while updating a repository on an
NFS volume.
I could see something being constantly transferred from the computer,
around 1Mb/s, by gkrellm judgement.
Also, BK reported corruptions in directories: it couldn't find some files.

I do not remember seeing them freezes before, but I have to admit I
never tried really hard (like running bk over nfs). There were certainly
none of them in 2.4, as I did run bk there over nfs.

The BK process unstuck after I pressed Alt-SysRq-T. That's what I got:

bk D CD245BD8 1346 1344 1345 (NOTLB)
cd245b74 00200086 d9fa7068 cd245bd8 00000000 00000000 00200202 cd245bd8
       00200246 cd244000 cd245bd8 cd245c50 cd245bb8 dc8d0998 cd245bd8 00000064
       00000004 cd244000 00000000 cd2478e0 c011a470 cd245c50 cd245c50 dbac34a0
Call Trace:
 [_end+475503432/1070231984] __rpc_execute+0x118/0x340 [sunrpc]
 [<dc8d0998>] __rpc_execute+0x118/0x340 [sunrpc]
 [default_wake_function+0/48] default_wake_function+0x0/0x30
 [<c011a470>] default_wake_function+0x0/0x30
 [_end+475486030/1070231984] rpc_call_sync+0x8e/0xd0 [sunrpc]
 [<dc8cc59e>] rpc_call_sync+0x8e/0xd0 [sunrpc]
 [_end+475486768/1070231984] call_reserveresult+0x0/0xf0 [sunrpc]
 [<dc8cc880>] call_reserveresult+0x0/0xf0 [sunrpc]
 [_end+475500272/1070231984] rpc_run_timer+0x0/0x80 [sunrpc]
 [<dc8cfd40>] rpc_run_timer+0x0/0x80 [sunrpc]
 [_end+475662022/1070231984] nfs3_rpc_wrapper+0x46/0x90 [nfs]
 [<dc8f7516>] nfs3_rpc_wrapper+0x46/0x90 [nfs]
 [_end+475662575/1070231984] nfs3_proc_getattr+0x5f/0xa0 [nfs]
 [<dc8f773f>] nfs3_proc_getattr+0x5f/0xa0 [nfs]
 [_end+475632436/1070231984] __nfs_revalidate_inode+0x154/0x320 [nfs]
 [<dc8f0184>] __nfs_revalidate_inode+0x154/0x320 [nfs]
 [_end+475656538/1070231984] nfs_updatepage+0x27a/0x300 [nfs]
 [<dc8f5faa>] nfs_updatepage+0x27a/0x300 [nfs]
 [dput+48/544] dput+0x30/0x220
 [<c0169810>] dput+0x30/0x220
 [_end+475614753/1070231984] nfs_lookup_revalidate+0x231/0x650 [nfs]
 [<dc8ebc71>] nfs_lookup_revalidate+0x231/0x650 [nfs]
 [_end+475615139/1070231984] nfs_lookup_revalidate+0x3b3/0x650 [nfs]
 [<dc8ebdf3>] nfs_lookup_revalidate+0x3b3/0x650 [nfs]
 [update_wall_time+22/64] update_wall_time+0x16/0x40
 [<c0125746>] update_wall_time+0x16/0x40
 [do_timer+224/240] do_timer+0xe0/0xf0
 [<c0125bb0>] do_timer+0xe0/0xf0
 [_end+475510727/1070231984] rpcauth_lookup_credcache+0x217/0x260 [sunrpc]
 [<dc8d2617>] rpcauth_lookup_credcache+0x217/0x260 [sunrpc]
 [_end+475510911/1070231984] rpcauth_lookupcred+0x6f/0xa0 [sunrpc]
 [<dc8d26cf>] rpcauth_lookupcred+0x6f/0xa0 [sunrpc]
 [_end+475510727/1070231984] rpcauth_lookup_credcache+0x217/0x260 [sunrpc]
 [<dc8d2617>] rpcauth_lookup_credcache+0x217/0x260 [sunrpc]
 [_end+475510911/1070231984] rpcauth_lookupcred+0x6f/0xa0 [sunrpc]
 [<dc8d26cf>] rpcauth_lookupcred+0x6f/0xa0 [sunrpc]
 [_end+475511279/1070231984] put_rpccred+0x1f/0xe0 [sunrpc]
 [<dc8d283f>] put_rpccred+0x1f/0xe0 [sunrpc]
 [cached_lookup+79/128] cached_lookup+0x4f/0x80
 [<c01601ef>] cached_lookup+0x4f/0x80
 [__lookup_hash+101/208] __lookup_hash+0x65/0xd0
 [<c0161235>] __lookup_hash+0x65/0xd0
 [open_namei+307/1072] open_namei+0x133/0x430
 [<c0161923>] open_namei+0x133/0x430
 [filp_open+65/112] filp_open+0x41/0x70
 [<c0151a41>] filp_open+0x41/0x70
 [sys_open+85/144] sys_open+0x55/0x90
 [<c0151f15>] sys_open+0x55/0x90
 [syscall_call+7/11] syscall_call+0x7/0xb
 [<c01094fb>] syscall_call+0x7/0xb

NFS: readdir reply truncated!

There was also frozen mutt, as I found later:

mutt D D8931B70 949 948 (NOTLB)
d8931b0c 00000082 d9fa7068 d8931b70 00000000 00000000 00000202 d8931b70
       00000246 d8930000 d8931b70 d8931be8 d8931b50 dc8d0998 d8931b70 00000064
       00000004 d8930000 00000000 db16e100 c011a470 d8931be8 d8931be8 dbac34a0
Call Trace:
 [_end+475503432/1070231984] __rpc_execute+0x118/0x340 [sunrpc]
 [<dc8d0998>] __rpc_execute+0x118/0x340 [sunrpc]
 [default_wake_function+0/48] default_wake_function+0x0/0x30
 [<c011a470>] default_wake_function+0x0/0x30
 [_end+475486030/1070231984] rpc_call_sync+0x8e/0xd0 [sunrpc]
 [<dc8cc59e>] rpc_call_sync+0x8e/0xd0 [sunrpc]
 [_end+475486768/1070231984] call_reserveresult+0x0/0xf0 [sunrpc]
 [<dc8cc880>] call_reserveresult+0x0/0xf0 [sunrpc]
 [_end+475500272/1070231984] rpc_run_timer+0x0/0x80 [sunrpc]
 [<dc8cfd40>] rpc_run_timer+0x0/0x80 [sunrpc]
 [_end+475662022/1070231984] nfs3_rpc_wrapper+0x46/0x90 [nfs]
 [<dc8f7516>] nfs3_rpc_wrapper+0x46/0x90 [nfs]
 [_end+475662575/1070231984] nfs3_proc_getattr+0x5f/0xa0 [nfs]
 [<dc8f773f>] nfs3_proc_getattr+0x5f/0xa0 [nfs]
 [_end+475632436/1070231984] __nfs_revalidate_inode+0x154/0x320 [nfs]
 [<dc8f0184>] __nfs_revalidate_inode+0x154/0x320 [nfs]
 [do_anonymous_page+280/512] do_anonymous_page+0x118/0x200
 [<c0144be8>] do_anonymous_page+0x118/0x200
 [handle_mm_fault+226/384] handle_mm_fault+0xe2/0x180
 [<c0145192>] handle_mm_fault+0xe2/0x180
 [apic_timer_interrupt+26/32] apic_timer_interrupt+0x1a/0x20
 [<c0109e8a>] apic_timer_interrupt+0x1a/0x20
 [csum_partial_copy_generic+154/252] csum_partial_copy_generic+0x9a/0xfc
 [<c01b008e>] csum_partial_copy_generic+0x9a/0xfc
 [skb_copy_and_csum_bits+102/672] skb_copy_and_csum_bits+0x66/0x2a0
 [<c0221276>] skb_copy_and_csum_bits+0x66/0x2a0
 [dput+48/544] dput+0x30/0x220
 [<c0169810>] dput+0x30/0x220
 [_end+475615076/1070231984] nfs_lookup_revalidate+0x374/0x650 [nfs]
 [<dc8ebdb4>] nfs_lookup_revalidate+0x374/0x650 [nfs]
 [default_wake_function+42/48] default_wake_function+0x2a/0x30
 [<c011a49a>] default_wake_function+0x2a/0x30
 [__wake_up_common+58/96] __wake_up_common+0x3a/0x60
 [<c011a4da>] __wake_up_common+0x3a/0x60
 [kill_fasync+46/80] kill_fasync+0x2e/0x50
 [<c0164a1e>] kill_fasync+0x2e/0x50
 [bh_lru_install+203/272] bh_lru_install+0xcb/0x110
 [<c01554db>] bh_lru_install+0xcb/0x110
 [_end+475510727/1070231984] rpcauth_lookup_credcache+0x217/0x260 [sunrpc]
 [<dc8d2617>] rpcauth_lookup_credcache+0x217/0x260 [sunrpc]
 [_end+475510911/1070231984] rpcauth_lookupcred+0x6f/0xa0 [sunrpc]
 [<dc8d26cf>] rpcauth_lookupcred+0x6f/0xa0 [sunrpc]
 [do_lookup+89/160] do_lookup+0x59/0xa0
 [<c0160559>] do_lookup+0x59/0xa0
 [link_path_walk+308/2320] link_path_walk+0x134/0x910
 [<c01606d4>] link_path_walk+0x134/0x910
 [kmem_cache_alloc+326/400] kmem_cache_alloc+0x146/0x190
 [<c013e136>] kmem_cache_alloc+0x146/0x190
 [__user_walk+64/96] __user_walk+0x40/0x60
 [<c0161380>] __user_walk+0x40/0x60
 [vfs_stat+30/96] vfs_stat+0x1e/0x60
 [<c015c32e>] vfs_stat+0x1e/0x60
 [sys_stat64+27/64] sys_stat64+0x1b/0x40
 [<c015c9cb>] sys_stat64+0x1b/0x40
 [sys_swapon+1536/2064] sys_swapon+0x600/0x810
 [<c0150000>] sys_swapon+0x600/0x810
 [syscall_call+7/11] syscall_call+0x7/0xb
 [<c01094fb>] syscall_call+0x7/0xb

The system is UP P3-700, preempt is on, and compiled with gcc-3.2.2 from
Gentoo. +bk patches are from July 19, and there was also some patches
from -mm tree (Con's scheduler work). I reverted the ones in the nfs
area, trying to find the problem, but no luck.
The server is plain 2.4.20.

$ lsmod
Module Size Used by
nfs 92972 1
nfsd 93264 2 [unsafe]
exportfs 5440 1 nfsd
lockd 58768 3 nfs,nfsd,[unsafe]
sunrpc 120004 7 nfs,nfsd,lockd
rtc 10724 0
ide_cd 37984 0
cdrom 33248 1 ide_cd
mousedev 6964 1
hid 23328 0
tulip 54272 0
crc32 3840 1 tulip
uhci_hcd 29704 0
usbcore 97940 4 hid,uhci_hcd

There were also some mentions about old mount:
nfs warning: mount version older than kernel
Mine is not quite that old: mount-2.11z.

.config attached.

-alex



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Jul 23 2003 - 22:00:38 EST