Kernel crash with 2.6.29 + nfs + xfs (radix-tree)

From: Alex Samad
Date: Tue May 19 2009 - 22:32:21 EST


Hi

I have been seeing quite a lot of crashes on my Debian amd64 box with the
2.6.29 series of kernels. For me it seems to happen when the system is
under load and there is network activity -> nfsd -> xfs.


May 5 19:45:38 x kernel: ------------[ cut here ]------------
May 5 19:45:39 x kernel: kernel BUG at lib/radix-tree.c:485!
May 5 19:45:39 x kernel: invalid opcode: 0000 [#1] SMP
May 5 19:45:39 x kernel: last sysfs file: /sys/block/sdc/queue/nr_requests
May 5 19:45:39 x kernel: CPU 0
May 5 19:45:39 x kernel: Pid: 335, comm: kswapd0 Not tainted 2.6.29.2 #1 S2895
May 5 19:45:39 x kernel: RIP: 0010:[<ffffffff803916e0>] [<ffffffff803916e0>] radix_tree_tag_set+0x86/0xc6
May 5 19:45:39 x kernel: RSP: 0018:ffff88016e2d1c88 EFLAGS: 00010246
May 5 19:45:39 x kernel: RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000000000
May 5 19:45:39 x kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88016a822b58
May 5 19:45:39 x kernel: RBP: 0000000000000004 R08: 0000000000000000 R09: 8000000000000000
May 5 19:45:39 x kernel: R10: ffffa5a5a5a5a5a5 R11: ffffffff8037541d R12: 0000000000000001
May 5 19:45:39 x kernel: R13: 0000000000000000 R14: ffff88016d1bc310 R15: 0000000000000000
May 5 19:45:39 x kernel: FS: 00007fea1903f6e0(0000) GS:ffffffff80759040(0000) knlGS:0000000000000000
May 5 19:45:39 x kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
May 5 19:45:39 x kernel: CR2: 00007fd2df5ae8e0 CR3: 000000016bad0000 CR4: 00000000000006e0
May 5 19:45:39 x kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 5 19:45:39 x kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 5 19:45:39 x kernel: Process kswapd0 (pid: 335, threadinfo ffff88016e2d0000, task ffff88016f23eac0)
May 5 19:45:39 x kernel: Stack:
May 5 19:45:39 x kernel: 000000000069d804 0000000000000000 ffff88016d1bc2d0 ffff88000a8b7400
May 5 19:45:39 x kernel: ffff88000a8b7400 ffff88016df30000 ffff88000a8b74f8 ffff88016d1bc30c
May 5 19:45:39 x kernel: ffffffff80376b02 ffff88000a8b7580 0000000000000024 ffff88016e2d1d60
May 5 19:45:39 x kernel: Call Trace:
May 5 19:45:39 x kernel: [<ffffffff80376b02>] ? xfs_inode_set_reclaim_tag+0x69/0x89
May 5 19:45:39 x kernel: [<ffffffff8036972f>] ? xfs_reclaim+0x99/0x9f
May 5 19:45:39 x kernel: [<ffffffff80375453>] ? xfs_fs_destroy_inode+0x36/0x54
May 5 19:45:39 x kernel: [<ffffffff80290304>] ? dispose_list+0xcd/0xfb
May 5 19:45:39 x kernel: [<ffffffff80290526>] ? shrink_icache_memory+0x1f4/0x22a
May 5 19:45:39 x kernel: [<ffffffff8026242a>] ? shrink_slab+0xe4/0x157
May 5 19:45:39 x kernel: [<ffffffff80262b53>] ? kswapd+0x44f/0x5c9
May 5 19:45:39 x kernel: [<ffffffff8026063e>] ? isolate_pages_global+0x0/0x231
May 5 19:45:39 x kernel: [<ffffffff8024458a>] ? autoremove_wake_function+0x0/0x2e
May 5 19:45:39 x kernel: [<ffffffff8022a80e>] ? __wake_up_common+0x44/0x73
May 5 19:45:39 x kernel: [<ffffffff80262704>] ? kswapd+0x0/0x5c9
May 5 19:45:39 x kernel: [<ffffffff80244266>] ? kthread+0x47/0x73
May 5 19:45:39 x kernel: [<ffffffff8020c4ba>] ? child_rip+0xa/0x20
May 5 19:45:39 x kernel: [<ffffffff8024421f>] ? kthread+0x0/0x73
May 5 19:45:39 x kernel: [<ffffffff8020c4b0>] ? child_rip+0x0/0x20
May 5 19:45:39 x kernel: Code: 83 e5 3f 89 ea e8 04 fc ff ff 85 c0 75 10 48 8b 54 24 08 48 8d 84 13 18 02 00 00 0f ab 28 48 63 c5 48 8b 5c c3 18 48 85 db 75 04 <0f> 0b eb fe 41 83 ed 06 41 ff cc 45$
May 5 19:45:39 x kernel: RIP [<ffffffff803916e0>] radix_tree_tag_set+0x86/0xc6
May 5 19:45:39 x kernel: RSP <ffff88016e2d1c88>
May 5 19:45:39 x kernel: ---[ end trace aed81d6fef80e624 ]---


I have logged a bug with Debian
(more info: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=526406);
one other person has also reported this problem.

We believe somebody has already reported a similar problem here:
http://groups.google.com/group/linux.kernel/browse_thread/thread/dd00f52e93397c9e/6b6814dab9b41a05?pli=1

Has anyone else seen this problem? Who do I need to raise this with?

I am able to reproduce this problem on my machine (amd64 Phenom II, 8G
RAM) running VirtualBox: I have a VM access the local filesystem via
nfs (udp), and when I do an rm -fr <some directory ~200M> I see the bug.
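
For anyone who wants to try the same kind of load, the delete workload
described above can be approximated with a sketch like the one below. It
is not the exact procedure used in this report: the function names, the
64K file size and the directory layout are assumptions chosen for
illustration, and the mount point passed to churn() is hypothetical. It
would be run inside the guest, against the NFS (udp) mount of the
XFS-backed export.

```python
import os
import shutil
import tempfile


def populate(root, total_bytes, file_size=64 * 1024, files_per_dir=100):
    """Create roughly total_bytes of small files under root; return the file count."""
    block = b"\0" * file_size
    written = 0
    count = 0
    while written < total_bytes:
        # Spread files over subdirectories so the tree holds many inodes.
        subdir = os.path.join(root, "d%05d" % (count // files_per_dir))
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, "f%07d" % count), "wb") as fh:
            fh.write(block)
        written += file_size
        count += 1
    return count


def churn(mount_point, total_bytes=200 * 1024 * 1024):
    """Build a scratch tree under mount_point, then delete it (like rm -fr).

    mount_point is assumed to be a directory on the NFS mount; the default
    size mirrors the ~200M directory mentioned in the report.
    """
    scratch = tempfile.mkdtemp(prefix="nfs-xfs-repro-", dir=mount_point)
    count = populate(scratch, total_bytes)
    shutil.rmtree(scratch)
    return count, scratch
```

Calling churn("/mnt/nfs-test") (a hypothetical path) in a loop while the
server is short on memory keeps inodes being created and destroyed, which
is the path the trace above goes through (shrink_icache_memory ->
xfs_fs_destroy_inode). Whether this actually triggers the BUG on a given
setup is not guaranteed.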

I am moving the partition over to ext3 from xfs :(


Alex Samad
Please cc me as I am not subscribed to the mailing list
