2.6.19 nfsd oops

From: Jeff Garzik
Date: Fri Dec 08 2006 - 07:27:24 EST



After running fine for 8 days, my NFS fileserver suddenly oopsed in nfsd (see first attachment).

Data points:
* system: Intel x86-64, Fedora Core 5
* kernel: unpatched 2.6.19 release kernel
* per the syslog, the oops occurred seconds after yum upgraded the Fedora Core 5 nfs-utils package
* the oops is in flush_cpu_workqueue(), so it appears to be workqueue-related, but that might just be the callsite
* I recently started (again) switching clients from NFSv3 to NFSv4. This may or may not be related to the oops.

I've attached the dmesg and .config from 2.6.19-gbff19b1d, the kernel the machine is currently running, just to give a better picture of the machine. As noted above, the oops itself was from kernel 2.6.19, not 2.6.19-gbff19b1d.

Jeff


Dec 8 05:23:32 pretzel yum: Updated: nfs-utils.x86_64 1:1.0.8-4.fc5
Dec 8 05:23:32 pretzel mountd[2041]: Caught signal 15, un-registering and exiting.
Dec 8 05:23:37 pretzel exportfs[26021]: /etc/exports [2]: No 'sync' or 'async' option specified for export "10.10.10.0/255.255.255.0:/g". Assuming default behaviour ('sync'). NOTE: this default has changed from previous versions
Dec 8 05:23:37 pretzel nfsd[26023]: nfssvc_versbits: +2 +3 +4
Dec 8 05:23:37 pretzel kernel: Unable to handle kernel paging request at 0000040000000160 RIP:
Dec 8 05:23:37 pretzel kernel: [<ffffffff802860d3>] flush_cpu_workqueue+0x12/0x9e
Dec 8 05:23:37 pretzel kernel: PGD 0
Dec 8 05:23:37 pretzel kernel: Oops: 0000 [1] SMP
Dec 8 05:23:37 pretzel kernel: CPU 0
Dec 8 05:23:37 pretzel kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc ipv6 usblp dm_mirror dm_mod container button ac lp parport_pc parport floppy nvram sg ata_generic ehci_hcd uhci_hcd ata_piix tg3 ext3 jbd ahci libata sd_mod scsi_mod
Dec 8 05:23:37 pretzel kernel: Pid: 26023, comm: rpc.nfsd Not tainted 2.6.19 #1
Dec 8 05:23:37 pretzel kernel: RIP: 0010:[<ffffffff802860d3>] [<ffffffff802860d3>] flush_cpu_workqueue+0x12/0x9e
Dec 8 05:23:37 pretzel kernel: RSP: 0000:ffff8100653efd98 EFLAGS: 00010292
Dec 8 05:23:37 pretzel kernel: RAX: ffff81003ec43900 RBX: 0000040000000100 RCX: ffff8100653ee000
Dec 8 05:23:37 pretzel kernel: RDX: ffff8100257f48c0 RSI: 0000000000000282 RDI: 0000040000000100
Dec 8 05:23:37 pretzel kernel: RBP: ffff8100163e9a00 R08: ffff8100653ee000 R09: ffff810037fef840
Dec 8 05:23:37 pretzel kernel: R10: ffff81007fe888a0 R11: ffff8100163e9a00 R12: 0000000000000801
Dec 8 05:23:37 pretzel kernel: R13: ffff8100653eff50 R14: 0000000000002000 R15: 0000000000000001
Dec 8 05:23:37 pretzel kernel: FS: 00002b89780966f0(0000) GS:ffffffff80523000(0000) knlGS:0000000000000000
Dec 8 05:23:37 pretzel kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 8 05:23:37 pretzel kernel: CR2: 0000040000000160 CR3: 00000000324df000 CR4: 00000000000006e0
Dec 8 05:23:37 pretzel kernel: Process rpc.nfsd (pid: 26023, threadinfo ffff8100653ee000, task ffff8100257f48c0)
Dec 8 05:23:37 pretzel kernel: Stack: 0000000000000282 ffffffff802451d7 0000000000000000 0000000000200200
Dec 8 05:23:37 pretzel kernel: 0000000000000282 ffffffff881be0f0 ffff8100163e9a00 ffffffff881be0c0
Dec 8 05:23:37 pretzel kernel: ffff8100163e9a00 ffffffff80286208 0000000000000282 ffff81007bdba3d8
Dec 8 05:23:37 pretzel kernel: Call Trace:
Dec 8 05:23:38 pretzel kernel: [<ffffffff80286208>] cancel_rearming_delayed_workqueue+0x2a/0x2c
Dec 8 05:23:38 pretzel kernel: [<ffffffff881a6269>] :nfsd:nfs4_state_shutdown+0x19/0x1ab
Dec 8 05:23:38 pretzel kernel: [<ffffffff8818f616>] :nfsd:nfsd_last_thread+0x45/0x76
Dec 8 05:23:38 pretzel kernel: [<ffffffff881552c9>] :sunrpc:svc_destroy+0x82/0xc4
Dec 8 05:23:38 pretzel kernel: [<ffffffff8818f5b1>] :nfsd:nfsd_svc+0xf0/0x110
Dec 8 05:23:38 pretzel kernel: [<ffffffff88190069>] :nfsd:write_threads+0x6f/0xa9
Dec 8 05:23:38 pretzel kernel: [<ffffffff8818fdb7>] :nfsd:nfsctl_transaction_write+0x42/0x77
Dec 8 05:23:38 pretzel kernel: [<ffffffff802152e8>] vfs_write+0xad/0x153
Dec 8 05:23:38 pretzel kernel: [<ffffffff80215c1c>] sys_write+0x45/0x6e
Dec 8 05:23:38 pretzel kernel: [<ffffffff80257dee>] system_call+0x7e/0x83
Dec 8 05:23:38 pretzel kernel: [<00002b8977f0c1d0>]
Dec 8 05:23:38 pretzel kernel:
Dec 8 05:23:38 pretzel kernel:
Dec 8 05:23:38 pretzel kernel: Code: 48 39 57 60 75 07 e8 16 13 fc ff eb 78 48 89 e7 fc b9 0a 00
Dec 8 05:23:38 pretzel kernel: RIP [<ffffffff802860d3>] flush_cpu_workqueue+0x12/0x9e
Dec 8 05:23:38 pretzel kernel: RSP <ffff8100653efd98>
Dec 8 05:23:38 pretzel kernel: CR2: 0000040000000160
Dec 8 05:23:38 pretzel kernel: <4>nfsd: last server has exited
Dec 8 05:23:38 pretzel kernel: nfsd: unexporting all filesystems
Dec 8 06:03:56 pretzel kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Dec 8 06:04:00 pretzel kernel: Netfilter messages via NETLINK v0.30.
Dec 8 06:04:00 pretzel kernel: ip_conntrack version 2.4 (8192 buckets, 65536 max) - 288 bytes per conntrack
Dec 8 06:06:33 pretzel shutdown[9656]: shutting down for system reboot
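
As a rough cross-check of the trace above, the following sketch tests whether the faulting address in CR2 is explained by the first instruction in the Code: line. It is only an illustration: the helper names (parse_registers, decode_cmp_disp8) are made up for this example, the register lines are copied from the log with the syslog prefix stripped, and it assumes the Code: bytes begin at the faulting instruction and that "48 39 57 60" is a 64-bit cmp of a register against memory at a base register plus an 8-bit displacement.

```python
import re

# Lines copied from the oops above (syslog prefix removed); only the
# values needed for the check are included.
OOPS = """\
RAX: ffff81003ec43900 RBX: 0000040000000100 RCX: ffff8100653ee000
RDX: ffff8100257f48c0 RSI: 0000000000000282 RDI: 0000040000000100
CR2: 0000040000000160 CR3: 00000000324df000 CR4: 00000000000006e0
Process rpc.nfsd (pid: 26023, threadinfo ffff8100653ee000, task ffff8100257f48c0)
Code: 48 39 57 60 75 07 e8 16 13 fc ff eb 78 48 89 e7 fc b9 0a 00
"""

# Legacy (non-REX-extended) 64-bit register numbering used in ModRM bytes.
REG_NAMES = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def decode_cmp_disp8(code):
    """Decode 'REX.W 39 /r disp8' (cmp %reg, disp8(%base)).

    Hypothetical helper: it handles only this one encoding and asserts
    if the bytes are anything else.
    """
    rex, opcode, modrm, disp = code[:4]
    assert rex == 0x48 and opcode == 0x39, "not REX.W + CMP r/m64, r64"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 1 and rm != 4, "expected base register + disp8, no SIB"
    return REG_NAMES[reg], REG_NAMES[rm], disp


regs = parse_registers(OOPS)
code = bytes.fromhex(re.search(r"Code: ([0-9a-f ]+)", OOPS).group(1))
task = int(re.search(r"task ([0-9a-f]{16})", OOPS).group(1), 16)

reg, base, disp = decode_cmp_disp8(code)
fault = regs[base] + disp

print(f"cmp %{reg.lower()}, {disp:#x}(%{base.lower()})")
print(f"computed address {fault:#018x}, CR2 {regs['CR2']:#018x}")
print("matches CR2:", fault == regs["CR2"])
print(f"{reg} holds the current task pointer:", regs[reg] == task)
```

If the assumptions hold, the fault is a read at offset 0x60 from whatever pointer was in RDI on entry, compared against the current task pointer. That would be consistent with the caller handing flush_cpu_workqueue() a bad pointer rather than with a fault originating in the workqueue code itself, but it does not show where the pointer came from.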

Attachment: dmesg.txt.gz
Description: GNU Zip compressed data

Attachment: config.txt.gz
Description: GNU Zip compressed data