Re: 2.6.31.4: Oops

From: Trond Myklebust
Date: Mon Oct 26 2009 - 15:50:04 EST


On Mon, 2009-10-19 at 11:21 +0200, Stephan von Krawczynski wrote:
> On Mon, 19 Oct 2009 13:50:23 +0900
> Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:
>
> > On Sun, 2009-10-18 at 20:49 -0700, Andrew Morton wrote:
> > > (cc linux-nfs)
> > >
> > > On Wed, 14 Oct 2009 11:53:06 +0200 Stephan von Krawczynski <skraw@xxxxxxxxxx> wrote:
> > >
> > > > Hello all,
> > > >
> > > > just received this one:
> > > >
> > > > Oct 13 20:16:02 box kernel: BUG: unable to handle kernel paging request at ffffff98
> > > > Oct 13 20:16:02 box kernel: IP: [<f827b2e4>] nfs_writepages+0x13/0xad [nfs]
> > > > Oct 13 20:16:02 box kernel: *pde = 0042d067 *pte = 00000000
> > > > Oct 13 20:16:02 box kernel: Oops: 0002 [#1]
> > > > Oct 13 20:16:02 box kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:03:08.0/subsystem_device
> > > > Oct 13 20:16:02 box kernel: Modules linked in: speedstep_lib freq_table nfs lockd sunrpc e100 mii e1000
> > > > Oct 13 20:16:02 box kernel:
> > > > Oct 13 20:16:02 box kernel: Pid: 4638, comm: httpd2-prefork Not tainted (2.6.31.4 #1)
> > > > Oct 13 20:16:02 box kernel: EIP: 0060:[<f827b2e4>] EFLAGS: 00010292 CPU: 0
> > > > Oct 13 20:16:02 box kernel: EIP is at nfs_writepages+0x13/0xad [nfs]
> > > > Oct 13 20:16:02 box kernel: EAX: f0d0f654 EBX: 0000000a ECX: 00000020 EDX: f6393ecc
> > > > Oct 13 20:16:02 box kernel: ESI: f0d0f654 EDI: 00000000 EBP: ffffff98 ESP: f6393e38
> > > > Oct 13 20:16:02 box kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> > > > Oct 13 20:16:02 box kernel: Process httpd2-prefork (pid: 4638, ti=f6392000 task=f63f7850 task.ti=f6392000)
> > > > Oct 13 20:16:03 box kernel: Stack:
> > > > Oct 13 20:16:03 box kernel: f6393ecc f0d0f654 00000000 c0161f93 002283a0 00000000 00000000 f6088052
> > > > Oct 13 20:16:03 box kernel: <0> f4d0f7ec f6393e6c f715ca00 f827362e f700d900 f4d08a14 0000000a f0d0f654
> > > > Oct 13 20:16:03 box kernel: <0> f6393ecc 00000020 f827c7ce 0000000a f6393ec4 f6393ef4 f0d0f654 f827c85e
> > > > Oct 13 20:16:03 box kernel: Call Trace:
> > > > Oct 13 20:16:03 box kernel: [<c0161f93>] ? __link_path_walk+0x840/0x910
> > > > Oct 13 20:16:03 box kernel: [<f827362e>] ? __nfs_revalidate_inode+0x105/0x18a [nfs]
> > > > Oct 13 20:16:03 box kernel: [<f827c7ce>] ? __nfs_write_mapping+0xf/0x3b [nfs]
> > > > Oct 13 20:16:03 box kernel: [<f827c85e>] ? nfs_write_mapping+0x64/0x6c [nfs]
> > > > Oct 13 20:16:03 box kernel: [<c01e0341>] ? __copy_to_user_ll+0x3e/0x45
> > > > Oct 13 20:16:03 box kernel: [<f8273238>] ? nfs_getattr+0x34/0xaf [nfs]
> > > > Oct 13 20:16:03 box kernel: [<f8273204>] ? nfs_getattr+0x0/0xaf [nfs]
> > > > Oct 13 20:16:03 box kernel: [<c015dce1>] ? vfs_getattr+0x21/0x30
> > > > Oct 13 20:16:03 box kernel: [<c015dd6e>] ? vfs_fstatat+0x4d/0x61
> > > > Oct 13 20:16:03 box kernel: [<c015dda7>] ? vfs_lstat+0x13/0x15
> > > > Oct 13 20:16:03 box kernel: [<c015e2fc>] ? sys_lstat64+0xf/0x23
> > > > Oct 13 20:16:03 box kernel: [<c0102848>] ? sysenter_do_call+0x12/0x26
> > > > Oct 13 20:16:03 box kernel: Code: c3 56 89 c6 53 e8 4a ff ff ff 89 c3 89 f0 e8 5b 0e ec c7 89 d8 5b 5e c3 55 57 56 53 83 ec 38 89 44 24 04 89 14 24 8b 38 8d 6f 98 <0f> ba 6f 98 04 19 c0 31 d2 85 c0 74 19 68 82 00 00 00 ba 04 00
> > > > Oct 13 20:16:03 box kernel: EIP: [<f827b2e4>] nfs_writepages+0x13/0xad [nfs] SS:ESP 0068:f6393e38
> > > > Oct 13 20:16:03 box kernel: CR2: 00000000ffffff98
> > > > Oct 13 20:16:03 box kernel: ---[ end trace 8d9ba71dd690c760 ]---
> > > >
> >
> > From the Oops, it looks as if mapping->host is a null pointer. I don't
> > see how this can ever happen short of a memory scribble...
> >
> > Stephan, have you tried turning on the slab debugging code?
> >
> > Cheers
> > Trond
>
> I have not up to now, but will do so. If I see further output I will come back.
> Do you think it may be bad RAM?
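
For readers following the register dump, here is a rough sketch of the
arithmetic behind the "mapping->host is a null pointer" reading. It
assumes that the 0x68 displacement visible in the Code: bytes
(8d 6f 98, i.e. lea -0x68(%edi),%ebp) is the offset subtracted from the
inode pointer, and that EDI holds the value loaded from the mapping.
The helper name is made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # the Oops is from a 32-bit kernel, so addresses wrap at 2**32


def faulting_address(base, displacement):
    """Return base - displacement with 32-bit wraparound (hypothetical helper)."""
    return (base - displacement) & MASK32


# Values read from the Oops: EDI = 00000000, displacement 0x68 (assumed from
# the Code: bytes). A NULL base minus 0x68 wraps around to the top of memory.
edi = 0x00000000
addr = faulting_address(edi, 0x68)
print(hex(addr))
```

The result matches both EBP and the address in the "unable to handle
kernel paging request" line, which is consistent with a NULL pointer
being adjusted by a fixed offset before the faulting write.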

Are you by any chance running an NFSv4 client? If so, there is a known
use-after-free bug in 2.6.31 (see
http://bugzilla.kernel.org/show_bug.cgi?id=14249) that would need to be
fixed before you do any more testing.

Alternatively, if you can reproduce this using NFSv3 only (i.e. reboot
after changing _all_ your NFSv4 mounts in /etc/fstab into NFSv3 mounts)
then it must be a different bug.
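
As a rough aid for that check, the sketch below lists entries in
fstab-style text that look like NFSv4 mounts. It assumes an entry is
NFSv4 if its type field is "nfs4", or if it is "nfs" with a vers=4 or
nfsvers=4 option; the function name, server name and paths are made up
for illustration.

```python
def find_nfsv4_entries(fstab_text):
    """Return (device, mount_point) pairs for entries that look like NFSv4."""
    hits = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete entry
        device, mount_point, fstype = fields[:3]
        options = fields[3].split(",") if len(fields) > 3 else []
        # Assumption: "nfs4" type, or "nfs" with an explicit version-4 option.
        v4_option = any(
            opt in ("vers=4", "nfsvers=4")
            or opt.startswith(("vers=4.", "nfsvers=4."))
            for opt in options
        )
        if fstype == "nfs4" or (fstype == "nfs" and v4_option):
            hits.append((device, mount_point))
    return hits


# Hypothetical sample; real input would be the contents of /etc/fstab.
SAMPLE = """\
# device             mountpoint  type  options        dump pass
server:/export/www   /srv/www    nfs4  rw,hard,intr   0 0
server:/export/home  /home       nfs   rw,nfsvers=3   0 0
/dev/sda1            /           ext3  defaults       1 1
"""

for device, mount_point in find_nfsv4_entries(SAMPLE):
    print(device, mount_point)
```

The same function can be pointed at the contents of /proc/mounts after
the reboot, since it uses the same field order, to confirm that no
NFSv4 mount is left.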

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com