Re: Oops in NFS (RHEL4, but also in kernel bugzilla)
From: Daniel J Blueman
Date: Wed Jun 18 2008 - 22:33:19 EST
Hi Ian,
On 17 Jun, 17:10, Ian Soboroff <isoboroff@xxxxxxxxx> wrote:
> I have a server that hosts some large XFS filesystems and serves them
> out over NFS. Every so often I get the following Oops, and then the
> machine locks hard with blinky keyboard lights. ("Every so often" == I
> can't reproduce this reliably. It comes up about once a week, we've
> seen it three times.)
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> 00000000
> *pde = 355bf001
> Oops: 0000 [#1]
> SMP
> Modules linked in: nfs nfsd exportfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc button battery ac ohci_hcd tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aacraid aic7xxx sd_mod scsi_mod
> CPU: 0
> EIP: 0060:[<00000000>] Not tainted VLI
> EFLAGS: 00010282 (2.6.9-67.0.15.ELirsmp)
> EIP is at 0x0
> eax: e1c86c30 ebx: c04ba260 ecx: 00000000 edx: d820304c
> esi: d820304c edi: f6ecbf00 ebp: 00000000 esp: f6ecbee4
> ds: 007b es: 007b ss: 0068
> Process nfsd (pid: 4339, threadinfo=f6ecb000 task=f6c470b0)
> Stack: c0168c5f e1c86c30 ffffffff f5f96090 60229cac cc751afc c0168cd3 60229cac
> 00000008 f5f96088 e1c86ca0 e1c86ca0 e1c86c30 cc751afc f5f95004 f8bcee28
> f5f96088 f7e6ba00 f7d351c0 f7e6ba00 f8b2b46a f5f95800 f5f95000 f5f951d4
> Call Trace:
> [<c0168c5f>] __lookup_hash+0x70/0x89
> [<c0168cd3>] lookup_one_len+0x54/0x63
> [<f8bcee28>] nfsd_lookup+0x321/0x3ad [nfsd]
> [<f8b2b46a>] svcauth_unix_set_client+0xa7/0xb5 [sunrpc]
> [<f8bd6b49>] nfsd3_proc_lookup+0xa9/0xb3 [nfsd]
> [<f8bd8b37>] nfs3svc_decode_diropargs+0x0/0xfa [nfsd]
> [<f8bcc681>] nfsd_dispatch+0xba/0x16d [nfsd]
> [<f8b2862d>] svc_process+0x444/0x6f3 [sunrpc]
> [<f8bcc45a>] nfsd+0x1cc/0x339 [nfsd]
> [<f8bcc28e>] nfsd+0x0/0x339 [nfsd]
> [<c01041f5>] kernel_thread_helper+0x5/0xb
> Code: Bad EIP value.
> <0>Fatal exception: panic in 5 seconds
Have 4KB stacks been disabled? You can check the kernel config file for
CONFIG_4KSTACKS. It may also be worth adding that to the bugzilla entry, to
eliminate one possibility, as 'Bad EIP value' is suggestive of
stack corruption.
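
As a rough sketch of that check, assuming the kernel config is available as a
plain-text file: the function names below are made up for illustration, and
the only thing relied on is the usual config format, where an enabled option
reads "CONFIG_FOO=y" and a disabled one reads "# CONFIG_FOO is not set".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if this config line sets `option` to y,
 * 0 otherwise (including the "# CONFIG_FOO is not set" form). */
int line_enables_option(const char *line, const char *option)
{
    size_t len = strlen(option);

    /* Require '=' right after the name so CONFIG_FOO does not
     * match a longer name such as CONFIG_FOO_BAR. */
    if (strncmp(line, option, len) != 0 || line[len] != '=')
        return 0;
    return line[len + 1] == 'y';
}

/* Hypothetical helper: scans a config file line by line.
 * Returns 1 if enabled, 0 if not enabled, -1 if the file cannot be opened. */
int config_file_enables(const char *path, const char *option)
{
    FILE *cfg = fopen(path, "r");
    char line[512];
    int found = 0;

    if (!cfg)
        return -1;
    while (!found && fgets(line, sizeof line, cfg))
        found = line_enables_option(line, option);
    fclose(cfg);
    return found;
}
```

On RHEL the config for the running kernel is normally installed under /boot
as config-<kernel version>, so that is the path I would pass in along with
"CONFIG_4KSTACKS". A result of 1 would mean 4KB stacks are still enabled.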
Daniel
> This machine is running RHEL4, using the stock kernel but with XFS
> enabled. I would have reported it to Red Hat instead, but while googling
> around I found a nearly identical kernel bugzilla report:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=7809
>
> In there, the bug reporter has tracked the Oops to __lookup_hash() in
> fs/namei.c, and includes a patch which basically just takes care to not
> dereference inode->i_op->lookup without checking it first.
>
> I looked at the latest fs/namei.c via gitweb and it's the same code. So
> I am reporting it here, where more knowledgeable and responsive
> people lurk anyway.
>
> Is this an NFS problem, or an XFS one? (Since XFS is common in both my
> report and in the bugzilla one... I'm not sure whether the 'inode' in
> question is NFS or from the underlying filesystem).
>
> Is the bugzilla report's patch papering over a real problem, or does it
> fix a real possible null-pointer case in __lookup_hash?
>
> Thanks,
> Ian
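
For reference, the kind of guard described for the bugzilla patch can be
modelled in userspace roughly as below. This is not the patch itself and not
the real kernel structures: the struct layouts are cut-down stand-ins, and
guarded_lookup(), demo_lookup() and the choice of -ENOTDIR are assumptions
made for illustration. An EIP of 0x0 with __lookup_hash as the first entry in
the trace is consistent with a call through a NULL function pointer, which is
what the check avoids.

```c
#include <errno.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; not the real definitions. */
struct inode;

struct inode_operations {
    int (*lookup)(struct inode *dir, const char *name);
};

struct inode {
    const struct inode_operations *i_op;
};

/* Example ->lookup implementation so the guard has something to call. */
int demo_lookup(struct inode *dir, const char *name)
{
    (void)dir;
    (void)name;
    return 0;
}

/* Hypothetical guard: only call ->lookup if the inode provides one.
 * Without the check, a NULL ->lookup would make the CPU jump to
 * address 0. The -ENOTDIR return value is an assumed choice. */
int guarded_lookup(struct inode *dir, const char *name)
{
    if (!dir || !dir->i_op || !dir->i_op->lookup)
        return -ENOTDIR;
    return dir->i_op->lookup(dir, name);
}
```

Whether such a check fixes the cause or only hides how an inode without a
->lookup method ended up being used as a directory is the open question
from the quoted report.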
--
Daniel J Blueman