Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)

From: Chris Boot
Date: Sat Aug 18 2007 - 12:28:44 EST

Next message: Jan Engelhardt: "Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)"
Previous message: Jan Engelhardt: "Re: power off disk drives while running"
In reply to: Måns Rullgård: "Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)"
Next in thread: Jan Engelhardt: "Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Måns Rullgård wrote:

Chris Boot <bootc@xxxxxxxxx> writes:

Måns Rullgård wrote:

Chris Boot <chris.boot@xxxxxxxxxxxxxx> writes:

All,

I've got a box running RHEL5 and haven't been impressed by ext3
performance on it (running of a 1.5TB HP MSA20 using the cciss
driver). I compiled XFS as a module and tried it out since I'm used to
using it on Debian, which runs much more efficiently. However, every
so often the kernel panics as below. Apologies for the tainted kernel,
but we run VMware Server on the box as well.

Does anyone have any hits/tips for using XFS on Red Hat? What's
causing the panic below, and is there a way around this?

BUG: unable to handle kernel paging request at virtual address b8af9d60
printing eip:
c0415974
*pde = 00000000
Oops: 0000 [#1]
SMP last sysfs file: /block/loop7/dev

[...]

[<f936884e>] xfsbufd_wakeup+0x28/0x49 [xfs]
[<c04572f9>] shrink_slab+0x56/0x13c
[<c0457c0c>] try_to_free_pages+0x162/0x23e
[<c0454064>] __alloc_pages+0x18d/0x27e
[<c045214e>] find_or_create_page+0x53/0x8c
[<c046c7b1>] __getblk+0x162/0x270
[<c0475be0>] do_lookup+0x53/0x157
[<f889138f>] ext3_getblk+0x7c/0x233 [ext3]
[<f88913fe>] ext3_getblk+0xeb/0x233 [ext3]
[<c048215c>] mntput_no_expire+0x11/0x6a
[<f889226e>] ext3_bread+0x13/0x69 [ext3]
[<f8895606>] htree_dirblock_to_tree+0x22/0x113 [ext3]
[<f889574f>] ext3_htree_fill_tree+0x58/0x1a0 [ext3]
[<c047828b>] do_path_lookup+0x20e/0x25f
[<c046b987>] get_empty_filp+0x99/0x15e
[<f889d611>] ext3_permission+0x0/0xa [ext3]
[<f888eaa3>] ext3_readdir+0x1ce/0x59b [ext3]
[<c047a0dd>] filldir+0x0/0xb9
[<c0472973>] sys_fstat64+0x1e/0x23
[<c047a1f9>] vfs_readdir+0x63/0x8d
[<c047a0dd>] filldir+0x0/0xb9
[<c047a447>] sys_getdents+0x5f/0x9c
[<c0403eff>] syscall_call+0x7/0xb
=======================

Your Redhat kernel is probably built with 4k stacks and XFS+loop+ext3
seems to be enough to overflow it.

Thanks, that explains a lot. However, I don't have any XFS filesystems
mounted over loop devices on ext3. Earlier in the day I had iso9660 on
loop on xfs, could that have caused the issue? It was unmounted and
deleted when this panic occurred.

The mention of /block/loop7/dev and the presence both XFS and ext3
function in the call stack suggested to me that you might have an ext3
filesystem in a loop device on XFS. I see no other explanation for
that call stack other than a stack overflow, but then we're still back
at the same root cause.

Are you using device-mapper and/or md? They too are known to blow 4k
stacks when used with XFS.

I am. The situation was earlier on was iso9660 on loop on xfs on lvm on cciss. I guess that might have smashed the stack undetectably and induced corruption encountered later on? When I experienced this panic the machine would have probably been performing a backup, which was simply a load of ext3/xfs filesystems on lvm on the HP cciss controller. None of the loop devices would have been mounted.

I have a few machines now with 4k stacks and using lvm + md + xfs and have no trouble at all, but none are Red Hat (all Debian) and none use cciss either. Maybe it's a deadly combination.

I'll probably just try and recompile the kernel with 8k stacks and see
how it goes. Screw the support, we're unlikely to get it anyway. :-P

Please report how this works out.

I will. This will probably be on Monday now, since the machine isn't accepting SysRq requests over the serial console. :-(

Many thanks,
Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Jan Engelhardt: "Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)"
Previous message: Jan Engelhardt: "Re: power off disk drives while running"
In reply to: Måns Rullgård: "Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)"
Next in thread: Jan Engelhardt: "Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]