Re: Crashes and lockups in XFS filesystem (2.6.8-rc4 and 2.6.8.1-mm2).

From: Nathan Scott
Date: Fri Aug 20 2004 - 02:54:30 EST

Next message: David Mosberger: "Re: kernbench on 512p"
Previous message: Nick Piggin: "Re: Possible dcache BUG"
In reply to: David Martinez Moreno: "Random crashes (was Re: Crashes and lockups in XFS filesystem (2.6.8-rc4).)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Aug 19, 2004 at 09:27:58PM +0200, David Martinez Moreno wrote:
> El Jueves, 19 de Agosto de 2004 10:44, Nathan Scott escribió:
> > Did /mnt run out of space while doing that? Or nearly? There's
> > a known issue with that area of the XFS code, in conjunction with
> > 4K stacks at the moment - was that enabled in your .config?
> >
> > Looks like something stamped on parts of the xfs_mount structure
> > for the filesystem mounted at /mnt, a stack overrun would explain
> > that and your subsequent oopsen.
>
> Bad news!
>
> Aaarggh! It seemed that 2.6.8.1-mm2 was going to stand, and it did during an
> entinre hour or so, but it failed as well. The XFS filesystem on the RAID 0
> was being acceded by 100 users and I was copying files from it to the SCSI
> disk sda1, under XFS as well.

Hmmm, that looks awfully similar to the problem other folks are
chasing (see thread "Possible dcache BUG"). Is it reproducible
for you by any chance?

There have been some debugging patches that Marcelo posted in that
thread, would be a good idea to try those out; although with your
crash its difficult to see which slab it is thats corrupted (iirc,
Marcelo's patch was targetting the buffer_head slab).

> The crash:
>
> ------------[ cut here ]------------
> kernel BUG at include/linux/list.h:164!
> invalid operand: 0000 [#1]
> CPU: 0
> EIP: 0060:[<c01332ef>] Not tainted VLI
> EFLAGS: 00010006 (2.6.8.1-mm2)
> EIP is at shrink_cache+0x310/0x31d
> eax: ffffffff ebx: c0426264 ecx: c1081658 edx: c10444f8
> esi: 0000000f edi: c0426288 ebp: 0000000e esp: c15afeb0
> ds: 007b es: 007b ss: 0068
> Process kswapd0 (pid: 10, threadinfo=c15ae000 task=c15620b0)
> Stack: c15afeb0 c15afeb0 c15ae000 00000020 c10444d8 c10d3d98 00000000 00000001
> c13bcd20 c1251860 c13bb6a0 c13bb840 c13bb680 c129cce0 c129d980 c13fbc20
> c13b8b20 c13c2940 c13c2960 c135f300 c13ba160 c13ba1c0 c13c9120 c13bcd00
> Call Trace:
> [<c01328c8>] shrink_slab+0x85/0x187
> [<c0133817>] shrink_zone+0x9e/0xb8
> [<c0133bd4>] balance_pgdat+0x1c9/0x22d
> [<c0133cff>] kswapd+0xc7/0xd7
> [<c0113a2d>] autoremove_wake_function+0x0/0x57
> [<c0113a2d>] autoremove_wake_function+0x0/0x57
> [<c0133c38>] kswapd+0x0/0xd7
> [<c0101fdd>] kernel_thread_helper+0x5/0xb
> Code: eb e4 8b 44 24 10 83 c5 01 89 50 04 89 02 8d 44 24 10 89 42 04 89 54 24
> 10 e9 cd fd ff ff 0f 0b a5 00 d0 0d 3d c0 e9

--
Nathan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: David Mosberger: "Re: kernbench on 512p"
Previous message: Nick Piggin: "Re: Possible dcache BUG"
In reply to: David Martinez Moreno: "Random crashes (was Re: Crashes and lockups in XFS filesystem (2.6.8-rc4).)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]