Re: Kernel crash with 2.6.29 + nfs + xfs (radix-tree)

From: Alex Samad
Date: Wed May 20 2009 - 05:57:09 EST


On Wed, May 20, 2009 at 07:05:58PM +1000, Dave Chinner wrote:
> On Wed, May 20, 2009 at 10:37:45AM +1000, Alex Samad wrote:
> > Hi
> >
> > I have been seeing quite a lot of crashes on my debian amd64 box in the
> > 2.6.29 series of kernels. It seems to happen when the system is under
> > load and there is network activity -> nfsd -> xfs.
>
> Perhaps a use after free or a reference counting problem. Thanks for
> reporting it.
>
> > May 5 19:45:38 x kernel: ------------[ cut here ]------------

[snip]

> > I have logged a bug with debian
> > (more info: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=526406);
> > one other person has reported this problem there.
> >
> > we believe somebody has already reported a similar problem here
> > http://groups.google.com/group/linux.kernel/browse_thread/thread/dd00f52e93397c9e/6b6814dab9b41a05?pli=1
>
> Which no-one noticed was related to XFS (not in the subject line)
> and so most people (like me) would have simply deleted it without
> reading it....
>
> > Has anyone else seen this problem? Who do I need to raise this with?
>

thanks

> I've cc'd the XFS list.
>
> > I am able to reproduce this problem on my machine (amd64 Phenom II, 8G
> > ram) running virtualbox: I have a vm access the local filesystem via
> > nfs (udp), and when I do a rm -fr <some directory ~200M> I see the bug.
>
> I run debian, XFS and 2.6.29 on all my machines but I haven't
> tripped over the problem - it all appears to be related to calling
> dispose_list() during/just after removing a lot of files. If you
> have a simple method of reproducing the problem (e.g. a simple shell
> script) it would help track down the problem much faster....

My source directory was an openwrt trunk checkout (svn co
svn://svn.openwrt.org/openwrt/trunk/) on which I had done a compile. When I
went to delete it, it would cause this problem just about every time.

On the original data set (I was in the process of moving from one
location to another, so I still have the original data):

du -s --si
5.2G

find | wc -l
313320
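
A minimal shell sketch of a simpler reproducer, assuming the crash depends on
unlinking a large number of files over NFS rather than on the openwrt tree
itself. It is untested against this bug; the function names and arguments are
placeholders and not part of what was actually run.

```sh
#!/bin/sh
# Untested sketch of a reproducer: create many small files, then unlink them
# all at once. Function names and arguments are placeholders, not taken from
# the original report.

# make_tree DIR NDIRS NFILES: create NDIRS subdirectories of NFILES files each.
make_tree() {
    dir=$1 ndirs=$2 nfiles=$3
    d=0
    while [ "$d" -lt "$ndirs" ]; do
        mkdir -p "$dir/d$d" || return 1
        f=0
        while [ "$f" -lt "$nfiles" ]; do
            echo data > "$dir/d$d/f$f" || return 1
            f=$((f + 1))
        done
        d=$((d + 1))
    done
}

# remove_tree DIR: print the entry count, then delete everything in one go,
# mirroring the "rm -fr" that preceded the crash.
remove_tree() {
    find "$1" | wc -l
    rm -fr "$1"
}
```

With the NFS-mounted XFS directory at a hypothetical /mnt/nfs/scratch,
"make_tree /mnt/nfs/scratch/repro 300 1000 && remove_tree /mnt/nfs/scratch/repro"
would create and then remove about 300,000 files, close to the count above.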

If you have a look at the debian bug, another person (mike) has
experienced this on a machine that is basically a backup server, so it is
heavily stressed and uses xfs partitions. He found that going back to
2.6.28-7 seems to be stable.



>
> Cheers,
>
> Dave.
