Re: BUG: drop_pagecache_sb vs kjournald lockup

From: David Chinner
Date: Wed Mar 19 2008 - 15:52:43 EST


On Wed, Mar 19, 2008 at 11:08:05AM +0100, Jan Kara wrote:
> On Wed 19-03-08 07:46:46, David Chinner wrote:
> > On Tue, Mar 18, 2008 at 02:43:26PM +0100, Jan Kara wrote:
> > > > 2.6.25-rc3, 4p ia64, ext3 root drive.
> > > >
> > > > I was running an XFS stress test on one of the XFS partitions on
> > > > the machine (zero load on the root ext3 drive), when the system
> > > > locked up in kjournald with this on the console:
> > > >
> > > > BUG: spinlock lockup on CPU#2, kjournald/2150, a000000100e022e0
> > > >
> > > <snip traces>
> >
> > <snip other stuff>
> >
> > > > Anyone know the reason why drop_pagecache_sb() uses such a brute-force
> > > > mechanism to free up clean page cache pages?
> > > Yes, we know that drop_pagecache_sb() has locking issues but since it
> > > is intended to be used for debugging purposes only, nobody cared enough
> > > to fix it. Completely untested patch below if you dare to try ;)
> >
> > It may be intended for debuging purposes, but it does get used in
> > production HPC environments (a lot!). I guess I've never seen this
> > lockup before because SGI customers don't use ext3, but they have
> > complained about the system "stopping" while drop_caches is executed.
> > This locking ..... strategy would explain it, though.
> ;) But in case your customers use it in production, shouldn't you push
> for a better interface for such feature? Just a thought...

Because it's much less horrible than the thing it replaced, does
what is needed and mostly does not cause problems? And because it
usually just works I've never felt the need to look at the
implementation....

/me is off to look at Fengguang's patch

> > I'll try the patch, but I can't guarantee anything - I only saw this
> > lockup once in about 18 hours when dropping caches every 2 seconds.
> Well, if you enable lockdep, it should warn you about possible problems
> much earlier (at least we already got several reports of lockdep warnings
> when using drop_caches).

ia64 - no lockdep :/

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/