Re: 2.6.12-rc2 + rc3: reaim with ext3 - system stalls.

From: cliff white
Date: Tue May 03 2005 - 10:34:39 EST


On 03 May 2005 08:01:03 -0700
Badari Pulavarty <pbadari@xxxxxxxxxx> wrote:

> On Tue, 2005-05-03 at 07:43, Jan Kara wrote:
> > Hello,
> >
> > > Started seeing some odd behaviour with recent kernels, haven't been able to
> > > run it down, could use some suggestions/help.
> > >
> > > Running re-aim7 with 2.6.12-rc2 and rc3, if I use xfs, jfs, or
> > > reiserfs things work just fine.
> > >
> > > With ext3, the test stalls, such that:
> > > CPU is 50% idle, 50% waiting IO (top)
> > > vmstat shows one process blocked wio
> > I've looked through your dumps and spotted where the problem is:
> > it's our well-known and beloved lock inversion between PageLock and
> > transaction start (CCing Badari, who is the author of the patch
> > that introduced it, AFAIK).
>
> Yuck. It's definitely not intentional.
>
> > The correct order is: first get PageLock and *then* start the transaction.
> > But ext3_writeback_writepages() calls ext3_journal_start() first and
> > then __mpage_writepages(), which tries to do LockPage, and there is the
> > deadlock. Badari, could you please fix that (sadly, I think that
> > would not be easy)? Maybe we should back out those changes until it gets
> > fixed...
>
> Hmm.. let me take a closer look. You are right, it's not going to be
> a simple fix.
>
> Cliff, here is the patch to back out writepages() for ext3. Can you
> verify that the problem goes away with this patch?

Sure, it's semi-random behavior, so it'll take a few runs to be sure.
cliffw

>
> Thanks,
> Badari
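
For anyone following the thread, the ordering problem Jan describes can be
illustrated with a small userspace sketch. This is not kernel code and not
the ext3 fix: every name below (order_tracker, tracker_acquire, the path_*
functions) is hypothetical, and treating "transaction start" as a plain lock
is a simplifying assumption. The sketch only records which resource was taken
while another was held, and flags a path that takes them in the opposite order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two resources named in the thread. */
enum lock_id { PAGE_LOCK = 0, TRANSACTION = 1, NUM_LOCKS };

/* order_seen[a][b] is true once b has been acquired while a was held. */
struct order_tracker {
    bool order_seen[NUM_LOCKS][NUM_LOCKS];
};

/* Which resources the current (simulated) task holds. */
struct task_ctx {
    bool held[NUM_LOCKS];
};

/* Record an acquisition; return false if it inverts an order seen before. */
static bool tracker_acquire(struct order_tracker *t, struct task_ctx *task,
                            enum lock_id next)
{
    bool ok = true;

    assert(next < NUM_LOCKS);
    for (int h = 0; h < NUM_LOCKS; h++) {
        if (!task->held[h] || h == (int)next)
            continue;
        if (t->order_seen[next][h])
            ok = false;              /* some path took h while holding next */
        t->order_seen[h][next] = true;
    }
    task->held[next] = true;
    return ok;
}

static void tracker_release(struct task_ctx *task, enum lock_id id)
{
    task->held[id] = false;
}

/* The order Jan calls correct: page lock first, then the transaction. */
static bool path_page_then_transaction(struct order_tracker *t)
{
    struct task_ctx task = {0};
    bool ok = tracker_acquire(t, &task, PAGE_LOCK);

    ok = tracker_acquire(t, &task, TRANSACTION) && ok;
    tracker_release(&task, TRANSACTION);
    tracker_release(&task, PAGE_LOCK);
    return ok;
}

/* The order described for the writepages path: transaction, then page lock. */
static bool path_transaction_then_page(struct order_tracker *t)
{
    struct task_ctx task = {0};
    bool ok = tracker_acquire(t, &task, TRANSACTION);

    ok = tracker_acquire(t, &task, PAGE_LOCK) && ok;
    tracker_release(&task, PAGE_LOCK);
    tracker_release(&task, TRANSACTION);
    return ok;
}
```

Calling path_page_then_transaction() on a zeroed tracker any number of times
reports no problem; a later call to path_transaction_then_page() on the same
tracker returns false, because it takes the two resources in the reverse
order. With two real tasks, each holding one resource and waiting for the
other, that reversed order is what allows a stall.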


--
"I've always gone through periods where I bolt upright at four in the morning;
now at least there's a reason." -Michael Feldman