Re: [Bug #12604] Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB

From: Jan Kara
Date: Thu Feb 12 2009 - 09:33:08 EST


On Thu 12-02-09 04:34:23, Nick Piggin wrote:
> On Wed, Feb 11, 2009 at 06:02:19PM -0800, Linus Torvalds wrote:
> >
> >
> > On Thu, 12 Feb 2009, Nick Piggin wrote:
> >
> > > On Tue, Feb 10, 2009 at 05:28:30PM +0100, Jan Kara wrote:
> > > > On Sun 08-02-09 20:21:42, Rafael J. Wysocki wrote:
> > > > > This message has been generated automatically as a part of a report
> > > > > of recent regressions.
> > > > >
> > > > > The following bug entry is on the current list of known regressions
> > > > > from 2.6.28. Please verify if it still should be listed and let me know
> > > > > (either way).
> > > > Yes, I've verified with latest git and the regression is still there.
> > >
> > > I'm working on this FWIW...
> >
> > Shouldn't we just revert it? The code does look to be broken.
> >
> > It also looks like the interaction with that ever-buggy "nr_to_write"
> > thing is totally wrong. I can see that whole
> >
> > 	if (!cycled) {
> > 		..
> > 		index = 0;
> > 		goto retry
> > 	}
> >
> > doing all the wrong things: if we ever hit the combination of
> > "!cycled + nr_to_write==0", we're always screwed. It simply _cannot_ do
> > the right thing.
>
> Doh, I think you spotted the bug. I was going off on the wrong track.
>
>
> > I dunno. That whole piece-of-sh*t function has been incredibly buggy this
> > release. The code is an unreadable mess, and I think that "cyclic" stuff
> > is part of the reason for it being messy and buggy. Please convince me
> > it's worth saving, or let me revert the whole stinking pile of crud?
>
> Well... the cyclic stuff apparently gets used by some filesystems to do
> data integrity stuff, so I think the problem addressed by my broken
> commit maybe shouldn't be ignored.
>
> Maybe you're being harsh on this function. It has had two thinko bugs
> so far. Before this release cycle, it seemed to hold the record for the
> most data integrity bugs in a single function...
>
> How's this? Jan, can you test with the bdb workload please?
Yes, this patch brings the write performance of the BDB workload back to
the original numbers. Thanks for the fix.

Honza

> --
>
> A bug was introduced into write_cache_pages cyclic writeout by commit
> 31a12666d8f0c22235297e1c1575f82061480029. The intention (and comments)
> is that we should cycle back and look for more dirty pages at the
> beginning of the file if there is no more work to be done.
>
> But the !done condition was dropped from the test. This means that any
> time the page writeout loop breaks (e.g. due to nr_to_write == 0), we
> will set index to 0, then goto retry. This will set done_index to index,
> then find that done is set, and so proceed to the end of the function. When
> updating mapping->writeback_index for cyclic writeout, we now use
> done_index == 0, so we're always cycling back to 0.
>
> This seemed to be causing random mmap writes (slapadd and iozone) to
> start writing more pages from the LRU, and writeout would slow down.
>
> With this patch, iozone random write performance is increased nearly
> 5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2).
>
> Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
> ---
> Index: linux-2.6/mm/page-writeback.c
> ===================================================================
> --- linux-2.6.orig/mm/page-writeback.c 2009-02-12 13:30:42.000000000 +1100
> +++ linux-2.6/mm/page-writeback.c 2009-02-12 14:16:28.000000000 +1100
> @@ -1079,7 +1079,7 @@ continue_unlock:
> pagevec_release(&pvec);
> cond_resched();
> }
> - if (!cycled) {
> + if (!cycled && !done) {
> /*
> * range_cyclic:
> * We hit the last page and there is more work to be done: wrap
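
To make the mechanism described in the changelog above easier to follow,
here is a small stand-alone C model of the range_cyclic logic. It is not
the kernel code: the page lookup, locking and writeback_control details
are all faked, and the names (scan(), NPAGES) are invented for
illustration. It scans from page 50 with a quota of 5 pages and shows how
the unconditional wrap-around collapses writeback_index to 0, while the
patched test keeps the real resume point:

/*
 * Stand-alone model of the range_cyclic logic described above.
 * Not the kernel code: page lookup, locking and struct
 * writeback_control are all faked; scan() and NPAGES are made up.
 */
#include <stdio.h>

#define NPAGES 100			/* pretend every page is dirty */

static void scan(int *writeback_index, int nr_to_write, int use_fix)
{
	int index = *writeback_index;
	int end = NPAGES - 1;
	int done = 0;
	int cycled = (index == 0);
	int done_index;

retry:
	done_index = index;
	while (!done && index <= end) {
		done_index = index;		/* "write" page at index */
		index++;
		if (--nr_to_write <= 0) {
			done = 1;		/* quota ran out, not EOF */
			break;
		}
	}
	if (use_fix ? (!cycled && !done) : !cycled) {
		/*
		 * range_cyclic wrap-around.  With the buggy test this is
		 * also taken when we merely ran out of nr_to_write: index
		 * becomes 0, done_index is reset to 0 at retry, the loop
		 * body is skipped because done is set, and we fall through.
		 */
		cycled = 1;
		index = 0;
		end = *writeback_index - 1;
		goto retry;
	}
	*writeback_index = done_index;		/* resume point for next call */
}

int main(void)
{
	int buggy = 50, fixed = 50;		/* resume scanning at page 50 */

	scan(&buggy, 5, 0);
	scan(&fixed, 5, 1);
	printf("writeback_index: buggy=%d fixed=%d\n", buggy, fixed);
	/* prints: writeback_index: buggy=0 fixed=54 */
	return 0;
}

In other words, the fix only wraps around to the start of the file when
the scan actually reached the end without exhausting nr_to_write; a scan
that merely ran out of quota keeps its position for the next pass.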
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR