Re: [PATCH 3/4] writeback: pay attention to wbc->nr_to_write in write_cache_pages

From: Andrew Morton
Date: Fri Apr 30 2010 - 15:44:17 EST


On Fri, 30 Apr 2010 11:31:53 +0530
"Aneesh Kumar K. V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 29 Apr 2010 14:39:31 -0700, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > On Tue, 20 Apr 2010 12:41:53 +1000
> > Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > > If a filesystem writes more than one page in ->writepage, write_cache_pages
> > > fails to notice this and continues to attempt writeback when wbc->nr_to_write
> > > has gone negative - this trace was captured from XFS:
> > >
> > >
> > > wbc_writeback_start: towrt=1024
> > > wbc_writepage: towrt=1024
> > > wbc_writepage: towrt=0
> > > wbc_writepage: towrt=-1
> > > wbc_writepage: towrt=-5
> > > wbc_writepage: towrt=-21
> > > wbc_writepage: towrt=-85
> > >
> >
> > Bug.
> >
> > AFAICT it's a regression introduced by
> >
> > : commit 17bc6c30cf6bfffd816bdc53682dd46fc34a2cf4
> > : Author: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> > : AuthorDate: Thu Oct 16 10:09:17 2008 -0400
> > : Commit: Theodore Ts'o <tytso@xxxxxxx>
> > : CommitDate: Thu Oct 16 10:09:17 2008 -0400
> > :
> > : vfs: Add no_nrwrite_index_update writeback control flag
> >
> > I suggest that what you do here is remove the local `nr_to_write' from
> > write_cache_pages() and go back to directly using wbc->nr_to_write
> > within the loop.
> >
> > And thus we restore the convention that if the fs writes back more than
> > a single page, it subtracts (nr_written - 1) from wbc->nr_to_write.
> >
>
> My mistake, I never expected writepage to write more than one page.

The writeback code is tricky and easy to break in subtle ways.

> The
> interface said 'writepage' so it was natural to expect that it writes only
> one page. BTW the reason for the change is to give filesystems which
> accumulate dirty pages using write_cache_pages and attempt to write
> them out later a chance to properly manage nr_to_write. Something like
>
> ext4_da_writepages
> -- write_cache_pages
> ---- collect dirty page
> ---- return
> -- return
> -- now try to write out all the collected dirty pages (say 100)
> ---- only able to allocate blocks for 50 pages,
> so update nr_to_write -= 50 and mark the remaining 50 pages as dirty
> again
>
> So we want wbc->nr_to_write updated only by ext4_da_writepages.

So you want a ->writepage() implementation which doesn't actually write
a page at all - it just remembers that page for later.

Maybe that fs shouldn't be calling write_cache_pages() at all. After
all, write_cache_pages() is a wrapper which emits a sequence of calls
to ->writepage(), and ->writepage() writes a page.
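
To make that contract concrete, here is a rough userspace model of the
loop, not the kernel source. Every name ending in _model, the NR_PAGES
constant and the fixed cluster size are assumptions made up for
illustration. It shows the loop charging one page per ->writepage() call
directly against wbc->nr_to_write, with a clustering ->writepage()
subtracting the extra (nr_written - 1) pages itself:

```c
/* Userspace model only; none of these are the kernel's real definitions. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 64	/* hypothetical size of the modelled mapping */

struct wbc_model {
	long nr_to_write;	/* pages the caller still wants written */
};

struct mapping_model {
	bool dirty[NR_PAGES];
	size_t cluster;		/* pages one ->writepage() call writes */
};

static void mapping_init_dirty(struct mapping_model *m, size_t cluster)
{
	assert(cluster >= 1);
	for (size_t i = 0; i < NR_PAGES; i++)
		m->dirty[i] = true;
	m->cluster = cluster;
}

static size_t count_dirty(const struct mapping_model *m)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_PAGES; i++)
		n += m->dirty[i];
	return n;
}

/*
 * Stand-in for a clustering ->writepage(): cleans the page at 'index'
 * plus any dirty neighbours inside the cluster, then subtracts the
 * extra (written - 1) pages. The caller accounts for the first page.
 */
static int writepage_model(struct mapping_model *m, size_t index,
			   struct wbc_model *wbc)
{
	long written = 0;

	for (size_t i = index; i < NR_PAGES && i < index + m->cluster; i++) {
		if (m->dirty[i]) {
			m->dirty[i] = false;
			written++;
		}
	}
	wbc->nr_to_write -= written - 1;
	return 0;
}

/*
 * Model of the wrapper: one ->writepage() call per dirty page, with the
 * budget re-read from wbc->nr_to_write on every pass instead of from a
 * local copy. Returns the number of ->writepage() calls made.
 */
static size_t write_cache_pages_model(struct mapping_model *m,
				      struct wbc_model *wbc)
{
	size_t calls = 0;

	for (size_t i = 0; i < NR_PAGES; i++) {
		if (!m->dirty[i])
			continue;
		if (writepage_model(m, i, wbc) != 0)
			break;
		calls++;
		if (--wbc->nr_to_write <= 0)
			break;
	}
	return calls;
}
```

With a cluster of 4 and a budget of 16, the model stops after four
calls with nr_to_write at exactly zero. A loop that counted down a
private copy of the budget would instead keep calling until it had made
16 calls, which is the overshoot visible in the XFS trace.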

Rather than hacking around, subverting things and breaking core kernel
code, let's step back and think more clearly about what to do.

One option would be to implement a new address_space_operation which
provides the new semantics in a well-understood fashion. Let's call it
writepage_prepare(?). Then reimplement write_cache_pages() so that if
->writepage_prepare() is available, it handles it in a sensible fashion
and doesn't break traditional filesystems.

Or simply implement a new, different version of write_cache_pages() for
filesystems which wish to buffer in this fashion. The new
write_cache_pages_prepare()(?) would call ->writepage_prepare().
Internally it might share implementation with write_cache_pages().
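
As a very rough illustration of that second option, here is a userspace
sketch, not a proposed patch. Nothing below exists in the kernel: the
*_model names, the shared walker, the "nonzero return stops the walk"
rule and the blocks_available parameter are all assumptions. The point
is only that the prepare variant leaves wbc->nr_to_write alone and the
filesystem charges the budget when it really writes:

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 32

struct wbc_model { long nr_to_write; };

struct mapping_model {
	bool dirty[NR_PAGES];
	size_t batch[NR_PAGES];	/* pages remembered by ->writepage_prepare() */
	size_t nr_batched;
};

typedef int (*page_fn)(struct mapping_model *m, size_t index,
		       struct wbc_model *wbc);

static void mapping_init_dirty(struct mapping_model *m)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		m->dirty[i] = true;
	m->nr_batched = 0;
}

static size_t count_dirty(const struct mapping_model *m)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_PAGES; i++)
		n += m->dirty[i];
	return n;
}

/* Traditional ->writepage(): the page really is written. */
static int writepage_model(struct mapping_model *m, size_t index,
			   struct wbc_model *wbc)
{
	(void)wbc;
	m->dirty[index] = false;
	return 0;
}

/* Hypothetical ->writepage_prepare(): remember the page, write nothing.
 * Returns nonzero once the batch covers the whole budget. */
static int writepage_prepare_model(struct mapping_model *m, size_t index,
				   struct wbc_model *wbc)
{
	if ((long)m->nr_batched >= wbc->nr_to_write)
		return 1;
	m->batch[m->nr_batched++] = index;
	return 0;
}

/* Shared walker: only the traditional path charges the budget here. */
static size_t walk_dirty_pages(struct mapping_model *m, struct wbc_model *wbc,
			       page_fn fn, bool charge_budget)
{
	size_t calls = 0;

	for (size_t i = 0; i < NR_PAGES; i++) {
		if (!m->dirty[i])
			continue;
		if (fn(m, i, wbc) != 0)
			break;
		calls++;
		if (charge_budget && --wbc->nr_to_write <= 0)
			break;
	}
	return calls;
}

static size_t write_cache_pages_model(struct mapping_model *m,
				      struct wbc_model *wbc)
{
	return walk_dirty_pages(m, wbc, writepage_model, true);
}

static size_t write_cache_pages_prepare_model(struct mapping_model *m,
					      struct wbc_model *wbc)
{
	return walk_dirty_pages(m, wbc, writepage_prepare_model, false);
}

/* Filesystem side: write as many batched pages as there are blocks for
 * and charge the budget here. Unwritten pages simply stay dirty. */
static size_t flush_batch_model(struct mapping_model *m, struct wbc_model *wbc,
				size_t blocks_available)
{
	size_t written = 0;

	while (written < m->nr_batched && written < blocks_available) {
		assert(m->dirty[m->batch[written]]);
		m->dirty[m->batch[written]] = false;
		written++;
	}
	wbc->nr_to_write -= (long)written;
	m->nr_batched = 0;
	return written;
}
```

Traditional filesystems keep calling write_cache_pages_model() and see
the old accounting. A buffering filesystem calls the prepare variant,
then flush_batch_model(), so a flush that can only allocate blocks for
half the batch reduces nr_to_write by exactly the pages it wrote.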

There are lots of options. But the way in which write_cache_pages()
was extended to handle this ext4 requirement was rather unclean,
non-obvious and, umm, broken!

