Re: [PATCH 02/11] writeback: switch to per-bdi threads for flushing data

From: Christoph Hellwig
Date: Wed May 20 2009 - 08:24:21 EST


On Wed, May 20, 2009 at 02:16:30PM +0200, Jens Axboe wrote:
> It's a fine rule, I agree ;-)
>
> I'll take another look at this when splitting the sync paths.

Btw, there has been quite a bit of work on the higher level sync code in
the VFS tree, and I have some TODO list items for the lower level sync
code. The most important one would be splitting data and metadata
writeback.

Currently __sync_single_inode first calls do_writepages to write back
the data, then write_inode to potentially write the metadata and then
finally filemap_fdatawait to wait for the inode to be completed.
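
To make the ordering concrete, here is a toy model of that sequence. It
is only a sketch and not kernel code: the toy_* names and the event log
are made up for illustration, and each step just records which of the
three calls would run at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event log standing in for real I/O; not kernel code. */
enum toy_step { TOY_DATA_SUBMIT, TOY_META_WRITE, TOY_DATA_WAIT };

struct toy_log {
	enum toy_step steps[8];
	size_t n;
};

static void toy_record(struct toy_log *log, enum toy_step s)
{
	/* Refuse to overflow the fixed-size log. */
	assert(log->n < sizeof(log->steps) / sizeof(log->steps[0]));
	log->steps[log->n++] = s;
}

/*
 * Mirrors the order described above for __sync_single_inode:
 * do_writepages, then write_inode, then filemap_fdatawait.
 */
static void toy_sync_single_inode(struct toy_log *log)
{
	toy_record(log, TOY_DATA_SUBMIT); /* do_writepages: start data writeback */
	toy_record(log, TOY_META_WRITE);  /* write_inode: possibly write metadata */
	toy_record(log, TOY_DATA_WAIT);   /* filemap_fdatawait: wait for data I/O */
}
```

Note that the metadata write is recorded before the data wait, which is
exactly the ordering in question.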

Now for one thing, doing the data wait after the metadata writeout is
wrong for all those filesystems that perform some kind of metadata update
in the I/O completion handler, and e.g. XFS has to work around this
by doing a wait by itself in its write_inode handler.

Second, inodes are usually clustered together, so if a filesystem can
issue multiple dirty inodes at the same time, performance will be much
better.

So an optimal sync would first issue data I/O for all inodes it
wants to write back, then wait for the data I/O to finish and finally
write out the inodes in big clusters.
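
Roughly, the three phases could be sketched like this. Again this is a
toy model under stated assumptions, not a proposed implementation: the
sketch_* names, the trace buffer and the fixed cluster size are all
hypothetical, and inodes are just indices 0..nr_inodes-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phases of the proposed sync ordering; not kernel code. */
enum sketch_phase {
	SKETCH_DATA_SUBMIT,
	SKETCH_DATA_WAIT,
	SKETCH_INODE_WRITE
};

struct sketch_event {
	enum sketch_phase phase;
	size_t first;	/* index of the first inode covered */
	size_t count;	/* number of inodes covered */
};

#define SKETCH_MAX_EVENTS 64

struct sketch_trace {
	struct sketch_event ev[SKETCH_MAX_EVENTS];
	size_t n;
};

static void sketch_emit(struct sketch_trace *t, enum sketch_phase phase,
			size_t first, size_t count)
{
	assert(t->n < SKETCH_MAX_EVENTS);
	t->ev[t->n].phase = phase;
	t->ev[t->n].first = first;
	t->ev[t->n].count = count;
	t->n++;
}

static void sketch_sync_inodes(struct sketch_trace *t, size_t nr_inodes,
			       size_t cluster)
{
	size_t i;

	assert(cluster > 0);

	/* Phase 1: issue data I/O for every inode, without waiting. */
	for (i = 0; i < nr_inodes; i++)
		sketch_emit(t, SKETCH_DATA_SUBMIT, i, 1);

	/* Phase 2: wait for the data I/O of every inode to finish. */
	for (i = 0; i < nr_inodes; i++)
		sketch_emit(t, SKETCH_DATA_WAIT, i, 1);

	/* Phase 3: write the inodes themselves, one cluster at a time. */
	for (i = 0; i < nr_inodes; i += cluster) {
		size_t left = nr_inodes - i;

		sketch_emit(t, SKETCH_INODE_WRITE, i,
			    left < cluster ? left : cluster);
	}
}
```

With five inodes and a cluster size of four, this records five data
submissions, five data waits, and then two inode writes covering four
inodes and one inode respectively. Every data wait comes before any
inode write.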

I'm not quite sure when we'll get to that, just making sure we don't
work against this direction anywhere.

And yeah, I really need to take a detailed look at the current
incarnation of your patchset :)
