Re: batched write

From: Andreas Dilger
Date: Mon Jun 19 2006 - 12:26:22 EST


On Jun 17, 2006 10:04 -0700, Andrew Morton wrote:
> On Thu, 15 Jun 2006 02:08:32 +0400
> "Vladimir V. Saveliev" <vs@xxxxxxxxxxx> wrote:
>
> > The core of generic_file_buffered_write is
> > 	do {
> > 		grab_cache_page();
> > 		a_ops->prepare_write();
> > 		copy_from_user();
> > 		a_ops->commit_write();
> >
> > 		filemap_set_next_iovec();
> > 		balance_dirty_pages_ratelimited();
> > 	} while (count);
> >
> >
> > Would it make sense to rework this code by adding a new address_space
> > operation, fill_pages, so that it looks like:
> >
> > 	do {
> > 		a_ops->fill_pages();
> > 		filemap_set_next_iovec();
> > 		balance_dirty_pages_ratelimited();
> > 	} while (count);
> >
> > A generic implementation of fill_pages would look like:
> >
> > 	generic_fill_pages()
> > 	{
> > 		grab_cache_page();
> > 		a_ops->prepare_write();
> > 		copy_from_user();
> > 		a_ops->commit_write();
> > 	}
> >
>
> There's nothing which leaps out and says "wrong" in this. But there's
> nothing which leaps out and says "right", either. It seems somewhat
> arbitrary, that's all.
>
> We have one filesystem which wants such a refactoring (although I don't
> think you've adequately spelled out _why_ reiser4 wants this).
>
> To be able to say "yes, we want this" I think we'd need to understand which
> other filesystems would benefit from exploiting it, and with what results?

With the caveat that I didn't see the original patch, if this can be a step
down the road toward supporting delayed allocation at the VFS level then
I'm all for such changes.

Lustre goes to some lengths to batch up reads and writes on the client into
large (1MB+) RPCs in order to maximize performance. Similarly on the
server we essentially bypass the VFS in order to allocate all of the RPC's
blocks in one call and do a large bio write in a second call. It just isn't
possible to maximize performance if everything is split into PAGE_SIZE
chunks.
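
To make the cost of PAGE_SIZE splitting concrete, here is a minimal
userspace sketch (not kernel code) that counts how many times a write loop
calls into the filesystem. The fill_pages_fn type, the helper names, the
4096-byte page size and the 1MB batch size are all assumptions made for
illustration; the real fill_pages operation from the proposal would have a
different signature.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL   /* assumed page size for this model */

/* Hypothetical stand-in for the proposed a_ops->fill_pages(): consumes up
 * to `count` bytes starting at `pos` and returns how many it took. */
typedef size_t (*fill_pages_fn)(size_t pos, size_t count);

/* One page per call, like the generic prepare_write/commit_write path. */
static size_t fill_one_page(size_t pos, size_t count)
{
	size_t room = PAGE_SIZE - (pos % PAGE_SIZE);

	return count < room ? count : room;
}

/* Up to 1MB per call, as a filesystem that batches might choose. */
static size_t fill_batch(size_t pos, size_t count)
{
	const size_t batch = 1UL << 20;

	(void)pos;
	return count < batch ? count : batch;
}

/* Model of the write loop: returns the number of calls into the fs. */
static unsigned long write_loop_calls(fill_pages_fn fill, size_t pos,
				      size_t count)
{
	unsigned long calls = 0;

	while (count) {
		size_t done = fill(pos, count);

		assert(done > 0 && done <= count);
		pos += done;
		count -= done;
		calls++;
	}
	return calls;
}
```

In this model a 4MB write starting at offset 0 makes 1024 calls with
fill_one_page and 4 calls with fill_batch.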

I believe XFS would benefit from delayed allocation, and the ext3-delalloc
patches from Alex also provide a large part of the performance wins for
userspace IO, because they allow large sys_write() calls and VM cache
flushes to call into the filesystem efficiently, allocate many blocks at
once, and then push them out to disk in large chunks.
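
As a rough illustration of the "allocate many blocks at once" step, the
sketch below coalesces a sorted list of dirty page indices into contiguous
extents, so that a flush could issue one allocation per extent instead of
one per page. This is a userspace model only; struct extent and
coalesce_dirty_pages() are hypothetical names and are not taken from the
ext3-delalloc patches or from XFS.

```c
#include <assert.h>
#include <stddef.h>

/* A run of contiguous pages: one allocation request in this model. */
struct extent {
	unsigned long start;   /* first page index */
	unsigned long len;     /* number of pages */
};

/* Coalesce sorted, unique dirty page indices into contiguous extents.
 * Stores at most `max` extents in `out` and returns how many were stored. */
static size_t coalesce_dirty_pages(const unsigned long *pages, size_t n,
				   struct extent *out, size_t max)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		assert(i == 0 || pages[i] > pages[i - 1]);   /* sorted, unique */
		if (used > 0 &&
		    out[used - 1].start + out[used - 1].len == pages[i]) {
			out[used - 1].len++;   /* extends the current run */
		} else {
			if (used == max)
				break;         /* no room for another extent */
			out[used].start = pages[i];
			out[used].len = 1;
			used++;
		}
	}
	return used;
}
```

For the dirty pages 0, 1, 2, 3, 10, 11 and 40 this yields three extents
(0+4, 10+2, 40+1), so seven per-page allocations become three.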

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
