Re: batched write

From: Andreas Dilger
Date: Mon Jun 19 2006 - 14:51:02 EST


On Jun 19, 2006 09:51 -0700, Hans Reiser wrote:
> Andreas Dilger wrote:
> >With the caveat that I didn't see the original patch, if this can be a step
> >down the road toward supporting delayed allocation at the VFS level then
> >I'm all for such changes.
>
> What do you mean by supporting delayed allocation at the VFS level? Do
> you mean calling to the FS or maybe just not stepping on the FS's toes
> so much or? Delayed allocation is very fs specific in so far as I can
> imagine it.

Currently the VM/VFS calls into the filesystem via ->prepare_write for each
page so that the filesystem can do block allocation. This is the filesystem's
chance to return -ENOSPC, etc., because after that point the dirty pages
are written asynchronously and there is no guarantee that the application
will even be around when they are finally written to disk.
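
As a rough illustration of the per-page model described above, here is a
userspace toy, not kernel code. The names toy_fs, toy_prepare_write() and
toy_write_pages() are made up for this sketch and only stand in for the
real ->prepare_write path; the "filesystem" is just a free-block counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy filesystem: it only tracks how many blocks are free. */
struct toy_fs {
    size_t free_blocks;
};

/*
 * Stand-in for a per-page ->prepare_write hook: allocate one block right
 * now, or fail immediately so the writer sees the error synchronously.
 */
static int toy_prepare_write(struct toy_fs *fs)
{
    assert(fs != NULL);
    if (fs->free_blocks == 0)
        return -ENOSPC;
    fs->free_blocks--;
    return 0;
}

/*
 * Stand-in for the generic write loop: one filesystem call per page.
 * Returns the number of pages accepted, or a negative errno if no page
 * could be accepted at all.
 */
static long toy_write_pages(struct toy_fs *fs, size_t npages)
{
    size_t done;

    for (done = 0; done < npages; done++) {
        int err = toy_prepare_write(fs);

        if (err)
            return done ? (long)done : (long)err;
    }
    return (long)done;
}
```

The point of the sketch is only that the error is reported while the
writer is still in the write call, at the cost of one allocation decision
per page.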

If the VFS supported delayed allocation it would call into the filesystem
on a per-sys_write basis to allow the filesystem to RESERVE space for all
of the pages in the write call, and then later (under memory pressure,
page aging, or even "pull" from the fs) submit a whole batch of contiguous
pages to the fs efficiently (via ->fill_pages() or whatever).

The fs can know at that time the final file size (if the file isn't still
being dirtied), can allocate all these blocks in a contiguous chunk, and can
submit all of the IO in a single bio to the block layer, or in a single
RPC/RDMA to the network.
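
To make the reserve-then-allocate idea above concrete, here is another
userspace toy under the same caveats. da_fs, da_reserve() and da_flush()
are hypothetical names for this sketch (they are not existing VFS hooks,
and ->fill_pages() above is itself only a suggestion), and the allocator
simply hands out the next block numbers in order.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy filesystem with delayed allocation bookkeeping. */
struct da_fs {
    size_t free_blocks;     /* neither reserved nor allocated */
    size_t reserved_blocks; /* promised to dirty pages, not yet placed */
    size_t next_block;      /* next block number the toy allocator hands out */
    size_t alloc_calls;     /* how many times the allocator was invoked */
};

/*
 * Called once per write call: reserve space for all pages of the write.
 * This is where -ENOSPC is reported, while the application is still around.
 */
static int da_reserve(struct da_fs *fs, size_t npages)
{
    assert(fs != NULL);
    if (npages > fs->free_blocks)
        return -ENOSPC;
    fs->free_blocks -= npages;
    fs->reserved_blocks += npages;
    return 0;
}

/*
 * Called later (memory pressure, page aging, ...): turn everything that
 * was reserved into one contiguous extent with a single allocator call.
 * Stores the first block of the extent in *start and returns its length.
 */
static size_t da_flush(struct da_fs *fs, size_t *start)
{
    size_t len;

    assert(fs != NULL && start != NULL);
    len = fs->reserved_blocks;
    *start = fs->next_block;
    fs->next_block += len;
    fs->reserved_blocks = 0;
    if (len)
        fs->alloc_calls++;
    return len;
}
```

Two 4-page writes followed by one flush would give a single 8-block
extent from one allocator call, where the per-page model makes eight
separate allocation decisions.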

As you well know, while it is possible to do this now by copying all of the
generic_file_write() logic into the filesystem *_file_write() method, in
practice it is hard to do this from a code maintenance point of view.


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
