Hi Christoph,
On Fri, Mar 09, 2007 at 10:39:13AM +0000, Christoph Hellwig wrote:
Hi Nick,
sorry for my late reply, this has been on my to-answer list for the last
month and I only managed to get back to it now.
No worries, I haven't had much time to work on it since then anyway.
Thanks for taking a look.
On Thu, Feb 08, 2007 at 02:07:36PM +0100, Nick Piggin wrote:
as a single call to copy a given amount of userdata at the given offset. This
is more flexible, because the implementation can determine how to best handle
errors, or multi-page ranges (eg. it may use a gang lookup), and only requires
one call into the fs.
I really like this idea, especially because it avoids calling into the
allocator for every block.
Have you asked the reiser4 folks whether this would
supersede their batch_write op completely?
I haven't yet, although that's been on my todo list for when I get the API
into a more final state.
batch_write seems quite similar, however theirs is still page based, and
a bit crufty, IMO. I found it to be really clean to just pass down offsets,
but that may be a matter for debate.
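To illustrate what "just pass down offsets" means, here is a rough userspace
model, not kernel code. All names in it (MODEL_PAGE_SIZE, struct model_file,
model_perform_write) are made up for the example, and a flat buffer stands in
for the iovec. The point is only that one call covers the whole byte range and
the implementation decides how to walk the pages and when to return a short
count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: every name here is hypothetical, not kernel API. */
#define MODEL_PAGE_SIZE 16
#define MODEL_NR_PAGES  4

struct model_file {
    /* stand-in for the pagecache pages backing a file */
    char pages[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
};

/*
 * Copy len bytes from buf into the file at byte offset pos.
 * A single call covers the whole range, so the implementation chooses
 * how to split it across pages. Returns the number of bytes copied,
 * which is short if the range runs past the end of the model file.
 */
static size_t model_perform_write(struct model_file *f, const char *buf,
                                  size_t pos, size_t len)
{
    size_t copied = 0;

    assert(f != NULL && buf != NULL);

    while (copied < len) {
        size_t index = (pos + copied) / MODEL_PAGE_SIZE;
        size_t offset = (pos + copied) % MODEL_PAGE_SIZE;
        size_t chunk = MODEL_PAGE_SIZE - offset;

        if (index >= MODEL_NR_PAGES)
            break;              /* short write: no more pages */
        if (chunk > len - copied)
            chunk = len - copied;

        memcpy(&f->pages[index][offset], buf + copied, chunk);
        copied += chunk;
    }
    return copied;
}
```

A page-based interface would instead need one prepare/commit pair per page for
the same range, with the caller doing the splitting.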
What they _do_ have is a write actor function that will do the data copy.
This could be one possible way to get rid of ->prepare_write and
->commit_write, but I haven't tried that yet, because I don't like adding
more redirection and complexity if possible...
One problem with this interface is that it cannot be used to write into the
filesystem by any means other than already-initialised buffers via iovecs. So
prepare/commit have to stay around for non-user data...
Actually I think that's a good thing to a certain extent. It reminds
us that all other users are a horrible abuse of the interface. I'd even
go so far as to make batch_write a callback that the filesystem passes
to generic_file_aio_write to make clear it's not a generic thing but
a helper. (It's not a generic thing because it's the upper layer writing
into the pagecache, not a pagecache to fs below operation).
OK, if you think that's reasonable, then that is one hurdle out of the way ;)
That still leaves open how to get rid of ->prepare_write and ->commit_write
completely, and for that we'll probably need ->kernel_read and ->kernel_write
file operations. But that's a step you shouldn't consider yet when doing
this work.
I had a couple of possibilities for that. First is passing in a write actor
(eg. defaulting to the normal iovec usercopy), but as I said I consider this
more like fixing the problem with brute force (ie. just making the interface
more complex). Maybe as a last resort, though.
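For reference, the write actor idea would look roughly like the following
userspace sketch. The names (model_write_actor_t, model_copy_actor,
model_fill_actor, model_write) are hypothetical and this is not the reiser4
code. A plain memcpy stands in for the iovec usercopy, and passing NULL selects
that default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: every name here is hypothetical, not kernel API. */
typedef size_t (*model_write_actor_t)(char *dst, const void *src, size_t len);

/* Default actor: plain copy, standing in for the normal iovec usercopy. */
static size_t model_copy_actor(char *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return len;
}

/* Alternative actor: the source is a single fill byte, not a user buffer. */
static size_t model_fill_actor(char *dst, const void *src, size_t len)
{
    memset(dst, *(const unsigned char *)src, len);
    return len;
}

/*
 * Write up to len bytes at offset pos of a flat "file" buffer, using the
 * given actor to produce the data. A NULL actor selects the default copy.
 * Returns the number of bytes written (short at end of file).
 */
static size_t model_write(char *file, size_t file_size, size_t pos,
                          size_t len, model_write_actor_t actor,
                          const void *src)
{
    assert(file != NULL && src != NULL);

    if (!actor)
        actor = model_copy_actor;
    if (pos >= file_size)
        return 0;
    if (len > file_size - pos)
        len = file_size - pos;

    return actor(file + pos, src, len);
}
```

The extra function pointer on every write path is exactly the added
indirection I'd like to avoid if there is a simpler way.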
Another thing that would be much nicer from _my_ point of view would be to
just make all kernel users set up their data in an iovec, and use the normal
call with KERNEL_DS. Unfortunately, this is not the expected way for a lot
of code to work, and it might require extra copying of the data.
Another issue is that it seems harder to implement in generic,
reusable code. It should be possible to introduce a new 2-op interface (or
maybe just a new error handler op) which can be used correctly in generic code.
We should be able to find a nice abstraction for this, see my next mails.
+ /*
+ * perform_write replaces prepare and commit_write callbacks.
+ */
This is a rather useless comment :) Better remove it and add proper
descriptions to Documentation/filesystems/vfs.txt and
Documentation/filesystems/Locking
Will do. Thanks!