I'm thinking about how a _network_ file system might use bdflush-like
functionality to flush dirty file and directory data buffers back to the
server. The consistency guarantees in this hypothetical file system don't
require flushing to servers on closes or on writes (as in Coda, NFS v3/v2
etc.). It's fine, just as with ext2, to let the buffers sit for a while
and write them back later. This sort of idea can be used in file systems
for tightly connected clusters.
I'd like such functionality, but don't know if it is there already, and I
missed it, or if we still need to build it in. I looked at 2.3.8.
The problem is that I don't know how to flush dirty inode pages with the
current bdflush facility. The client doesn't really see a block device,
and is aware of tuples (server, device, ino, logical block) and not (server,
device, physical block). Such buffers don't fit too well in the buffer
cache.
I see a couple of possibilities and I'd love to be educated:
A) introduce "sync_inode_pages" into the sync_dev function.
There is already sync_inodes which is responsible for writing updated inode
data into the buffers. sync_inode_pages would be responsible for writing
the data pages out to the server.
The process of flushing pages would be substantially similar to what nfs
does upon close: nfs_file_flush. But unlike nfs, it would be called
periodically and likely only depend on the inode, not on a file structure.
Perhaps "sync_inode_pages" could just walk the i_pages list and call the
flushpage method on each.
Perhaps we would need a flag to say whether to wait until the pages have
really made it all the way to the server, or merely to initiate the flushing.
[If the file system also had metadata pages on the client, then a
similar function "sync_super_metadata" could be made responsible for writing
back such pages for each super block.]
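
To make A) a bit more concrete, here is a rough user-space sketch of what
"sync_inode_pages" might look like, including the wait flag. Everything in
it is an assumption made for illustration: struct page_sketch and struct
inode_sketch are simplified stand-ins rather than the 2.3.8 structures, the
wait_on_page method is hypothetical, and the demo_* functions are a toy
transport that only counts calls.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the 2.3.8 definitions. */
struct page_sketch {
    struct page_sketch *next;   /* next page on the inode's page list */
    unsigned long lblock;       /* logical block within the file */
    int dirty;                  /* modified since the last writeback */
    int in_flight;              /* write started, server has not acked yet */
};

struct inode_sketch {
    unsigned long ino;
    struct page_sketch *pages;  /* models the i_pages list */
    /* Start writing one page to the server; returns 0 on success. */
    int (*flushpage)(struct inode_sketch *inode, struct page_sketch *page);
    /* Block until the server has acknowledged the page (hypothetical). */
    void (*wait_on_page)(struct inode_sketch *inode, struct page_sketch *page);
};

/*
 * Start writeback of every dirty page of the inode.  If wait is nonzero,
 * also wait for every page that is still in flight.  Returns the number
 * of writes started by this call.
 */
int sync_inode_pages(struct inode_sketch *inode, int wait)
{
    struct page_sketch *p;
    int started = 0;

    for (p = inode->pages; p != NULL; p = p->next) {
        if (!p->dirty)
            continue;
        if (inode->flushpage(inode, p) != 0)
            continue;           /* stays dirty; a later pass retries it */
        p->dirty = 0;
        p->in_flight = 1;
        started++;
    }

    if (wait) {
        for (p = inode->pages; p != NULL; p = p->next) {
            if (p->in_flight) {
                inode->wait_on_page(inode, p);
                p->in_flight = 0;
            }
        }
    }
    return started;
}

/* Toy transport used only to exercise the sketch: it just counts calls. */
static int demo_writes_sent;
static int demo_acks_seen;

static int demo_flushpage(struct inode_sketch *inode, struct page_sketch *page)
{
    (void)inode;
    (void)page;
    demo_writes_sent++;
    return 0;
}

static void demo_wait_on_page(struct inode_sketch *inode, struct page_sketch *page)
{
    (void)inode;
    (void)page;
    demo_acks_seen++;
}
```

A periodic flusher would call sync_inode_pages(inode, 0) to get the writes
going, while a sync-style caller would pass wait = 1; in this sketch a
waiting call also collects pages that an earlier non-waiting call left in
flight.
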
B) change the buffer heads
The following looks more like a hack to me, but perhaps there is nothing
wrong with it.
With a pointer in the buffer head to the page which contains this buffer, I
could reach the inode and logical block in the file.
I'd like to give block device structures a function to hash buffers using
other data than just (device, blockno), which has no meaning for my fs. In
the case at hand the hash function would use the (device, inode, logical
block) triple.
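
As an illustration of hashing on the triple, here is a small user-space
sketch. The names (struct lbuf, lbuf_hash, lbuf_insert, lbuf_find), the
table size and the mixing function are all made up for this example and do
not correspond to anything in the existing buffer cache.

```c
#include <stddef.h>

#define NR_HASH 64  /* arbitrary table size for the sketch */

/* Hypothetical buffer keyed by file position rather than disk position. */
struct lbuf {
    struct lbuf *hash_next;     /* chain of buffers in the same bucket */
    unsigned int dev;
    unsigned long ino;
    unsigned long lblock;       /* logical block within the file */
};

static struct lbuf *hash_table[NR_HASH];

/* Mix the (device, inode, logical block) triple into a bucket index. */
static unsigned int lbuf_hash(unsigned int dev, unsigned long ino,
                              unsigned long lblock)
{
    unsigned long h = dev;

    h = h * 31 + ino;
    h = h * 31 + lblock;
    return (unsigned int)(h % NR_HASH);
}

/* Add a buffer at the head of its bucket chain. */
void lbuf_insert(struct lbuf *b)
{
    unsigned int slot = lbuf_hash(b->dev, b->ino, b->lblock);

    b->hash_next = hash_table[slot];
    hash_table[slot] = b;
}

/* Return the buffer for the triple, or NULL if it is not cached. */
struct lbuf *lbuf_find(unsigned int dev, unsigned long ino,
                       unsigned long lblock)
{
    struct lbuf *b = hash_table[lbuf_hash(dev, ino, lblock)];

    while (b != NULL) {
        if (b->dev == dev && b->ino == ino && b->lblock == lblock)
            return b;
        b = b->hash_next;
    }
    return NULL;
}
```

The point of the sketch is only that the lookup compares all three fields
of the triple, so two files with the same logical block number never
collide on identity, even when they land in the same bucket.
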
Combined with the new 2.3 type fs_getblk_t (shouldn't this be an inode or
super operation, instead of an explicitly passed callback function?) this
would still allow buffers to be found easily from the fs.
Now one could write a "block device" request function, for a device that
supports this file system, which can find out enough about the buffers to
get them back to the server.
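
A rough sketch of the lookup such a request function would have to do,
going from the buffer through its page to the inode and logical block. All
types and names here (inode_sk, page_sk, bh_sk, wb_request,
make_wb_request) are hypothetical and much simpler than the real buffer
head and page structures; in particular the back pointer from buffer to
page is the assumed addition discussed above.

```c
#include <stddef.h>

/* Hypothetical, simplified structures for illustration only. */
struct inode_sk {
    unsigned int server;        /* which server holds the file */
    unsigned int dev;           /* device on that server */
    unsigned long ino;
};

struct page_sk {
    struct inode_sk *inode;     /* file this page caches */
    unsigned long offset;       /* byte offset of the page in the file */
};

struct bh_sk {
    struct page_sk *page;       /* assumed back pointer to the page */
    unsigned long off_in_page;  /* byte offset of the buffer in the page */
    unsigned long size;         /* buffer size in bytes */
};

/* What the client would send: the tuple it actually knows about. */
struct wb_request {
    unsigned int server;
    unsigned int dev;
    unsigned long ino;
    unsigned long lblock;
};

/*
 * Fill in a writeback request for one buffer.  Returns 0 on success and
 * -1 if the buffer is not attached to a page and inode.
 */
int make_wb_request(const struct bh_sk *bh, struct wb_request *req)
{
    const struct inode_sk *inode;

    if (bh == NULL || bh->page == NULL || bh->page->inode == NULL ||
        bh->size == 0)
        return -1;

    inode = bh->page->inode;
    req->server = inode->server;
    req->dev = inode->dev;
    req->ino = inode->ino;
    /* Logical block = byte position in the file / buffer size. */
    req->lblock = (bh->page->offset + bh->off_in_page) / bh->size;
    return 0;
}
```

For example, a 1024-byte buffer at offset 1024 inside the page that starts
at byte 8192 of the file comes out as logical block 9.
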
A) may look simpler than B) but it requires the file system to implement the
machinery of waiting for the pages to be written back. The block device
infrastructure already has this.
Is this a reasonable thing to ask for?
- Peter -