On Tue, May 04, 2010 at 11:45:53AM -0400, Christoph Hellwig wrote:
> On Tue, May 04, 2010 at 10:16:37AM -0400, Ric Wheeler wrote:
> > Checking per inode is actually incorrect - we do not want to short cut
> > the need to flush the target storage device's write cache just because a
> > specific file has no dirty pages. If a power hit occurs, having sent
> > the pages to the storage device is not sufficient.
>
> As long as we're only using the information for fsync doing it per inode
> is the correct thing. We only want to flush the cache if the inode
> (data or metadata) is dirty in some way. Note that this includes writes
> via O_DIRECT which are quite different to track - I've not found the
> original patch in my mbox so I can't comment if this is done right.

I agree.
I wonder if it's worthwhile to think about a new system call which
allows users to provide an array of fd's which collectively should
be fsync'ed out at the same time. Otherwise, we end up issuing
multiple barrier operations in cases where the application needs to
do:
    fsync(control_fd);
    fsync(data_fd);
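
A minimal userspace sketch of the kind of interface being suggested, under
stated assumptions: fsync_many() is a hypothetical name and does not exist
as a system call. This fallback simply loops over fsync(), so it still
costs one cache flush per fd; the point of doing it in the kernel would be
to write back all the inodes first and then issue a single flush.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: fsync_many() is NOT an existing system call.
 * This userspace fallback calls fsync() on each fd in turn, so it
 * still issues one flush per descriptor. A kernel version could
 * write back every inode first and flush the device cache once.
 *
 * Returns 0 on success, or -1 with errno set to the first failure.
 */
int fsync_many(const int *fds, size_t nfds)
{
	int first_err = 0;

	assert(fds != NULL || nfds == 0);

	for (size_t i = 0; i < nfds; i++) {
		/* Remember the first error but keep syncing the rest. */
		if (fsync(fds[i]) < 0 && first_err == 0)
			first_err = errno;
	}

	if (first_err != 0) {
		errno = first_err;
		return -1;
	}
	return 0;
}
```

With such a helper the two calls above would become a single
fsync_many((int[]){ control_fd, data_fd }, 2).
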
- Ted