Re: Linux-2.4.0-test10

From: Stephen C. Tweedie (sct@redhat.com)
Date: Thu Nov 02 2000 - 12:17:17 EST


Hi,

On Tue, Oct 31, 2000 at 08:55:13PM +0000, Alan Cox wrote:
>
> Questions:
> Has the O_SYNC stuff been fixed so that more than ext2 honours this
> flag ?

Not yet.

Linus, the last patch I sent you on this didn't make it in --- is it
worth my while resending, or do we need to rethink how to do this?

2.2 O_SYNC is actually broken too --- it doesn't sync all metadata (in
particular, it doesn't update the inode), but I'd rather fix that for
2.4 rather than change 2.2, as the main users of O_SYNC, databases,
are writing to preallocated files anyway.

The patch I sent fully implements O_SYNC (actually, it implements
O_DSYNC, which is allowed to skip the inode sync if the only attribute
which has changed is the timestamps) and fdatasync. It's easy for me
to make the DSYNC selectable via sysctl for full SU compliance, and I
know of other unixes that already do this --- you really don't want
existing database applications suddenly to start seeking to the inode
block for every O_SYNC write.

There are two parts to the implementation here --- the separation of
O_DIRTY into two bits, a "fully-dirty" bit and a "timestamp-dirty"
bit, and the use of a linked list of buffer_heads against each inode
to track all dirty data. It is possible to do without the latter, but
that requires either doing a full mapping tree walk after O_SYNC to
flush indirect blocks, or doing indirect writes synchronously as we
write out the data. f[data]sync can't do the sync-indirect-write
trick, so is still required to walk the whole indirect tree on fsync,
which can get expensive on large files.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Nov 07 2000 - 21:00:11 EST