Re: Proposal for "proper" durable fsync() and fdatasync()

From: Jeff Garzik
Date: Tue Feb 26 2008 - 12:55:21 EST


Jamie Lokier wrote:
Jeff Garzik wrote:
Nick Piggin wrote:
Anyway, the idea of making fsync/fdatasync etc. safe by default is
a good idea IMO, and is a bad bug that we don't do that :(
Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barriers.

It seems far easier to make sync_blkdev() Do The Right Thing, and magically make all filesystems data-safe.

Well, you need ordered metadata writes, barriers _and_ flushes with
some filesystems.

Merely writing all the data pages than issuing a drive cache flush
won't Do The Right Thing with those filesystems - someone already
mentioned Btrfs, where it won't.

Oh certainly. That's why we have a VFS :) fsync for NFS will look quite different, too.


But I agree that your suggestion would make a superb default, for
filesystems which don't provide their own function.

Yep. That would immediately cover a bunch of filesystems.


It's not optimal even then.

Devices: On a software RAID, you ideally don't want to issue flushes
to all drives if your database did a 1 block commit entry. (But they
probably use O_DIRECT anyway, changing the rules again). But all that
can be optimised in generic VFS code eventually. It doesn't need
filesystem assistance in most cases.

My own idea is that we create a FLUSH command for blkdev request queues, to exist alongside READ, WRITE, and the current barrier implementation. Then FLUSH could be passed down through MD or DM.

Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/