Re: Proposal for "proper" durable fsync() and fdatasync()

From: Jamie Lokier
Date: Wed Feb 27 2008 - 09:17:18 EST


Jeff Garzik wrote:
> >It's not optimal even then.
> >
> > Devices: On a software RAID, you ideally don't want to issue flushes
> > to all drives if your database did a 1 block commit entry. (But they
> > probably use O_DIRECT anyway, changing the rules again). But all that
> > can be optimised in generic VFS code eventually. It doesn't need
> > filesystem assistance in most cases.
>
> My own idea is that we create a FLUSH command for blkdev request queues,
> to exist alongside READ, WRITE, and the current barrier implementation.
> Then FLUSH could be passed down through MD or DM.

I like your thought, and it has the benefit of being simple.

My thought is very similar, but with (hopefully not premature...)
optimisations:

- I would merge FLUSH with a preceding write in some cases,
converting to an FUA-write command. Probably the generic request
queue is the best place to detect and merge. This is so that
userspace filesystems (including guest VMs) and databases can do
journal commits with the same I/O sequence as in kernel
filesystems.

- I would create BARRIER too, so that a userspace API can ask for
this weaker form of fsync, which may improve throughput of
userspace journalling.

- I would include a sector range in FLUSH and BARRIER, for MD and DM
to flush _only_ relevant sub-devices. This may improve performance
for journalling both kernel and userspace filesystems, as journal
commits are often very small and hit one or two sub-devices in RAID.

- I would ask the nice MD and DM people to take tag-barriers rather
than flush-barriers on the input queue, converting to
tag-barriers, flush-barriers and independent FLUSH on the
sub-device queues according to sector ranges and subsequent
writes. It's not obvious, but my barrier proposal which started
this thread is designed to support an efficient inter-sub-device
flush-barrier when necessary, and single-sub-device tag-barrier
when possible.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/