Re: Proposal for "proper" durable fsync() and fdatasync()

From: Nick Piggin
Date: Tue Feb 26 2008 - 04:17:29 EST


On Tuesday 26 February 2008 18:59, Jamie Lokier wrote:
> Andrew Morton wrote:
> > On Tue, 26 Feb 2008 07:26:50 +0000 Jamie Lokier <jamie@xxxxxxxxxxxxx>
wrote:
> > > (It would be nicer if sync_file_range()
> > > took a vector of ranges for better elevator scheduling, but let's
> > > ignore that :-)
> >
> > Two passes:
> >
> > Pass 1: shove each of the segments into the queue with
> > SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE
> >
> > Pass 2: wait for them all to complete and return accumulated result
> > with SYNC_FILE_RANGE_WAIT_AFTER
>
> Thanks.
>
> Seems ok, though being able to cork the I/O until the last one would
> be a bonus (like TCP_MORE... SYNC_FILE_RANGE_MORE?)
>
> I'm imagining I'd omit the SYNC_FILE_RANGE_WAIT_BEFORE. Is there a
> reason why you have it there? The man page isn't very enlightening.


Yeah, sync_file_range has slightly unusual semantics and introduce
the new concept, "writeout", to userspace (does "writeout" include
"in drive cache"? the kernel doesn't think so, but the only way to
make sync_file_range "safe" is if you do consider it writeout).

If it makes it any easier to understand, we can add in
SYNC_FILE_ASYNC, SYNC_FILE_SYNC parts that just deal with
safe/unsafe and sync/async semantics that is part of the normal
POSIX api.

Anyway, the idea of making fsync/fdatasync etc. safe by default is
a good idea IMO, and is a bad bug that we don't do that :(

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/