Re: Linux 2.6.29

From: Kyle Moffett
Date: Wed Mar 25 2009 - 22:46:38 EST


Apologies for the HTML email, resent in ASCII below:

> On Wed, Mar 25, 2009 at 10:10 PM, Matthew Garrett <mjg59@xxxxxxxxxxxxx> wrote:
>>
>> If fsync() means anything other than "Get
>> my data on disk and then return" then we're breaking guarantees to
>> applications. The problem is that you're insisting that the only way
>> applications can ensure that their requests occur in order is to use
>> fsync(), which will achieve that but also provides guarantees above and
>> beyond what the majority of applications want.
>>
>> I've done some benchmarking now and I'm actually fairly happy with the
>> behaviour of ext4 now - it seems that the real world impact of doing the
>> block allocation at rename time isn't that significant, and if that's
>> the only practical way to ensure ordering guarantees in ext4 then fine.
>> But given that, I don't think there's any reason to try to convince
>> application authors to use fsync() more.
>
> Really, the problem is the filesystem interfaces are incomplete. There are plenty of ways to specify a "FLUSH CACHE"-type command for an individual file or for the whole filesystem, but there aren't really any ways for programs to specify barriers (either whole-blockdev or per-LBA-range). An fsync() implies you want to *wait* for the data... there's no way to ask it all to be queued with some ordering constraints.
> Perhaps we ought to add a couple of extra open() flags, O_BARRIER_BEFORE and O_BARRIER_AFTER, and rename3() and similar functions that take flags arguments?
> Or maybe a new set of syscalls like barrier(file1, file2) and fbarrier(fd1, fd2), which cause all pending changes (perhaps limited to this process?) to the file at fd1 to occur before any subsequent changes (again limited to this process?) to the file at fd2.
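Nothing like fbarrier() exists today, so the closest user-space approximation is to flush fd1 with fsync() and wait before touching fd2. A minimal sketch, assuming POSIX; fbarrier_emulated() and write_all() are hypothetical helper names, not real syscalls:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for the proposed fbarrier(fd1, fd2).
 * With no kernel support, the only way to be sure that pending writes
 * to fd1 are on disk before any later write to fd2 is to fsync(fd1)
 * and wait for it.  fd2 is unused: a real barrier would only constrain
 * the ordering of fd2's writes, not force fd1 to disk immediately. */
int fbarrier_emulated(int fd1, int fd2)
{
    (void)fd2;
    return fsync(fd1);
}

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno set by write()). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Usage would look like write_all(data_fd, ...); fbarrier_emulated(data_fd, log_fd); write_all(log_fd, ...). The cost is exactly the one described above: the caller blocks on the flush even though it only wanted ordering.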
> It seems that rename(oldfile, newfile) with an already-existing newfile should automatically imply barrier(oldfile, newfile) before it occurs, simply because so many programs rely on that.
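The pattern so many programs rely on is write-to-a-temporary-file-then-rename. Today the only portable way to get the ordering that rename() is argued to imply is an explicit fsync() on the temporary file before the rename(). A sketch, assuming POSIX; replace_file() is a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Replace `path` with `len` bytes of `buf` via write-temp/fsync/rename.
 * The fsync() is the explicit barrier: the new data is on disk before
 * rename() makes it visible under `path`.  (A fully durable version
 * would also fsync() the containing directory after the rename.)
 * Returns 0 on success, -1 on error with errno set. */
int replace_file(const char *path, const char *buf, size_t len)
{
    char tmp[4096];
    int fd, saved;

    if (snprintf(tmp, sizeof tmp, "%s.tmpXXXXXX", path) >= (int)sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = mkstemp(tmp);              /* temp file in the same directory */
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        buf += n;
        len -= (size_t)n;
    }
    if (fsync(fd) < 0)              /* wait for the data: the "barrier" */
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;
    if (rename(tmp, path) < 0)      /* atomically swap in the new file */
        goto fail;
    return 0;

fail:
    saved = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    errno = saved;
    return -1;
}
```

Note that mkstemp() creates the temporary file with mode 0600, so a real implementation would typically fchmod() it to match the file being replaced.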
> In the cross-filesystem case, fbarrier() might simply fsync(fd1), since that would provide the equivalent guarantee, albeit with possibly significant performance penalties. I can't think of any easy way to prevent one filesystem from syncing writes to a particular file until another filesystem has finished an asynchronous fsync() call. Perhaps a half-way solution would be to asynchronously fsync(fd1) and simply block the next write()/ioctl()/etc. on fd2 until the async fsync returns.
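The half-way solution can be approximated in user space with a helper thread standing in for the asynchronous fsync(). A minimal sketch, assuming POSIX threads; struct lazy_barrier and its functions are hypothetical names, and this only blocks writes issued through lazy_barrier_write(), not arbitrary write() or ioctl() calls on fd2:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical "half-way" barrier: fsync(fd1) runs in a background
 * thread, and the next write to fd2 made through lazy_barrier_write()
 * blocks until that fsync has finished. */
struct lazy_barrier {
    pthread_t thread;
    int fd1;
    int fsync_result;
    int fsync_errno;
    int pending;        /* nonzero while the background fsync is unjoined */
};

static void *fsync_worker(void *arg)
{
    struct lazy_barrier *b = arg;
    b->fsync_result = fsync(b->fd1);
    b->fsync_errno = errno;     /* errno is per-thread; save it for the joiner */
    return NULL;
}

/* Start fsync(fd1) asynchronously.  Returns 0, or a pthread error number;
 * on failure nothing is pending and the caller should fall back to a
 * synchronous fsync(fd1). */
int lazy_barrier_start(struct lazy_barrier *b, int fd1)
{
    int err;
    b->fd1 = fd1;
    b->fsync_result = 0;
    b->fsync_errno = 0;
    err = pthread_create(&b->thread, NULL, fsync_worker, b);
    b->pending = (err == 0);
    return err;
}

/* Write to fd2, first waiting for any outstanding fsync(fd1).
 * b must have been initialized by lazy_barrier_start(). */
ssize_t lazy_barrier_write(struct lazy_barrier *b, int fd2,
                           const void *buf, size_t len)
{
    if (b->pending) {
        int err = pthread_join(b->thread, NULL);
        b->pending = 0;
        if (err != 0) {
            errno = err;
            return -1;
        }
        if (b->fsync_result < 0) {
            errno = b->fsync_errno;
            return -1;  /* fd1 never reached disk: refuse to write fd2 */
        }
    }
    return write(fd2, buf, len);
}
```

Usage: call lazy_barrier_start(&b, fd1) after writing fd1, do other work, then write fd2 only through lazy_barrier_write(&b, fd2, ...). The writer blocks only if the fsync is still in flight when the fd2 write arrives.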
> Are there other ideas for useful barrier()-generating file APIs?
> Cheers,
> Kyle Moffett