Re: vfs: Add MS_FLUSHONFSYNC mount flag

From: Eric Sandeen
Date: Sun Feb 22 2009 - 15:53:19 EST


Pavel Machek wrote:
> On Thu 2009-02-12 21:23:36, Theodore Tso wrote:
>> On Thu, Feb 12, 2009 at 03:30:10PM -0600, Eric Sandeen wrote:
>>>> Yes, but OTOH we should give the sysadmin the possibility to enable / disable
>>>> it on just some partitions. I don't see a reasonable use for that, but people
>>>> tend to do strange things ;) and there probably isn't a strong reason not to
>>>> allow them.
>>>>
>>> But nobody has asked for that, have they? So why offer it up at this point?
>>>
>>> They could use LD_PRELOAD to make fsync a no-op if they really don't
>>> care for it, I guess... though that's not easily per-fs either.
>> Actually, Bart Samwel talked to me at FOSDEM and asked for something
>> similar. What we came up with, which met his request while still being
>> standards-compliant, was a per-process personality flag with three
>> options:
>>
>> *) Always honor fsync() calls (the default)
>> *) Never honor fsync() calls
>> *) Only honor fsync() calls if a global "honor fsync" flag
>> (which would be manipulated by the laptop mode scripts)
>> is set.
>>
>> The flag would be reset to the default across a setuid exec, but would
>> otherwise be inherited across fork(). It might be possible to
>> set/get the flag via a /proc interface.
>>
>> The basic idea is that laptop systems where the system administrator
>> wants longer battery life (and trusts the battery not to suddenly give
>> out) more than they care about fsync() guarantees can set up a PAM
>> module which sets the flag at login time, so that all of the
>> user's processes can be set up not to honor fsync() calls; however,
>> all of the system daemons would still function normally.
>
> Sounds like a POSIX violation to
> me... '/sys/fsync_does_not_really_sync'?
>
> Perhaps it is better done at the glibc level? Environment variables
> already have most of the semantics you want...
>
> Pavel
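
For concreteness, a minimal sketch of the three-way per-process policy Ted describes might look like the following. All names here (enum fsync_policy, struct task_policy, should_honor_fsync(), global_honor_fsync and the fork/exec helpers) are hypothetical; nothing like this exists in the kernel. The sketch only models the decision logic and the "inherit across fork(), reset across setuid exec" rule, not how the flag would be stored or exposed through /proc.

```c
#include <stdbool.h>

/* Hypothetical per-process fsync policy; not a real kernel interface. */
enum fsync_policy {
    FSYNC_ALWAYS,    /* honor every fsync() call (the default) */
    FSYNC_NEVER,     /* treat fsync() as a no-op */
    FSYNC_IF_GLOBAL  /* honor fsync() only while the global flag is set */
};

struct task_policy {
    enum fsync_policy fsync;
};

/* Hypothetical global "honor fsync" switch, e.g. toggled by laptop-mode scripts. */
static bool global_honor_fsync = true;

/* Decide whether an fsync() from a task with this policy should really sync. */
static bool should_honor_fsync(const struct task_policy *t)
{
    switch (t->fsync) {
    case FSYNC_NEVER:
        return false;
    case FSYNC_IF_GLOBAL:
        return global_honor_fsync;
    case FSYNC_ALWAYS:
    default:
        return true;
    }
}

/* fork(): the child simply inherits the parent's setting. */
static struct task_policy policy_on_fork(const struct task_policy *parent)
{
    return *parent;
}

/* exec(): a setuid exec resets the policy to the default. */
static struct task_policy policy_on_exec(const struct task_policy *cur, bool setuid)
{
    struct task_policy p = *cur;

    if (setuid)
        p.fsync = FSYNC_ALWAYS;
    return p;
}
```

Resetting on setuid exec keeps an unprivileged user's setting from weakening the durability of privileged programs they launch, while system daemons that never had the flag set keep the default behavior.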

One other thing that may be worth bringing up (just to muddy the waters
more) is OSX's handling of this stuff.

From the fsync(2) manpage:

> Note that while fsync() will flush all data from the host to the
> drive (i.e. the "permanent storage device"), the drive itself may not
> physically write the data to the platters for quite some time and it
> may be written in an out-of-order sequence.
>
> Specifically, if the drive loses power or the OS crashes, the
> application may find that only some or none of their data was written.
> The disk drive may also re-order the data so that later writes may be
> present, while earlier writes are not.
>
> This is not a theoretical edge case. This scenario is easily
> reproduced with real world workloads and drive power failures.
>
> For applications that require tighter guarantees about the integrity
> of their data, Mac OS X provides the F_FULLFSYNC fcntl. The
> F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent
> storage. Applications, such as databases, that require a strict
> ordering of writes should use F_FULLFSYNC to ensure that their data
> is written in the order they expect. Please see fcntl(2) for more
> detail.

and from fcntl(2)

> F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to
> flush all buffered data to the permanent storage
> device (arg is ignored). This is currently
> implemented on HFS, MS-DOS (FAT), and Universal Disk Format
> (UDF) file systems. The operation may take quite a
> while to complete. Certain FireWire drives have also
> been known to ignore the request to flush their
> buffered data.
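
As an illustration of what the OS X split means for applications, a portable program wanting the stronger guarantee might do something like the sketch below, assuming that falling back to plain fsync() is acceptable when F_FULLFSYNC is unavailable or fails. The helper name full_sync() is hypothetical; F_FULLFSYNC itself is only defined on Mac OS X, so on Linux the #ifdef compiles it out and only fsync() is used.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush fd's data to the device and, where the
 * platform offers it, also ask the drive to flush its write cache.
 * Returns 0 on success, -1 on error (errno set by the failing call).
 */
static int full_sync(int fd)
{
#ifdef F_FULLFSYNC
    /* Mac OS X: fsync() plus a drive cache flush; arg is ignored. */
    if (fcntl(fd, F_FULLFSYNC, 0) == 0)
        return 0;
    /* Not supported on this file system: fall back to fsync(). */
#endif
    return fsync(fd);
}
```

Per the fcntl(2) text, F_FULLFSYNC is only implemented on some file systems and certain drives ignore the flush request anyway, so even a successful call is not an absolute guarantee.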

-Eric