Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes

From: Andreas Dilger

Date: Mon Feb 23 2026 - 20:22:34 EST


On Feb 23, 2026, at 12:32, Theodore Tso <tytso@xxxxxxx> wrote:
>
> On Mon, Feb 23, 2026 at 01:46:54PM +0100, Alejandro Colomar wrote:
>> Hi Ted, Andreas,
>>
>>> The parenthetical comment in the second paragraph needs to be removed,
>>> since fsync specifies that all dirty information in the page cache
>>> will be flushed out.
>>
>> Would you mind checking the text in VERSIONS (since there's a reference
>> to it right next to the text you're proposing to remove)? I suspect it
>> will also need to be updated accordingly. I don't feel qualified to
>> touch that text by myself.
>
> The text in VERSIONS is not incorrect, in that it is talking about the
> distinction of O_SYNC and O_DSYNC in terms of which kinds of metadata
> will be persisted.
>
> However, the reason why all of this information regarding Synchronized
> I/O is in VERSIONS is to describe the historic behaviour of Linux
> version 2.6.33 versus more modern versions of Linux. But 2.6.33 dates
> from February 24, 2010 --- 16 years ago. So it might be simpler if we
> simply dropped this kind of historical information. But if you do
> want to keep it, we should move the bulk of that information into
> O_SYNC and O_DSYNC.
>
> So maybe:
>
> O_DSYNC
> Write operations on the file will complete according to the
> requirements of synchronized I/O data integrity completion.

Should this be more specific to say "on a file descriptor opened with this flag" or "on this file descriptor", since the original thread was about whether *any* data written to the "file" would also be persisted...

> By the time write(2) (and similar) return, the output data has
> been transferred to the underlying hardware, along with any file
> metadata that would be required to retrieve that data.
>
> See VERSIONS for a description of how historical versions
> of the Linux kernel from 2010 behaved.
>
> O_SYNC Write operations on the file will complete according to the
> requirements of synchronized I/O file integrity completion (by
> contrast with the synchronized I/O data integrity completion
> provided by O_DSYNC.)

Same, "on this file descriptor" or similar.
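
For illustration, here is a minimal C sketch of the per-descriptor reading suggested above. It assumes that the synchronized completion applies to writes issued through the descriptor that was opened with O_SYNC or O_DSYNC. The helper name write_synced and its parameters are hypothetical, not part of any man page or kernel interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the man page): append buf to path through
 * a descriptor opened with sync_flag, which is expected to be O_SYNC or
 * O_DSYNC. Returns 0 on success, -1 on error.
 */
int write_synced(const char *path, const char *buf, int sync_flag)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | sync_flag, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(buf);
    size_t off = 0;
    while (off < len) {
        /* Each write() on this descriptor is requested to complete
         * according to the integrity level selected by sync_flag. */
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }
    return close(fd) == 0 ? 0 : -1;
}
```

A caller would use something like write_synced("journal.log", "commit\n", O_DSYNC). Data written to the same file through a different descriptor opened without the flag is not covered by this sketch and would need fsync(2) or fdatasync(2).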

> By the time write(2) (or similar) returns, the output
> data and all file metadata associated with the inode for the
> opened file have been transferred to the underlying
> hardware.
>
> See VERSIONS for a description of how historical versions
> of the Linux kernel from 2010 behaved.
>
> VERSIONS
> Before Linux 2.6.33, Linux implemented only the O_SYNC flag for
> open(). However, when that flag was specified, most
> filesystems actually provided the equivalent of synchronized
> I/O data integrity completion (i.e., O_SYNC was actually
> implemented as the equivalent of O_DSYNC).
>
> I'd suggest dropping everything else in VERSIONS, including the
> discussion of O_RSYNC. All of that is much more appropriate for a
> tutorial.

IMHO, agreed. If users are running really old versions of Linux then it
is likely they will have suitably old versions of the man pages as well.
There has to be some balance between highlighting potential interop issues
that an application developer might see vs. cluttering the text so that
readers are not clear _what_ the right semantics are.

Cheers, Andreas

> If you really want to keep all of that text, perhaps it could be moved
> into a synchronized-io man page in section 7. There we can talk
> about the difference between fsync() and fdatasync(), which is interesting
> as a conceptual model, and conceptually it is similar to the O_SYNC
> and O_DSYNC distinction. But there is a difference in what data will be
> written back (the data that was written through the file descriptor where
> the O_SYNC/O_DSYNC flag was set, either via open or fcntl, versus all
> buffered data in the buffer cache). The synchronized-io man page
> could also have more of the information around O_DIRECT in one place.
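
The contrast drawn in the quoted paragraph can be sketched in C as follows, assuming the behaviour described there: a plain buffered write carries no persistence guarantee by itself, and a later fsync(2) or fdatasync(2) on any descriptor for the file flushes the file's dirty buffered data. The helper names append_buffered and flush_file are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: buffered append with no O_SYNC/O_DSYNC.
 * Uses a single write() for brevity and treats a short write as an error. */
int append_buffered(const char *path, const char *buf)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(buf);
    ssize_t n = write(fd, buf, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: flush the file's dirty data through a fresh
 * descriptor. data_only != 0 selects fdatasync(), otherwise fsync(). */
int flush_file(const char *path, int data_only)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int rc = data_only ? fdatasync(fd) : fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling append_buffered() several times and then flush_file(path, 0) once gives one flush for the whole batch, whereas an O_SYNC descriptor pays the cost on every write(2).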
>
>> If you'd write a patch, I'd appreciate that.
>
> Well, there's a question of what's the minimal change that is needed
> to fix out-and-out inaccuracies, and we can just delete some
> parenthetical comments.
>
> BTW, if we want to delete inaccurate information, I'd also suggest
> deleting the following text in the O_DIRECT section of the man page:
>
> A semantically similar (but deprecated) interface for block
> devices is described in raw(8).
>
> ----
>
> Then there's trying to rearrange the tutorial-style information for
> people who want to implement code which needs data persistence
> guarantees. That's quite a lot more work, and while I'm happy to
> review or assist someone to write that more expansive tutorial
> material, it's not something I'm willing to sign up to do.
>
> ----
>
> Finally, there are some philosophical questions about what the goals
> of the Linux kernel man pages are --- how important is having historical
> information (for example O_DIRECT has a "since 2.4.10", which is 25
> years ago --- really)? And how important is it to have tutorial
> information, and where should that information be organized in
> the man page?
>
> My personal opinion is that the primary priority of the Linux man page
> is to document the specification of the kernel interfaces that we
> expose to user space. Things like tutorial material and descriptions
> of historical versions are of secondary importance.
>
> I'd also advocate dropping historical information for kernel versions
> which are older than say, 7 years. Currently the oldest LTS kernel
> which is supported upstream is 5.10, which was originally released in
> 2020, and will EOL by end of 2026. The Linux kernel 5.0 was released
> on March 3, 2019, so using a 7 year lookback means that explanations
> about how the Linux kernel behaved in 2.4.x, 2.6.y, 3.x, 4.x, etc. can be
> dropped from the man pages, since IMHO it will reduce a lot of noise
> that will likely confuse readers.
>
> But that's a call for Alex and the man pages project to make.
>
> Cheers,
>
> - Ted