Re: Writing more than 4096 bytes with O_SYNC flag does not persist all previously written data if system crashes
From: Alejandro Colomar
Date: Tue Mar 03 2026 - 08:25:15 EST
Hi Ted,
On 2026-02-23T14:32:38-0500, Theodore Tso wrote:
[...]
> The text in VERSIONS is not incorrect, in that it is talking about the
> distinction of O_SYNC and O_DSYNC in terms of which kinds of metadata
> will be persisted.
>
> However, the reason why all of this information regarding Synchronized
> I/O is in VERSIONS is to describe the historic behaviour of Linux
> version 2.6.33 versus more modern versions of Linux. But 2.6.33 dates
> from February 24, 2010 --- 16 years ago. So it might be simpler if we
> simply dropped this kind of historical information.
I prefer keeping it, but I agree with moving it to a place where it
doesn't distract (maybe even a separate page).
> But if you do
> want to keep it, we should move the bulk of that information into
> O_SYNC and O_DSYNC.
>
> So maybe:
>
> O_DSYNC
> Write operations on the file will complete according to the re‐
> quirements of synchronized I/O data integrity completion.
>
> By the time write(2) (and similar) return, the output data has
> been transferred to the underlying hardware, along with any file
> metadata that would be required to retrieve that data.
>
> See VERSIONS for a description of how historical versions
> of the Linux kernel from 2010 behaved.
>
> O_SYNC Write operations on the file will complete according to the re‐
> quirements of synchronized I/O file integrity completion (by con‐
> trast with the synchronized I/O data integrity completion pro‐
> vided by O_DSYNC.)
>
> By the time write(2) (or similar) returns, the output
> data and all file metadata associated with the inode for the
> opened file have been transferred to the underlying
> hardware.
>
> See VERSIONS for a description of how historical versions
> of the Linux kernel from 2010 behaved.
LGTM.
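
For anyone following the thread, here is a minimal sketch of how the two
flags in the proposed text would be used. The helper names (write_all,
write_file_synchronized) are hypothetical and not part of any man page or
library; only open(2), write(2), and close(2) are real interfaces. It
shows the API usage only. It cannot demonstrate what actually survives a
crash, which depends on the filesystem and hardware.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on short
 * writes and EINTR.  Returns 0 on success, -1 on error (errno set). */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t) n;
	}
	return 0;
}

/* Hypothetical helper: create or truncate 'path' and write 'len' bytes
 * with either O_DSYNC (data integrity completion) or O_SYNC (file
 * integrity completion) set on the file descriptor. */
int
write_file_synchronized(const char *path, int sync_flag,
                        const void *buf, size_t len)
{
	int fd, ret;

	assert(sync_flag == O_DSYNC || sync_flag == O_SYNC);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | sync_flag, 0644);
	if (fd == -1)
		return -1;

	/* Each write(2) on this fd returns only after the kernel
	 * reports the requested level of completion. */
	ret = write_all(fd, buf, len);

	if (close(fd) == -1)
		ret = -1;
	return ret;
}
```

A caller would pass O_DSYNC when only the data (plus the metadata needed
to read it back) matters, and O_SYNC when all inode metadata should be
written out as well.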
>
> VERSIONS
> Before Linux 2.6.33, Linux implemented only the O_SYNC flag for
> open(). However, when that flag was specified, most
> filesystems actually provided the equivalent of synchronized
> I/O data integrity completion (i.e., O_SYNC was actually
> implemented as the equivalent of O_DSYNC).
>
> I'd suggest dropping everything else in VERSIONS, including the
> discussion of O_RSYNC. All of that is much more appropriate for a
> tutorial.
How about having an O_RSYNC(2const) manual page that talks in detail
about it?
>
> If you really want to keep all of that text, perhaps it could be moved
> into a synchronized-io man page in section 7.
Yes, a synchronized-io(7) page would make sense.
> In that we can talk
> about the difference of fsync() and fdatasync(), which is interesting
> as a conceptual model, and conceptually it is similar to the O_SYNC
> and O_DSYNC. But they differ in what data will be written back
> (the data that was written via the file descriptor where the
> O_SYNC/O_DSYNC flag was set, either via open or fcntl, versus all
> buffered data in the buffer cache). The synchronized-io man page
> could also have more of the information around O_DIRECT in one place.
I like the idea of a chapter 7 manual page, or separate 2const pages for
each different macro. Whatever you consider more useful/readable.
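
As a rough illustration of the fsync()/fdatasync() side of that
conceptual model, a sketch along these lines might fit such a page. The
function name append_and_flush and its data_only parameter are made up
for this example; fsync(2) and fdatasync(2) are the real calls. Here the
writes are ordinary buffered writes, and the explicit flush at the end
plays the role that O_SYNC or O_DSYNC would play on every write.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append 'len' bytes to 'path' using buffered
 * writes, then flush explicitly.  If data_only is nonzero, use
 * fdatasync(2) (analogous to O_DSYNC); otherwise use fsync(2)
 * (analogous to O_SYNC).  Returns 0 on success, -1 on error. */
int
append_and_flush(const char *path, const void *buf, size_t len,
                 int data_only)
{
	const char *p = buf;
	int fd, ret = 0;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd == -1)
		return -1;

	/* Buffered writes: these may return before anything reaches
	 * the underlying hardware. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			ret = -1;
			break;
		}
		p += n;
		len -= (size_t) n;
	}

	/* Explicit flush of the file's buffered data. */
	if (ret == 0)
		ret = data_only ? fdatasync(fd) : fsync(fd);

	if (close(fd) == -1)
		ret = -1;
	return ret;
}
```

The design trade-off is the usual one: a single flush after many
buffered writes is typically cheaper than paying for synchronized
completion on every write(2), at the cost of having to check the flush
result explicitly.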
>
> > If you'd write a patch, I'd appreciate that.
>
> Well, there's a question of what's the minimal change that is needed
> to fix out-and-out inaccuracies, and we can just delete some
> parenthetical comments.
Yup; I strongly prefer many minimal patches. If you (or anyone) start
by removing parentheticals that are unnecessary or incorrect, that'd be
good.
I would do that, but I wouldn't be able to write the commit messages, or
decide how to group them. I'd need someone expert in those APIs to
write the patches. I can then amend them editorially if they have any
minor issues.
> BTW, if we want to delete inaccurate information, I'd also suggest
> deleting the following text in the O_DIRECT section of the man page:
>
> A semantically similar (but deprecated) interface for block
> devices is described in raw(8).
>
> ----
>
> Then there's trying to rearrange the tutorial-style information for
> people who want to implement code which needs data persistence
> guarantees. That's quite a lot more work, and while I'm happy to
> review or assist someone to write that more expansive tutorial
> material, it's not something I'm willing to sign up to do.
Okay. While I can't do the removal of inaccurate text, I can reorganize
correct text. If you do the former, I can do this afterwards. I'll CC
you in such patches.
> ----
>
> Finally, there are some philosophical questions about what the goals
> of the Linux kernel man pages are --- how important is having historical
> information (for example O_DIRECT has a "since 2.4.10", which is 25
> years ago --- really)? And how important is it to have tutorial
> information, and where should that information be organized in
> the man page?
Michael Kerrisk wanted to keep everything after Linux 2.6. Moving it to
HISTORY, and reducing less important details, is appropriate, but
removing it all is not so much.
I more or less keep that guideline, although I'm slightly more prone to
removals, but not too much.
> My personal opinion is that the primary priority of the Linux man page
> is to document the specification of the kernel interfaces that we
> expose to user space. Things like tutorial material and a description
> of historical versions are of secondary importance.
Yup. I've been moving a lot of text to separate pages or HISTORY
sections, or removing unnecessary details.
> I'd also advocate dropping historical information for kernel versions
> which are older than say, 7 years. Currently the oldest LTS kernel
> which is supported upstream is 5.10, which was originally released in
> 2020, and will EOL by end of 2026. The Linux kernel 5.0 was released
> on March 3, 2019, so using a 7 year lookback means that explanations
> about how the Linux kernel behaved in 2.4.x, 2.6.y, 3.x, 4.x, etc. can be
> dropped from the man pages, since IMHO it will reduce a lot of noise
> that will likely confuse readers.
>
> But that's a call for Alex and the man pages project to make.
Have a lovely day!
Alex
>
> Cheers,
>
> - Ted
--
<https://www.alejandro-colomar.es>