Re: [GIT PULL] gfs2 fix
From: Andreas Gruenbacher
Date: Wed Apr 27 2022 - 15:43:51 EST
On Wed, Apr 27, 2022 at 7:13 PM Linus Torvalds wrote:
> On Wed, Apr 27, 2022 at 5:29 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
> > Regular (buffered) reads and writes are expected to be atomic with
> > respect to each other.
> Linux has actually never honored that completely broken POSIX
> requirement, although I think some filesystems (notably XFS) have
> tried.
Okay, I can happily live with that.
I wonder if this could be documented in the read and write manual
pages. Or would that be asking too much?
> It's a completely broken concept. It's not possible to honor atomicity
> with mmap(), and nobody has *ever* cared.
> And it causes huge amounts of problems and basically makes any sane
> locking entirely impossible.
> The fact that you literally broke regular file writes in ways that are
> incompatible with (much MUCH more important) POSIX file behavior to
> try to get that broken read/write atomicity is only one example among
> many for why that alleged rule just has to be ignored.
> We do honor the PIPE_BUF atomicity on pipes, which is a completely
> different kind of atomicity wrt read/write, and doesn't have the
> fundamental issues that arbitrary regular file reads/writes have.
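For illustration (not from the original thread): the PIPE_BUF guarantee mentioned above only covers writes of at most PIPE_BUF bytes to a pipe or FIFO. A minimal sketch of a writer that relies on it might look like the following. The helper name pipe_send_record and the choice of EMSGSIZE for oversized records are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>   /* PIPE_BUF */
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one record down a pipe so that it cannot be
 * interleaved with records from other writers. POSIX only promises this
 * for writes of at most PIPE_BUF bytes, so larger records are rejected
 * instead of being written non-atomically.
 *
 * Returns 0 on success, -1 on failure (errno is set by write() or to
 * EMSGSIZE for an oversized record).
 */
static int pipe_send_record(int fd, const void *rec, size_t len)
{
    ssize_t n;

    if (len > PIPE_BUF) {
        errno = EMSGSIZE;   /* assumption: any "too big" error would do */
        return -1;
    }

    /* On a blocking pipe this either writes all len bytes or fails. */
    n = write(fd, rec, len);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

Several processes can call such a helper on the same pipe and each record still arrives contiguously. Nothing comparable is implied for regular files.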
> There is absolutely no sane way to do that file atomicity wrt
> arbitrary read/write calls (*), and you shouldn't even try.
> That rule needs to be forgotten about, and buried 6ft deep.
> So please scrub any mention of that idiotic rule from documentation,
> and from your brain.
> And please don't break "partial write means disk full or IO error" due
> to trying to follow this broken rule, which was apparently what you
> Because that "regular file read/write is done in full" is a *MUCH*
> more important rule, and there is a shitton of applications that most
> definitely depend on *that* rule.
> Just go to debian code search, and look for
> "if (write("
> and you'll get thousands of hits, and on the first page of hits 9 out
> of 10 of the hits are literally about that "partial write is an
> error", eg code like this:
> if (write(fd,&triple,sizeof(triple)) != sizeof(triple))
> from libreoffice.
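For illustration (not from the original thread): the idiom quoted above treats any short write to a regular file as a failure. A minimal sketch of the same check wrapped in a helper could look like this. The name write_exact is hypothetical, and the retry on EINTR is an addition for this example rather than something the quoted code does.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the "if (write(...) != size)" idiom:
 * the caller expects a write to a regular file to complete in full and
 * treats anything else (error or short write) as failure.
 *
 * Returns 0 if all len bytes were written, -1 otherwise.
 */
static int write_exact(int fd, const void *buf, size_t len)
{
    ssize_t n;

    /* Retry only if a signal interrupted the call before any data moved. */
    do {
        n = write(fd, buf, len);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;              /* errno was set by write() */
    if ((size_t)n != len)
        return -1;              /* short write: e.g. disk full or I/O error */
    return 0;
}
```

Code written this way only behaves sensibly if the filesystem reports a partial write solely for conditions such as a full disk or an I/O error, which is the expectation described in the quoted text.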
> (*) Yeah, if you never care about performance(**) of mixed read/write,
> and you don't care about mmap, and you have no other locking issues,
> it's certainly possible. The old rule came about from original UNIX
> literally taking an inode lock around the whole IO access, because
> that was simple, and back in the days you'd never have multiple
> concurrent readers/writers anyway.
> (**) It's also instructive how O_DIRECT literally throws that rule
> away, and then some direct-IO people said for years that direct-IO is
> superior and used this as one of their arguments. Probably the same
> people who thought that "oh, don't report partial success", because we
> can't deal with it.
Thanks a lot,