Re: [GIT PULL] gfs2 fix

From: Andreas Gruenbacher
Date: Thu Apr 28 2022 - 09:27:21 EST


On Thu, Apr 28, 2022 at 2:00 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Apr 27, 2022 at 3:20 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > So I really think
> >
> > (a) you are mis-reading the standard by attributing too strong logic
> > to paperwork that is English prose and not so exact
> >
> > (b) documenting Linux as not doing what you are mis-reading it for is
> > only encouraging others to mis-read it too
> >
> > The whole "arbitrary writes have to be all-or-nothing wrt all other
> > system calls" is simply not realistic, and has never been. Not just
> > not in Linux, but in *ANY* operating system that POSIX was meant to
> > describe.
>
> Side note: a lot of those "atomic" things in that documentation have
> come from a history of signal handling atomicity issues, and from all
> the issues people had with (a) user-space threading implementations
> and (b) emulation layers from non-Unixy environments.
>
> So when they say that things like "rename()" has to be all-or-nothing,
> it's to clarify that you can't emulate it as a "link and delete
> original" kind of operation (which old UNIX *did* do) and claim to be
> POSIX.
>
> Because while the end result of rename() and link()+unlink() might be
> similar, people did rely on that whole "use rename as a way to create
> an atomic marker in the filesystem" (which is a very traditional UNIX
> pattern).
>
> So "rename()" has to be atomic, and the legacy behavior of link+unlink
> is not valid in POSIX.
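
For illustration, a minimal sketch of the "rename as an atomic marker"
pattern described above. The helper name publish_atomically() and the paths
in the usage note are made up for this example; interrupted writes (EINTR)
and syncing the parent directory are deliberately left out.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write data to tmp_path, then rename() it onto
 * final_path. Readers that open final_path see either the old file or the
 * complete new one, never a half-written one.
 * Returns 0 on success, -1 on failure.
 */
static int publish_atomically(const char *tmp_path, const char *final_path,
                              const char *data)
{
    size_t len = strlen(data);
    size_t done = 0;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* write() may be short, so loop until everything is written. */
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);

        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        done += (size_t)n;
    }

    /* Flush the contents before the new name becomes visible. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* The single step that publishes the file under its final name. */
    if (rename(tmp_path, final_path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

Usage would look like publish_atomically("/tmp/marker.tmp",
"/tmp/marker.done", "ready\n"). With link()+unlink() instead of rename(),
there would be a window in which both names exist.
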
>
> Similarly, you can't implement "pread()" as a "lseek+read+lseek back",
> because that doesn't work if somebody else is doing another "pread()"
> on the same file descriptor concurrently.
>
> Again, people *did* implement exactly those kinds of implementations
> of "pread()", and yes, they were broken for both signals and for
> threading.
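
For illustration, a rough sketch of the broken "lseek+read+lseek back"
emulation next to the real pread(). The names emulated_pread() and
offset_preserved() are made up for this example. In a single-threaded run
both variants behave the same; the emulation only goes wrong when another
thread or a signal handler touches the same file descriptor inside the
marked window.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Broken emulation: three separate steps that share the file position. */
static ssize_t emulated_pread(int fd, void *buf, size_t count, off_t offset)
{
    ssize_t n;
    off_t saved = lseek(fd, 0, SEEK_CUR);

    if (saved < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    /*
     * Window: a concurrent read(), lseek(), or another emulated_pread()
     * on the same descriptor can move the file position here, so this
     * read() may return data from the wrong offset.
     */
    n = read(fd, buf, count);
    /* A second window: the position restored here may already be stale. */
    if (lseek(fd, saved, SEEK_SET) < 0)
        return -1;
    return n;
}

/*
 * Hypothetical check: create a scratch file at path, set the file position
 * to 2, read 3 bytes at offset 5, and report whether the data is right and
 * the file position is still 2. Returns 1 if so, 0 otherwise.
 */
static int offset_preserved(const char *path, int use_emulation)
{
    char buf[3];
    ssize_t n;
    int ok;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return 0;
    if (write(fd, "0123456789", 10) != 10 || lseek(fd, 2, SEEK_SET) != 2) {
        close(fd);
        unlink(path);
        return 0;
    }

    n = use_emulation ? emulated_pread(fd, buf, sizeof(buf), 5)
                      : pread(fd, buf, sizeof(buf), 5);

    ok = n == 3 && memcmp(buf, "567", 3) == 0 &&
         lseek(fd, 0, SEEK_CUR) == 2;

    close(fd);
    unlink(path);
    return ok;
}
```

The real pread() never touches the file position at all, so there is no
window to begin with.
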
>
> So there's "atomicity" and then there is "atomicity".
>
> That "all or nothing" can be a very practical thing to describe
> *roughly* how it must work on a higher level, or it can be a
> theoretical "transactional" thing that works literally like a database
> where the operation happens in full and you must not see any
> intermediate state.
>
> And no, "write()" and friends have never ever been about some
> transactional operation where you can't see how the file grows as it
> is being written to. That kind of atomicity has simply never existed,
> not even in theory.
>
> So when you see POSIX saying that a "read()" system call is "atomic",
> you should *not* see it as a transaction thing, but see it in the
> historical context of "people used to do threading libraries in user
> space, and since they didn't want a big read() to block all other
> threads, they'd split it up into many smaller reads and now another
> thread *also* doing 'read()' system calls would see the data it read
> being not one contiguous region, but multiple regions where the file
> position changed in the middle".
>
> Similarly, a "read()" system call will not be interrupted by a signal
> in the middle, where the signal handler would do a "lseek()" or
> another "read()", and now the original "read()" data suddenly is
> affected.
>
> That's why things like that whole "f_pos is atomic" is a big deal.
>
> Because there literally were threading libraries (and badly emulated
> environments) where that *WASN'T* the case, and _that_ is why POSIX
> then talks about it.
>
> So think of POSIX not as some hard set of "this is exactly how things
> work and we describe every detail".
>
> Instead, treat it a bit like historians treat Herodotus - interpreting
> his histories by taking the issues of the time into account. POSIX is
> trying to clarify and document the problems of the time it was
> written, and taking other things for granted.

Okay fine, thanks for elaborating.

Would you mind pulling the following fix to straighten this out?

The data corruption we've been getting unfortunately didn't have to do
with lock contention (we already knew that); it still occurs. I'm
running out of ideas on what to try there.

Thanks a lot,
Andreas

--

The following changes since commit 4fad37d595b9d9a2996467d780cb2e7a1b08b2c0:

  Merge tag 'gfs2-v5.18-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 (2022-04-26 11:17:18 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git tags/gfs2-v5.18-rc4-fix2

for you to fetch changes up to 296abc0d91d8b65d42224dd33452ace14491ad08:

  gfs2: No short reads or writes upon glock contention (2022-04-28 15:14:48 +0200)

----------------------------------------------------------------
gfs2 fix

- No short reads or writes upon glock contention

----------------------------------------------------------------
Andreas Gruenbacher (1):
      gfs2: No short reads or writes upon glock contention

 fs/gfs2/file.c | 4 ----
 1 file changed, 4 deletions(-)