Re: the "read" syscall sees partial effects of the "write" syscall

From: Jan Kara
Date: Fri Sep 18 2020 - 09:13:24 EST


On Fri 18-09-20 08:25:28, Mikulas Patocka wrote:
> I'd like to ask about this problem: when we write to a file, the kernel
> takes the write inode lock. When we read from a file, no lock is taken -
> thus the read syscall can read data that are halfway modified by the write
> syscall.
>
> The standard specifies the effects of the write syscall are atomic - see
> this:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07

Yes, but no Linux filesystem (except for XFS AFAIK) follows the POSIX spec
in this regard, mostly because the mixed read-write performance sucks when
you follow it (not that it would absolutely have to suck - you could use
clever locking with range locks, but nobody does it currently). In practice,
read-write atomicity on Linux holds only on a per-page basis for buffered
IO.
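
For anyone who wants to observe this from userspace, a rough probe could look
like the sketch below. It is only an illustration, not taken from any kernel
test suite: the names is_torn and count_torn_reads, the buffer size and the
round count are all made up here. One thread keeps overwriting the same
multi-page range with a single fill byte per write, while the main thread
reads the range back and checks whether one read returned a mix of both fills.

```python
import os
import tempfile
import threading


def is_torn(buf: bytes) -> bool:
    """Return True if buf holds more than one distinct byte value.

    Every write in this probe fills the whole range with a single byte,
    so a mixed buffer means the read saw a partially applied write.
    """
    return len(buf) > 0 and buf.count(buf[:1]) != len(buf)


def count_torn_reads(path: str, size: int = 1 << 18, rounds: int = 100) -> int:
    """Race pwrite() against pread() on one range; count the torn reads."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fills = (b"A" * size, b"B" * size)
        os.pwrite(fd, fills[0], 0)  # start from uniform contents
        stop = threading.Event()

        def writer() -> None:
            i = 0
            while not stop.is_set():
                i ^= 1
                os.pwrite(fd, fills[i], 0)  # one syscall spanning many pages

        t = threading.Thread(target=writer)
        t.start()
        torn = 0
        try:
            for _ in range(rounds):
                if is_torn(os.pread(fd, size, 0)):
                    torn += 1
        finally:
            stop.set()
            t.join()
        return torn
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical usage: probe a scratch file in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        print("torn reads:", count_torn_reads(os.path.join(d, "probe.bin")))
```

A nonzero count means at least one read saw a half-applied write. A zero count
proves nothing, since the race is timing dependent and the result will vary
with the filesystem under the scratch directory.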

Honza

--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR