Re: Update of file offset on write() etc. is non-atomic with I/O

From: Michael Kerrisk (man-pages)
Date: Fri Feb 21 2014 - 01:01:59 EST


On Thu, Feb 20, 2014 at 7:29 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Feb 20, 2014 at 06:15:15PM +0000, Zuckerman, Boris wrote:
>> Hi,
>>
>> You probably already considered that - sorry, if so...
>>
>> Instead of the mutex, Windows uses ExecutiveResource with shared and
>> exclusive semantics. Readers serialize by taking the resource shared
>> and writers take it exclusive. I have that implemented for Linux.
>> Please let me know if there is any interest!
>
> See include/linux/rwsem.h...
>
> Anyway, the really interesting question here is what does POSIX promise
> wrt lseek() vs. write(). What warranties are given there?

I suppose you are wondering about cases such as:

    Process A                  Process B
    write():                   lseek()
        perform I/O
                                   update f_pos
        update f_pos

In my reading of POSIX, lseek() and write() should be atomic w.r.t.
each other, and the above should not be allowed.
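
A minimal sketch of how one might probe the interleaving above from
user space. It is not taken from this thread; the function name, the
/tmp template, and the chunk size are all assumptions made for
illustration. One thread advances the shared offset with write(), the
other with a relative lseek(). If the two calls are atomic with respect
to each other, no update can be lost and the final offset must equal
the sum of all advances.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHUNK = 64 };            /* bytes per write() and per relative seek */

struct race_args {
    int fd;                     /* descriptor shared by both threads */
    int iterations;
};

/* Thread A: each successful write() should advance f_pos by CHUNK. */
static void *writer(void *p)
{
    const struct race_args *a = p;
    char buf[CHUNK] = {0};

    for (int i = 0; i < a->iterations; i++)
        if (write(a->fd, buf, sizeof buf) != (ssize_t)sizeof buf)
            return (void *)1;   /* non-NULL marks a failed or short write */
    return NULL;
}

/* Thread B: each lseek(SEEK_CUR) should also advance f_pos by CHUNK. */
static void *seeker(void *p)
{
    const struct race_args *a = p;

    for (int i = 0; i < a->iterations; i++)
        if (lseek(a->fd, CHUNK, SEEK_CUR) == (off_t)-1)
            return (void *)1;
    return NULL;
}

/*
 * Returns how many bytes the final offset falls short of
 * 2 * iterations * CHUNK (0 means no update was lost), or -1 on error.
 */
long lost_offset_updates(int iterations)
{
    char path[] = "/tmp/fpos-race-XXXXXX";   /* hypothetical location */
    struct race_args args = { mkstemp(path), iterations };
    pthread_t a, b;
    void *ra = NULL, *rb = NULL;
    long result = -1;

    if (args.fd < 0)
        return -1;
    unlink(path);               /* the file goes away once fd is closed */

    int ok_a = pthread_create(&a, NULL, writer, &args) == 0;
    int ok_b = pthread_create(&b, NULL, seeker, &args) == 0;
    if (ok_a)
        pthread_join(a, &ra);
    if (ok_b)
        pthread_join(b, &rb);

    if (ok_a && ok_b && ra == NULL && rb == NULL) {
        off_t end = lseek(args.fd, 0, SEEK_CUR);   /* query final f_pos */
        if (end != (off_t)-1)
            result = 2L * iterations * CHUNK - (long)end;
    }
    close(args.fd);
    return result;
}
```

Called from a small main() as, say, lost_offset_updates(100000) and
built with "cc -pthread", a nonzero result would indicate that an
lseek() or write() offset update was overwritten by the other call. A
result of 0 proves nothing by itself, since the window is narrow.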

Here's the full list from POSIX.1-2008/SUSv4 Section XSH 2.9.7:

[[
2.9.7 Thread Interactions with Regular File Operations

All of the following functions shall be atomic with respect to each
other in the effects specified in POSIX.1-2008 when they operate on
regular files or symbolic links:

chmod( )
chown( )
close( )
creat( )
dup2( )
fchmod( )
fchmodat( )
fchown( )
fchownat( )
fcntl( )
fstat( )
fstatat( )
ftruncate( )
lchown( )
link( )
linkat( )
lseek( )
lstat( )
open( )
openat( )
pread( )
read( )
readlink( )
readlinkat( )
readv( )
pwrite( )
rename( )
renameat( )
stat( )
symlink( )
symlinkat( )
truncate( )
unlink( )
unlinkat( )
utime( )
utimensat( )
utimes( )
write( )
writev( )

If two threads each call one of these functions, each call shall
either see all of the specified effects of the other call, or none of
them.
]]

I'd bet that there's a bunch of violations to be found, but the
read/write f_pos case is one of the most egregious.
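
For the write() versus write() variant of the f_pos case, a rough check
along the following lines might be used. This is a sketch under
assumptions, not a test from this thread: the function name, the /tmp
template, and the thread and record counts are all made up. Several
threads write fixed-size records through one shared descriptor. If
every write() sees either all or none of the other calls' offset
updates, no two records can land at the same offset, so the file size
must equal the total number of bytes written.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

enum { NTHREADS = 4, WRITES = 5000, RECORD = 32 };

/* Each thread writes WRITES records of RECORD bytes via the shared fd. */
static void *write_records(void *p)
{
    int fd = *(int *)p;
    char rec[RECORD];

    memset(rec, 'x', sizeof rec);
    for (int i = 0; i < WRITES; i++)
        if (write(fd, rec, sizeof rec) != (ssize_t)sizeof rec)
            return (void *)1;   /* non-NULL marks a failed or short write */
    return NULL;
}

/*
 * Returns the number of bytes missing from the file relative to the
 * bytes written (0 means no records overlapped), or -1 on error.
 */
long missing_bytes_after_shared_writes(void)
{
    char path[] = "/tmp/shared-write-XXXXXX";   /* hypothetical location */
    int fd = mkstemp(path);
    pthread_t tid[NTHREADS];
    int started = 0, failed = 0;
    struct stat st;
    long result = -1;

    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away once fd is closed */

    for (; started < NTHREADS; started++)
        if (pthread_create(&tid[started], NULL, write_records, &fd) != 0) {
            failed = 1;
            break;
        }
    for (int i = 0; i < started; i++) {
        void *ret = NULL;
        pthread_join(tid[i], &ret);
        if (ret != NULL)
            failed = 1;
    }

    if (!failed && fstat(fd, &st) == 0)
        result = (long)NTHREADS * WRITES * RECORD - (long)st.st_size;
    close(fd);
    return result;
}
```

A positive return value means some records overwrote each other, which
is the symptom one would expect when the offset update is not atomic
with the I/O.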

For example, I got curious about stat() versus rename(). If one
stat()s a directory while a subdirectory is being renamed to a new
name within that directory, does the link count of the parent
directory ever change--that is, could stat() ever see a changed link
count in the middle of the rename()? My experiments suggest that it
can. I suppose it would have to be a very unusual application that
would be troubled by that, but it does appear to be a violation of
2.9.7.
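
An experiment of that kind could be sketched roughly as follows. This
is not the program behind the observation above; the function name, the
/tmp template, and the names "a" and "b" are assumptions. One thread
renames a subdirectory back and forth inside a parent directory while
the caller repeatedly stat()s the parent and counts how often st_nlink
differs from its initial value.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct rename_job {
    char a[128], b[128];        /* the two names the subdirectory takes */
    int rounds;
    atomic_int done;            /* set once the rename loop has finished */
};

/* Rename the subdirectory a -> b -> a, 'rounds' times. */
static void *rename_loop(void *p)
{
    struct rename_job *job = p;

    for (int i = 0; i < job->rounds; i++)
        if (rename(job->a, job->b) != 0 || rename(job->b, job->a) != 0)
            break;
    atomic_store(&job->done, 1);
    return NULL;
}

/*
 * Returns the number of stat() calls on the parent that saw st_nlink
 * differ from its value before the renames began, or -1 on setup error.
 */
long count_nlink_changes(int rounds)
{
    char parent[] = "/tmp/nlink-race-XXXXXX";   /* hypothetical location */
    struct rename_job job = { .rounds = rounds };
    struct stat st;
    pthread_t tid;
    long changes = -1;

    if (mkdtemp(parent) == NULL)
        return -1;
    snprintf(job.a, sizeof job.a, "%s/a", parent);
    snprintf(job.b, sizeof job.b, "%s/b", parent);

    if (mkdir(job.a, 0700) == 0 && stat(parent, &st) == 0) {
        nlink_t baseline = st.st_nlink;

        if (pthread_create(&tid, NULL, rename_loop, &job) == 0) {
            changes = 0;
            while (!atomic_load(&job.done))
                if (stat(parent, &st) == 0 && st.st_nlink != baseline)
                    changes++;
            pthread_join(tid, NULL);
        }
    }

    /* Clean up whichever name the subdirectory ended up with. */
    rmdir(job.a);
    rmdir(job.b);
    rmdir(parent);
    return changes;
}
```

A nonzero count would mean stat() observed the rename() partway
through. Whether anything shows up is likely to depend on the
filesystem, and the check is meaningless on filesystems that do not
count subdirectories in a directory's link count.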

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/