[PATCH v2] vfs: add RWF_NOAPPEND flag for pwritev2

From: Rich Felker
Date: Mon Aug 31 2020 - 11:32:40 EST


The pwrite function, originally defined by POSIX (thus the "p"), is
defined to ignore O_APPEND and write at the offset passed as its
argument. However, historically Linux honored O_APPEND if set and
ignored the offset. This cannot be changed due to stability policy,
but is documented in the man page as a bug.

Now that there's a pwritev2 syscall providing a superset of the pwrite
functionality that has a flags argument, the conforming behavior can
be offered to userspace via a new flag. Since pwritev2 checks flag
validity (in kiocb_set_rw_flags) and reports unknown ones with
EOPNOTSUPP, callers will not get wrong behavior on old kernels that
don't support the new flag; the error is reported and the caller can
decide how to handle it.

Signed-off-by: Rich Felker <dalias@xxxxxxxx>
---

Changes in v2: I've added a check to ensure that RWF_NOAPPEND does not
override O_APPEND for S_APPEND (chattr +a) inodes, and fixed conflicts
with 1752f0adea98ef85, which optimized kiocb_set_rw_flags to work with
a local copy of flags. Unfortunately the same optimization does not
work for RWF_NOAPPEND since it needs to remove flags from the original
set at function entry.

If desired, I could further change this so that kiocb_flags is
initialized to ki->ki_flags, with assignment-back in place of |= at
the end of the function. This would allow the same local variable
pattern in the RWF_NOAPPEND code path, which might be more elegant,
but I'm not sure if the emitted code would improve or get worse.


include/linux/fs.h | 7 +++++++
include/uapi/linux/fs.h | 5 ++++-
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a08..924e17ac8e7e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3321,6 +3321,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
return 0;
if (unlikely(flags & ~RWF_SUPPORTED))
return -EOPNOTSUPP;
+ if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
+ return -EINVAL;

if (flags & RWF_NOWAIT) {
if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
@@ -3335,6 +3337,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
kiocb_flags |= (IOCB_DSYNC | IOCB_SYNC);
if (flags & RWF_APPEND)
kiocb_flags |= IOCB_APPEND;
+ if ((flags & RWF_NOAPPEND) && (ki->ki_flags & IOCB_APPEND)) {
+ if (IS_APPEND(file_inode(ki->ki_filp)))
+ return -EPERM;
+ ki->ki_flags &= ~IOCB_APPEND;
+ }

ki->ki_flags |= kiocb_flags;
return 0;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index f44eb0a04afd..d5e54e0742cf 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -300,8 +300,11 @@ typedef int __bitwise __kernel_rwf_t;
/* per-IO O_APPEND */
#define RWF_APPEND ((__force __kernel_rwf_t)0x00000010)

+/* per-IO negation of O_APPEND */
+#define RWF_NOAPPEND ((__force __kernel_rwf_t)0x00000020)
+
/* mask of flags supported by the kernel */
#define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
- RWF_APPEND)
+ RWF_APPEND | RWF_NOAPPEND)

#endif /* _UAPI_LINUX_FS_H */
--
2.21.0