[PATCH 2/3] inotify: add inotify_update_watch() syscall

From: David Herrmann
Date: Thu Sep 03 2015 - 11:11:29 EST


The current inotify API only provides a single function to add *and*
modify watch descriptors. There is no way to perform either operation
explicitly, but the kernel always automatically chooses what to do. If
the watch-descriptor exists, it is updated, otherwise a new descriptor is
allocated. This has quite nasty side-effects:

Imagine the case where an application monitors two independent files A
and B with two independent watch descriptors. If you now want to *change*
the watch-mask of A, you have to use inotify_add_watch(fd, "A", new_mask).
However, this might race with a file-system operation that links B over A,
thus this call to inotify_add_watch() will affect the watch-descriptor of
B. However, this is usually not what the caller wants, as the watch-masks
of A and B can be disjoint, and as such an unwanted update of B might
cause event loss. Hence, a call like inotify_update_watch() is needed,
which explicitly takes the watch-descriptor to modify. In this case, it
would still only update the watch-descriptor of A, even though the path
to A changed.

The underlying issue here is the automatism of inotify_add_watch(), which
does not allow the caller to distinguish an update operation from an ADD
operation. This race could be solved with a simple IN_EXCL (or IN_CREATE)
flag, which would cause inotify_add_watch() to *never* update existing
watch-descriptors, but fail with EEXIST instead. However, this still
prevents the caller from *updating* the flags of an explicitly passed
watch-descriptor. Furthermore, the fact that inotify objects identify
*INODES*, but the API takes *PATHS* calls for races. Therefore, we really
need an explicit update operation to allow user-space to modify watch
descriptors without having to re-create them and thus invalidating their
cache.

This patch implements inotify_update_watch() to extend the inotify API
with a way to explicity modify watch-descriptors, instead of going via
the file-system path-API of inotify_add_watch().

SYNOPSIS
#include <sys/inotify.h>

int inotify_update_watch(int fd, __u32 wd, __u32 mask);

DESCRIPTION
inotify_update_watch() modifies an existing inotify watch descriptor,
specified by 'wd', which was previously added via
inotify_add_watch(2). It updates the mask of events to be monitored
via this watch descriptor according to the event mask specified by
'mask'. If IN_MASK_ADD is passed, 'mask' is added to the existing set
of flags on this watch descriptor, otherwise the existing mask is
replaced by the new mask. See inotify(7) for a description of the
further bits allowed in 'mask'.

Flags that modify the file lookup behavior of inotify_add_watch(2)
(IN_ONLYDIR, IN_DONT_FOLLOW) cannot be passed to
inotify_update_watch(). They will be rejected with EINVAL.

RETURN VALUE
On success, 0 is returned. On error, -1 is returned, and errno is
set appropriately.

ERRORS
EBADF 'fd' is not a valid file descriptor.

EINVAL 'fd' is not an inotify file descriptor; or 'mask' contains
invalid or unsupported flags.

ENXIO 'wd' is not a valid watch descriptor on this inotify
instance.

CONFORMING TO
This system call is Linux-specific.

SEE ALSO
inotify(7), inotify_init(2), inotify_add_watch(2), inotify_rm_watch(2)

Signed-off-by: David Herrmann <dh.herrmann@xxxxxxxxx>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/notify/inotify/inotify_user.c | 37 ++++++++++++++++++++++++++++++++++
include/linux/syscalls.h | 1 +
kernel/sys_ni.c | 1 +
5 files changed, 41 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ef8187f..598b6cc 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -365,3 +365,4 @@
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
358 i386 execveat sys_execveat stub32_execveat
+359 i386 inotify_update_watch sys_inotify_update_watch
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 9ef32d5..883a02e 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -329,6 +329,7 @@
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
322 64 execveat stub_execveat
+323 common inotify_update_watch sys_inotify_update_watch

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 5a39ae8..1df7312 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -733,6 +733,43 @@ fput_and_out:
return ret;
}

+SYSCALL_DEFINE3(inotify_update_watch, int, fd, __s32, wd, __u32, mask)
+{
+ struct inotify_inode_mark *mark;
+ struct fsnotify_group *group;
+ struct fd f;
+ int ret = 0;
+
+ /* disallow unknown flags and flags specific to inode lookup */
+ if (unlikely(mask & (IN_DONT_FOLLOW |
+ IN_ONLYDIR |
+ ~ALL_INOTIFY_BITS)))
+ return -EINVAL;
+
+ f = fdget(fd);
+ if (unlikely(!f.file))
+ return -EBADF;
+ if (unlikely(f.file->f_op != &inotify_fops)) {
+ ret = -EINVAL;
+ goto exit;
+ }
+
+ group = f.file->private_data;
+ mark = inotify_idr_find(group, wd);
+ if (unlikely(!mark)) {
+ ret = -ENXIO;
+ goto exit;
+ }
+
+ mutex_lock(&group->mark_mutex);
+ inotify_update_existing_watch(&mark->fsn_mark, mask);
+ mutex_unlock(&group->mark_mutex);
+
+exit:
+ fdput(f);
+ return ret;
+}
+
SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
{
struct fsnotify_group *group;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b45c45b..40701b0 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -744,6 +744,7 @@ asmlinkage long sys_inotify_init(void);
asmlinkage long sys_inotify_init1(int flags);
asmlinkage long sys_inotify_add_watch(int fd, const char __user *path,
u32 mask);
+asmlinkage long sys_inotify_update_watch(int fd, __s32 wd, __u32 mask);
asmlinkage long sys_inotify_rm_watch(int fd, __s32 wd);

asmlinkage long sys_spu_run(int fd, __u32 __user *unpc,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7995ef5..b556a33d 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -114,6 +114,7 @@ cond_syscall(compat_sys_socketcall);
cond_syscall(sys_inotify_init);
cond_syscall(sys_inotify_init1);
cond_syscall(sys_inotify_add_watch);
+cond_syscall(sys_inotify_update_watch);
cond_syscall(sys_inotify_rm_watch);
cond_syscall(sys_migrate_pages);
cond_syscall(sys_move_pages);
--
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/