[RFC][PATCH 2/2] fsnotify: handle permission events without holding fsnotify_mark_srcu[0]

From: Amir Goldstein
Date: Mon Nov 14 2016 - 06:49:37 EST


Handling fanotify events does not entail dereferencing fsnotify_mark
beyond the point of fanotify_should_send_event().

For the case of permission events, which may block indefinitely,
return -EAGAIN and then fsnotify() will call handle_event() again
without a reference to the mark.

Without a reference to the mark, there is no need to call
handle_event() under fsnotify_mark_srcu[0] read side lock,
so we drop fsnotify_mark_srcu[0] while handling the event
and grab it back before continuing to the next mark.

After this change, a blocking permission event will no longer
block closing of any file descriptors of 0 priority groups,
i.e: inotify and fanotify groups of class FAN_CLASS_NOTIF.

Reported-by: Miklos Szeredi <miklos@xxxxxxxxxx>
Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
---
fs/notify/fanotify/fanotify.c | 15 +++++++++++----
fs/notify/fsnotify.c | 23 +++++++++++++++++++++++
2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index e0e5f7c..c7689ad 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -176,7 +176,7 @@ init: __maybe_unused
static int fanotify_handle_event(struct fsnotify_group *group,
struct inode *inode,
struct fsnotify_mark *inode_mark,
- struct fsnotify_mark *fanotify_mark,
+ struct fsnotify_mark *vfsmnt_mark,
u32 mask, void *data, int data_type,
const unsigned char *file_name, u32 cookie)
{
@@ -195,9 +195,16 @@ static int fanotify_handle_event(struct fsnotify_group *group,
BUILD_BUG_ON(FAN_ACCESS_PERM != FS_ACCESS_PERM);
BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);

- if (!fanotify_should_send_event(inode_mark, fanotify_mark, mask, data,
- data_type))
- return 0;
+ if (inode_mark || vfsmnt_mark) {
+ if (!fanotify_should_send_event(inode_mark, vfsmnt_mark, mask,
+ data, data_type))
+ return 0;
+#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+ /* Ask to be called again without a reference to mark */
+ if (mask & FAN_ALL_PERM_EVENTS)
+ return -EAGAIN;
+#endif
+ }

pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode,
mask);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index af5c523a..5b9a248 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -291,6 +291,29 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask,
data, data_is, cookie, file_name);

+ /*
+ * If handle_event() is going to block, we call it again
+ * witout holding fsnotify_mark_srcu[0], which is protecting
+ * the low priority mark lists.
+ * We are still holding fsnotify_mark_srcu[1], which
+ * is protecting the high priority marks in the first half
+ * of the mark list, which is where we are at.
+ */
+ if (group->priority > 0 && ret == -EAGAIN) {
+ srcu_read_unlock(&fsnotify_mark_srcu[0], idx);
+
+ ret = group->ops->handle_event(group, to_tell,
+ NULL, NULL,
+ mask, data, data_is,
+ file_name, cookie);
+
+ /*
+ * We need to hold fsnotify_mark_srcu[0], because
+ * next mark may be low priority.
+ */
+ idx = srcu_read_lock(&fsnotify_mark_srcu[0]);
+ }
+
if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
goto out;

--
2.7.4