[PATCH] inotify: hide internal kernel bits from fdinfo

From: Dave Hansen
Date: Mon Sep 21 2015 - 15:04:51 EST

From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>

There was a report that my patch:

inotify: actually check for invalid bits in sys_inotify_add_watch()

broke CRIU.

The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/*
to figure out how to rebuild inotify watches and then passes those
flags directly back into the inotify API. One of those flags
(FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the
inotify API. The kernel uses it internally to _implement_ inotify,
but it is not, and has never been, part of the API.
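A minimal userspace sketch of the step described above: reading the raw
mask: field back out of an inotify line in /proc/$pid/fdinfo/*. This is
not CRIU's code. The struct, the helper name parse_inotify_fdinfo(), and
the sample line in the usage note are hypothetical. FS_EVENT_ON_CHILD is
not in any userspace header, so its value (0x08000000, from the kernel's
include/linux/fsnotify_backend.h) is defined locally.

```c
#include <stdint.h>
#include <stdio.h>

/* Kernel-internal flag from include/linux/fsnotify_backend.h; no
 * userspace header exports it, so it is defined here for illustration. */
#define FS_EVENT_ON_CHILD 0x08000000u

/* Leading fields of one "inotify ..." line in /proc/$pid/fdinfo/<fd>. */
struct inotify_fdinfo_entry {
	unsigned int wd;
	unsigned long ino;
	unsigned int sdev;
	uint32_t mask;
	uint32_t ignored_mask;
};

/* Hypothetical helper: parse the fields printed by inotify_fdinfo().
 * Returns 0 on success, -1 if the line is not an inotify line. */
static int parse_inotify_fdinfo(const char *line,
				struct inotify_fdinfo_entry *e)
{
	unsigned int mask, ignored;

	if (sscanf(line,
		   "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x",
		   &e->wd, &e->ino, &e->sdev, &mask, &ignored) != 5)
		return -1;
	e->mask = mask;           /* raw value, internal bits included */
	e->ignored_mask = ignored;
	return 0;
}
```

Fed a line such as "inotify wd:1 ino:2a sdev:800001 mask:8000fff
ignored_mask:0" (made-up values), the parsed mask still carries
FS_EVENT_ON_CHILD. Handing it unmodified to inotify_add_watch() is the
round trip that tripped the new check.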

My patch above made sys_inotify_add_watch() accept only bits that
are part of the API (IN_ALL_EVENTS), so the watches CRIU tried to
re-create with FS_EVENT_ON_CHILD set were rejected.
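The stricter check can be modeled as "reject any bit outside a
known-valid set". The sketch below is a userspace approximation, not the
kernel's code: ALLOWED_BITS and mask_is_valid() are hypothetical names,
and the allowed set (IN_ALL_EVENTS plus the watch flags declared in
<sys/inotify.h>) is an assumption about what the kernel accepts.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Kernel-internal bit (include/linux/fsnotify_backend.h), defined
 * locally because userspace headers do not export it. */
#define FS_EVENT_ON_CHILD 0x08000000u

/* Assumed set of bits a caller may pass to inotify_add_watch():
 * the event bits plus the documented watch flags. */
#define ALLOWED_BITS (IN_ALL_EVENTS | IN_ONLYDIR | IN_DONT_FOLLOW | \
		      IN_EXCL_UNLINK | IN_MASK_ADD | IN_ONESHOT)

/* Model of the stricter check: a single unknown bit makes the whole
 * mask invalid, even if every other bit is legitimate. */
static bool mask_is_valid(uint32_t mask)
{
	return (mask & ~(uint32_t)ALLOWED_BITS) == 0;
}
```

Under this model a mask copied from the old fdinfo output, such as
IN_ALL_EVENTS | FS_EVENT_ON_CHILD, is rejected, while the same mask
without the internal bit passes.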

FS_EVENT_ON_CHILD is purely internal to the kernel, and it is set
_anyway_ on every inotify mark. So CRIU was really just trying to
set a bit that was already set.

This patch hides that bit from fdinfo. CRIU will no longer see the
bit, will not try to set it, and should work as before. We should
never have exposed this bit in the first place, so this patch is
worthwhile independent of the CRIU problem.
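The effect of the diff below can be modeled in userspace by applying the
same "& IN_ALL_EVENTS" filter before printing. format_fdinfo_line() is a
hypothetical helper that mirrors the seq_printf() format string from the
patch; it is not kernel code.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>

/* Kernel-internal bit (include/linux/fsnotify_backend.h), defined
 * locally for illustration. */
#define FS_EVENT_ON_CHILD 0x08000000u

/* Hypothetical model of the patched inotify_fdinfo(): only bits in
 * IN_ALL_EVENTS are printed, so internal bits never reach userspace.
 * Returns the snprintf() result. */
static int format_fdinfo_line(char *buf, size_t len, unsigned int wd,
			      unsigned long ino, unsigned int sdev,
			      uint32_t raw_mask, uint32_t ignored_mask)
{
	uint32_t mask = raw_mask & IN_ALL_EVENTS;

	return snprintf(buf, len,
			"inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x ",
			wd, ino, sdev, (unsigned int)mask,
			(unsigned int)ignored_mask);
}
```

Given a stored mask of IN_ALL_EVENTS | FS_EVENT_ON_CHILD, the line
reports mask:fff, which can be passed straight back to
inotify_add_watch(). Filtering at print time leaves mark->mask, and
therefore the kernel's internal behavior, unchanged.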

Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Reported-by: Andrey Wagin <avagin@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Cc: xemul@xxxxxxxxxxxxx
Cc: Eric Paris <eparis@xxxxxxxxxx>
Cc: john@xxxxxxxxxxxxxxxxx
Cc: rlove@xxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
---

b/fs/notify/fdinfo.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff -puN fs/notify/fdinfo.c~fdinfo-mask fs/notify/fdinfo.c
--- a/fs/notify/fdinfo.c~fdinfo-mask 2015-09-21 10:24:01.031864268 -0700
+++ b/fs/notify/fdinfo.c 2015-09-21 10:25:04.335723826 -0700
@@ -82,9 +82,16 @@ static void inotify_fdinfo(struct seq_fi
 	inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
 	inode = igrab(mark->inode);
 	if (inode) {
+		/*
+		 * IN_ALL_EVENTS represents all of the mask bits
+		 * that we expose to userspace. There is at
+		 * least one bit (FS_EVENT_ON_CHILD) which is
+		 * used only internally to the kernel.
+		 */
+		u32 mask = mark->mask & IN_ALL_EVENTS;
 		seq_printf(m, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x ",
 			   inode_mark->wd, inode->i_ino, inode->i_sb->s_dev,
-			   mark->mask, mark->ignored_mask);
+			   mask, mark->ignored_mask);
 		show_mark_fhandle(m, inode);
 		seq_putc(m, '\n');
 		iput(inode);
_