Re: [syzbot] Re: [syzbot] [ext4?] KASAN: slab-use-after-free Read in fsnotify

From: Amir Goldstein
Date: Tue Apr 16 2024 - 12:27:44 EST


On Tue, Apr 16, 2024 at 4:22 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Mon 15-04-24 17:47:45, Amir Goldstein wrote:
> > On Mon, Apr 15, 2024 at 5:03 PM Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > On Sat 13-04-24 12:32:32, Amir Goldstein wrote:
> > > > On Sat, Apr 13, 2024 at 11:45 AM Hillf Danton <hdanton@sinacom> wrote:
> > > > > On Fri, 12 Apr 2024 23:42:19 -0700 Amir Goldstein
> > > > > > On Sat, Apr 13, 2024 at 4:41=E2=80=AFAM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > > > > > On Thu, 11 Apr 2024 01:11:20 -0700
> > > > > > > > syzbot found the following issue on:
> > > > > > > >
> > > > > > > > HEAD commit: 6ebf211bb11d Add linux-next specific files for 20240410
> > > > > > > > git tree: linux-next
> > > > > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=3D1621af9d180000
> > > > > > >
> > > > > > > #syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 6ebf211bb11d
> > > > > > >
> > > > > > > --- x/fs/notify/fsnotify.c
> > > > > > > +++ y/fs/notify/fsnotify.c
> > > > > > > @@ -101,8 +101,8 @@ void fsnotify_sb_delete(struct super_blo
> > > > > > > wait_var_event(fsnotify_sb_watched_objects(sb),
> > > > > > > !atomic_long_read(fsnotify_sb_watched_objects(sb)));
> > > > > > > WARN_ON(fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_CONTENT));
> > > > > > > - WARN_ON(fsnotify_sb_has_priority_watchers(sb,
> > > > > > > - FSNOTIFY_PRIO_PRE_CONTENT));
> > > > > > > + WARN_ON(fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_PRE_CONTENT));
> > > > > > > + synchronize_srcu(&fsnotify_mark_srcu);
> > > > > > > kfree(sbinfo);
> > > > > > > }
> > > > > > >
> > > > > > > @@ -499,7 +499,7 @@ int fsnotify(__u32 mask, const void *dat
> > > > > > > {
> > > > > > > const struct path *path =3D fsnotify_data_path(data, data_type);
> > > > > > > struct super_block *sb =3D fsnotify_data_sb(data, data_type);
> > > > > > > - struct fsnotify_sb_info *sbinfo =3D fsnotify_sb_info(sb);
> > > > > > > + struct fsnotify_sb_info *sbinfo;
> > > > > > > struct fsnotify_iter_info iter_info = {};
> > > > > > > struct mount *mnt =3D NULL;
> > > > > > > struct inode *inode2 =3D NULL;
> > > > > > > @@ -529,6 +529,8 @@ int fsnotify(__u32 mask, const void *dat
> > > > > > > inode2_type =3D FSNOTIFY_ITER_TYPE_PARENT;
> > > > > > > }
> > > > > > >
> > > > > > > + iter_info.srcu_idx =3D srcu_read_lock(&fsnotify_mark_srcu);
> > > > > > > + sbinfo =3D fsnotify_sb_info(sb);
> > > > > > > /*
> > > > > > > * Optimization: srcu_read_lock() has a memory barrier which can
> > > > > > > * be expensive. It protects walking the *_fsnotify_marks lists.
> > > > > >
> > > > > >
> > > > > > See comment above. This kills the optimization.
> > > > > > It is not worth letting all the fsnotify hooks suffer the consequence
> > > > > > for the edge case of calling fsnotify hook during fs shutdown.
> > > > >
> > > > > Say nothing before reading your fix.
> > > > > >
> > > > > > Also, fsnotify_sb_info(sb) in fsnotify_sb_has_priority_watchers()
> > > > > > is also not protected and using srcu_read_lock() there completely
> > > > > > nullifies the purpose of fsnotify_sb_info.
> > > > > >
> > > > > > Here is a simplified fix for fsnotify_sb_error() rebased on the
> > > > > > pending mm fixes for this syzbot boot failure:
> > > > > >
> > > > > > #syz test: https://github.com/amir73il/linux fsnotify-fixes
> > > > >
> > > > > Feel free to post your patch at lore because not everyone has
> > > > > access to sites like github.
> > > > > >
> > > > > > Jan,
> > > > > >
> > > > > > I think that all the functions called from fs shutdown context
> > > > > > should observe that SB_ACTIVE is cleared but wasn't sure?
> > > > >
> > > > > If you composed fix based on SB_ACTIVE that is cleared in
> > > > > generic_shutdown_super() with &sb->s_umount held for write,
> > > > > I wonder what simpler serialization than srcu you could
> > > > > find/create in fsnotify.
> > > >
> > > > As far as I can tell there is no need for serialisation.
> > > >
> > > > The problem is that fsnotify_sb_error() can be called from the
> > > > context of ->put_super() call from generic_shutdown_super().
> > > >
> > > > It's true that in the repro the thread calling fsnotify_sb_error()
> > > > in the worker thread running quota deferred work from put_super()
> > > > but I think there are sufficient barriers for this worker thread to
> > > > observer the cleared SB_ACTIVE flag.
> > > >
> > > > Anyway, according to syzbot, repro does not trigger the UAF
> > > > with my last fix.
> > > >
> > > > To be clear, any fsnotify_sb_error() that is a result of a user operation
> > > > would be holding an active reference to sb so cannot race with
> > > > fsnotify_sb_delete(), but I am not sure that same is true for ext4
> > > > worker threads.
> > > >
> > > > Jan,
> > > >
> > > > You wrote that "In theory these two calls can even run in parallel
> > > > and fsnotify() can be holding fsnotify_sb_info pointer while
> > > > fsnotify_sb_delete() is freeing".
> > > >
> > > > Can you give an example of this case?
> > >
> > > Yeah, basically what Hilf writes:
> > >
> > > Task 1 Task 2
> > > umount() some delayed work, transaction
> > > commit, whatever is still running
> > > before ext4_put_super() completes
> > > ... ext4_error()
> > > fsnotify_sb_error()
> > > fsnotify()
> > > fetches fsnotify_sb_info
> > > generic_shutdown_super()
> > > fsnotify_sb_delete()
> > > frees fsnotify_sb_info
> >
> > OK, so what do you say about Hillf's fix patch?
> >
> > Maybe it is ok to let go of the optimization in fsnotify(), considering
> > that we now have stronger optimizations in the inline hooks and
> > in __fsnotify_parent()?
> >
> > I think that Hillf's patch is missing setting s_fsnotify_info to NULL?
> >
> > @@ -101,8 +101,8 @@ void fsnotify_sb_delete(struct super_blo
> > wait_var_event(fsnotify_sb_watched_objects(sb),
> > !atomic_long_read(fsnotify_sb_watched_objects(sb)));
> > WARN_ON(fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_CONTENT));
> > + WRITE_ONCE(sb->s_fsnotify_info, NULL);
> > + synchronize_srcu(&fsnotify_mark_srcu);
> > kfree(sbinfo);
> > }
>
> So I had a look into this. Yes, something like this should work. We'll see
> whether synchronize_srcu() won't slow down umount too much. If someone will
> complain, we'll have to find a better solution.
>

Actually, kfree_rcu(sbinfo) may be enough.
We do not actually access sbinfo during mark iteration and
event handling, we only access it to get to the sb connector.

Something like the attached patch?

Thanks,
Amir.
From 8907b0e96a1d7d3b090982abc6d9eb396eb467f0 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@xxxxxxxxx>
Date: Thu, 11 Apr 2024 18:59:08 +0300
Subject: [PATCH v2] fsnotify: fix UAF from FS_ERROR event on a shutting down
filesystem

Protect against use after free when filesystem calls fsnotify_sb_error()
during fs shutdown.

fsnotify_sb_error() may be called without an s_active reference, so use
RCU to synchronize access to fsnotify_sb_info() in this case.

Reported-by: syzbot+5e3f9b2a67b45f16d4e6@xxxxxxxxxxxxxxxxxxxxxxxxx
Suggested-by: Hillf Danton <hdanton@xxxxxxxx>
Link: https://lore.kernel.org/linux-fsdevel/20240413014033.1722-1-hdanton@xxxxxxxx/
Fixes: 07a3b8d0bf72 ("fsnotify: lazy attach fsnotify_sb_info state to sb")
Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
---
fs/notify/fsnotify.c | 17 ++++++++++++++---
include/linux/fsnotify_backend.h | 4 ++++
2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 2ae965ef37e8..310c8e845482 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -103,7 +103,8 @@ void fsnotify_sb_delete(struct super_block *sb)
WARN_ON(fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_CONTENT));
WARN_ON(fsnotify_sb_has_priority_watchers(sb,
FSNOTIFY_PRIO_PRE_CONTENT));
- kfree(sbinfo);
+ WRITE_ONCE(sb->s_fsnotify_info, NULL);
+ kfree_rcu(sbinfo, rcu);
}

/*
@@ -506,6 +507,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
struct dentry *moved;
int inode2_type;
int ret = 0;
+ bool sb_active_ref = !(mask & FS_EVENTS_POSS_ON_SHUTDOWN);
__u32 test_mask, marks_mask;

if (path)
@@ -535,8 +537,10 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
* However, if we do not walk the lists, we do not have to do
* SRCU because we have no references to any objects and do not
* need SRCU to keep them "alive".
+ * We need RCU read side to keep sbinfo "alive" for events possible
+ * during shutdown (e.g. FS_ERROR).
*/
- if ((!sbinfo || !sbinfo->sb_marks) &&
+ if ((!sbinfo || (sb_active_ref && !sbinfo->sb_marks)) &&
(!mnt || !mnt->mnt_fsnotify_marks) &&
(!inode || !inode->i_fsnotify_marks) &&
(!inode2 || !inode2->i_fsnotify_marks))
@@ -562,11 +566,18 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
return 0;

iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
-
+ /*
+ * For events possible during shutdown (e.g. FS_ERROR) we may not hold
+ * an s_active reference on sb, so we need to dereference sbinfo with
+ * rcu_read_lock() held.
+ */
+ rcu_read_lock();
+ sbinfo = fsnotify_sb_info(sb);
if (sbinfo) {
iter_info.marks[FSNOTIFY_ITER_TYPE_SB] =
fsnotify_first_mark(&sbinfo->sb_marks);
}
+ rcu_read_unlock();
if (mnt) {
iter_info.marks[FSNOTIFY_ITER_TYPE_VFSMOUNT] =
fsnotify_first_mark(&mnt->mnt_fsnotify_marks);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 7f1ab8264e41..f11047125522 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -97,6 +97,9 @@
*/
#define FS_EVENTS_POSS_TO_PARENT (FS_EVENTS_POSS_ON_CHILD)

+/* Events that could be generated while fs is shutting down */
+#define FS_EVENTS_POSS_ON_SHUTDOWN (FS_ERROR)
+
/* Events that can be reported to backends */
#define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \
FS_EVENTS_POSS_ON_CHILD | \
@@ -488,6 +491,7 @@ struct fsnotify_mark_connector {
*/
struct fsnotify_sb_info {
struct fsnotify_mark_connector __rcu *sb_marks;
+ struct rcu_head rcu;
/*
* Number of inode/mount/sb objects that are being watched in this sb.
* Note that inodes objects are currently double-accounted.
--
2.34.1