Re: [PATCH] fsnotify: fix inode reference leak in fsnotify_recalc_mask()

From: 尹欣

Date: Mon Apr 20 2026 - 22:04:45 EST



> From: "Amir Goldstein"<amir73il@xxxxxxxxx>
> Date:  Mon, Apr 20, 2026, 21:34
> Subject:  Re: [PATCH] fsnotify: fix inode reference leak in fsnotify_recalc_mask()
> To: "Xin Yin"<yinxin.x@xxxxxxxxxxxxx>
> Cc: <jan@xxxxxxx>, <linux-fsdevel@xxxxxxxxxxxxxxx>, <linux-kernel@xxxxxxxxxxxxxxx>
> On Mon, Apr 20, 2026 at 8:52 AM Xin Yin <yinxin.x@xxxxxxxxxxxxx> wrote:
> >
> > fsnotify_recalc_mask() fails to handle the return value of
> > __fsnotify_recalc_mask(), which may return an inode pointer that needs
> > to be released via fsnotify_drop_object() when the connector's HAS_IREF
> > flag transitions from set to cleared.
> >
> > This manifests as a hung task with the following call trace:
> >
> >   INFO: task umount:1234 blocked for more than 120 seconds.
> >   Call Trace:
> >    __schedule
> >    schedule
> >    fsnotify_sb_delete
> >    generic_shutdown_super
> >    kill_anon_super
> >    cleanup_mnt
> >    task_work_run
> >    do_exit
> >    do_group_exit
> >
> > The race window that triggers the iref leak:
> >
> >   Thread A (adding mark)              Thread B (removing mark)
> >   ──────────────────────              ────────────────────────
> >   fsnotify_add_mark_locked():
> >     fsnotify_add_mark_list():
> >       spin_lock(conn->lock)
> >       add mark_B(evictable) to list
> >       spin_unlock(conn->lock)
> >     return
> >
> >     /* ---- gap: no lock held ---- */
> >
> >                                       fsnotify_detach_mark(mark_A):
> >                                         spin_lock(mark_A->lock)
> >                                         clear ATTACHED flag on mark_A
> >                                         spin_unlock(mark_A->lock)
> >
> >     fsnotify_recalc_mask():
> >       spin_lock(conn->lock)
> >       __fsnotify_recalc_mask():
> >         /* mark_A skipped: ATTACHED cleared */
> >         /* only mark_B(evictable) remains */
> >         want_iref = false
> >         has_iref = true  /* not yet cleared */
> >         -> HAS_IREF transitions true -> false
> >         -> returns inode pointer
> >       spin_unlock(conn->lock)
> >       /* BUG: return value discarded!
> >        * iput() and fsnotify_put_sb_watched_objects()
> >        * are never called */
> >
> 
> Hi Xin!
> 
> Ouch! nasty one.
> Very good analysis.
> Do you also have a reproducer?

Yes, I have a reproducer attached. 
In my testing it reproduces reliably - typically within the first round.
./repro_nomod 200 16 3

Thanks,
Xin.
> 
> 
> > Fix this by capturing the return value of __fsnotify_recalc_mask() and
> > passing it to fsnotify_drop_object() after releasing the spinlock, which
> > is the same pattern used in fsnotify_put_mark().
> 
> The fix looks correct, but I prefer for my mental model to keep
> fsnotify_drop_object() in one place at the end of fsnotify_put_mark().
> 
> Please see the suggested (only sanity tested) patch.
> 
> Thanks,
> Amir.
> 
> 
> >
> > Fixes: c3638b5b1374 ("fsnotify: allow adding an inode mark without pinning inode")
> > Signed-off-by: Xin Yin <yinxin.x@xxxxxxxxxxxxx>
> > ---
> >  fs/notify/mark.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> > index c2ed5b11b0fe6..cc93fcc2c5a9c 100644
> > --- a/fs/notify/mark.c
> > +++ b/fs/notify/mark.c
> > @@ -283,6 +283,8 @@ static void fsnotify_conn_set_children_dentry_flags(
> >         fsnotify_set_children_dentry_flags(fsnotify_conn_inode(conn));
> >  }
> >
> > +static void fsnotify_drop_object(unsigned int type, void *objp);
> > +
> >  /*
> >   * Calculate mask of events for a list of marks. The caller must make sure
> >   * connector and connector->obj cannot disappear under us.  Callers achieve
> > @@ -292,15 +294,19 @@ static void fsnotify_conn_set_children_dentry_flags(
> >  void fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
> >  {
> >         bool update_children;
> > +       unsigned int type;
> > +       void *objp;
> >
> >         if (!conn)
> >                 return;
> >
> >         spin_lock(&conn->lock);
> >         update_children = !fsnotify_conn_watches_children(conn);
> > -       __fsnotify_recalc_mask(conn);
> > +       objp = __fsnotify_recalc_mask(conn);
> > +       type = conn->type;
> >         update_children &= fsnotify_conn_watches_children(conn);
> >         spin_unlock(&conn->lock);
> > +       fsnotify_drop_object(type, objp);
> >         /*
> >          * Set children's PARENT_WATCHED flags only if parent started watching.
> >          * When parent stops watching, we clear false positive PARENT_WATCHED
> > --
> > 2.20.1
> 

Attachment: repro_nomod.c
Description: Binary data