Re: [PATCH] fsnotify: fix inode reference leak in fsnotify_recalc_mask()
From: 尹欣
Date: Mon Apr 20 2026 - 22:04:45 EST
> From: "Amir Goldstein"<amir73il@xxxxxxxxx>
> Date: Mon, Apr 20, 2026, 21:34
> Subject: Re: [PATCH] fsnotify: fix inode reference leak in fsnotify_recalc_mask()
> To: "Xin Yin"<yinxin.x@xxxxxxxxxxxxx>
> Cc: <jan@xxxxxxx>, <linux-fsdevel@xxxxxxxxxxxxxxx>, <linux-kernel@xxxxxxxxxxxxxxx>
> On Mon, Apr 20, 2026 at 8:52 AM Xin Yin <yinxin.x@xxxxxxxxxxxxx> wrote:
> >
> > fsnotify_recalc_mask() fails to handle the return value of
> > __fsnotify_recalc_mask(), which may return an inode pointer that needs
> > to be released via fsnotify_drop_object() when the connector's HAS_IREF
> > flag transitions from set to cleared.
> >
> > This manifests as a hung task with the following call trace:
> >
> > INFO: task umount:1234 blocked for more than 120 seconds.
> > Call Trace:
> > __schedule
> > schedule
> > fsnotify_sb_delete
> > generic_shutdown_super
> > kill_anon_super
> > cleanup_mnt
> > task_work_run
> > do_exit
> > do_group_exit
> >
> > The race window that triggers the iref leak:
> >
> > Thread A (adding mark) Thread B (removing mark)
> > ────────────────────── ────────────────────────
> > fsnotify_add_mark_locked():
> > fsnotify_add_mark_list():
> > spin_lock(conn->lock)
> > add mark_B(evictable) to list
> > spin_unlock(conn->lock)
> > return
> >
> > /* ---- gap: no lock held ---- */
> >
> > fsnotify_detach_mark(mark_A):
> > spin_lock(mark_A->lock)
> > clear ATTACHED flag on mark_A
> > spin_unlock(mark_A->lock)
> >
> > fsnotify_recalc_mask():
> > spin_lock(conn->lock)
> > __fsnotify_recalc_mask():
> > /* mark_A skipped: ATTACHED cleared */
> > /* only mark_B(evictable) remains */
> > want_iref = false
> > has_iref = true /* not yet cleared */
> > -> HAS_IREF transitions true -> false
> > -> returns inode pointer
> > spin_unlock(conn->lock)
> > /* BUG: return value discarded!
> > * iput() and fsnotify_put_sb_watched_objects()
> > * are never called */
> >
>
> Hi Xin!
>
> Ouch! nasty one.
> Very good analysis.
> Do you also have a reproducer?
Yes, I have a reproducer attached.
In my testing it reproduces reliably - typically within the first round.
./repro_nomod 200 16 3
Thanks,
Xin.
>
>
> > Fix this by capturing the return value of __fsnotify_recalc_mask() and
> > passing it to fsnotify_drop_object() after releasing the spinlock, which
> > is the same pattern used in fsnotify_put_mark().
>
> The fix looks correct, but I prefer for my mental model to keep
> fsnotify_drop_object() in one place at the end of fsnotify_put_mark().
>
> Please see the suggested (only sanity tested) patch.
>
> Thanks,
> Amir.
>
>
> >
> > Fixes: c3638b5b1374 ("fsnotify: allow adding an inode mark without pinning inode")
> > Signed-off-by: Xin Yin <yinxin.x@xxxxxxxxxxxxx>
> > ---
> > fs/notify/mark.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> > index c2ed5b11b0fe6..cc93fcc2c5a9c 100644
> > --- a/fs/notify/mark.c
> > +++ b/fs/notify/mark.c
> > @@ -283,6 +283,8 @@ static void fsnotify_conn_set_children_dentry_flags(
> > fsnotify_set_children_dentry_flags(fsnotify_conn_inode(conn));
> > }
> >
> > +static void fsnotify_drop_object(unsigned int type, void *objp);
> > +
> > /*
> > * Calculate mask of events for a list of marks. The caller must make sure
> > * connector and connector->obj cannot disappear under us. Callers achieve
> > @@ -292,15 +294,19 @@ static void fsnotify_conn_set_children_dentry_flags(
> > void fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
> > {
> > bool update_children;
> > + unsigned int type;
> > + void *objp;
> >
> > if (!conn)
> > return;
> >
> > spin_lock(&conn->lock);
> > update_children = !fsnotify_conn_watches_children(conn);
> > - __fsnotify_recalc_mask(conn);
> > + objp = __fsnotify_recalc_mask(conn);
> > + type = conn->type;
> > update_children &= fsnotify_conn_watches_children(conn);
> > spin_unlock(&conn->lock);
> > + fsnotify_drop_object(type, objp);
> > /*
> > * Set children's PARENT_WATCHED flags only if parent started watching.
> > * When parent stops watching, we clear false positive PARENT_WATCHED
> > --
> > 2.20.1
>
Attachment:
repro_nomod.c
Description: Binary data