Re: [PATCH] fsnotify: fix inode reference leak in fsnotify_recalc_mask()

From: 尹欣

Date: Mon Apr 20 2026 - 03:24:09 EST


Cc: Jan
Sorry, I used the wrong email address.

Thanks,
Xin Yin

> From: "Xin Yin"<yinxin.x@xxxxxxxxxxxxx>
> Date: Mon, Apr 20, 2026, 14:51
> Subject: [PATCH] fsnotify: fix inode reference leak in fsnotify_recalc_mask()
> To: <jan@xxxxxxx>, <amir73il@xxxxxxxxx>
> Cc: <linux-fsdevel@xxxxxxxxxxxxxxx>, <linux-kernel@xxxxxxxxxxxxxxx>, "Xin Yin"<yinxin.x@xxxxxxxxxxxxx>
>
> fsnotify_recalc_mask() fails to handle the return value of
> __fsnotify_recalc_mask(), which may return an inode pointer that needs
> to be released via fsnotify_drop_object() when the connector's HAS_IREF
> flag transitions from set to cleared.
> 
> This manifests as a hung task with the following call trace:
> 
>   INFO: task umount:1234 blocked for more than 120 seconds.
>   Call Trace:
>    __schedule
>    schedule
>    fsnotify_sb_delete
>    generic_shutdown_super
>    kill_anon_super
>    cleanup_mnt
>    task_work_run
>    do_exit
>    do_group_exit
> 
> The race window that triggers the iref leak:
> 
>   Thread A (adding mark)              Thread B (removing mark)
>   ──────────────────────              ────────────────────────
>   fsnotify_add_mark_locked():
>     fsnotify_add_mark_list():
>       spin_lock(conn->lock)
>       add mark_B(evictable) to list
>       spin_unlock(conn->lock)
>     return
> 
>     /* ---- gap: no lock held ---- */
> 
>                                       fsnotify_detach_mark(mark_A):
>                                         spin_lock(mark_A->lock)
>                                         clear ATTACHED flag on mark_A
>                                         spin_unlock(mark_A->lock)
> 
>     fsnotify_recalc_mask():
>       spin_lock(conn->lock)
>       __fsnotify_recalc_mask():
>         /* mark_A skipped: ATTACHED cleared */
>         /* only mark_B(evictable) remains */
>         want_iref = false
>         has_iref = true  /* not yet cleared */
>         -> HAS_IREF transitions true -> false
>         -> returns inode pointer
>       spin_unlock(conn->lock)
>       /* BUG: return value discarded!
>        * iput() and fsnotify_put_sb_watched_objects()
>        * are never called */
> 
> Fix this by capturing the return value of __fsnotify_recalc_mask() and
> passing it to fsnotify_drop_object() after releasing the spinlock, which
> is the same pattern used in fsnotify_put_mark().
> 
> Fixes: c3638b5b1374 ("fsnotify: allow adding an inode mark without pinning inode")
> Signed-off-by: Xin Yin <yinxin.x@xxxxxxxxxxxxx>
> ---
>  fs/notify/mark.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index c2ed5b11b0fe6..cc93fcc2c5a9c 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -283,6 +283,8 @@ static void fsnotify_conn_set_children_dentry_flags(
>          fsnotify_set_children_dentry_flags(fsnotify_conn_inode(conn));
>  }
>  
> +static void fsnotify_drop_object(unsigned int type, void *objp);
> +
>  /*
>   * Calculate mask of events for a list of marks. The caller must make sure
>   * connector and connector->obj cannot disappear under us.  Callers achieve
> @@ -292,15 +294,19 @@ static void fsnotify_conn_set_children_dentry_flags(
>  void fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
>  {
>          bool update_children;
> +        unsigned int type;
> +        void *objp;
>  
>          if (!conn)
>                  return;
>  
>          spin_lock(&conn->lock);
>          update_children = !fsnotify_conn_watches_children(conn);
> -        __fsnotify_recalc_mask(conn);
> +        objp = __fsnotify_recalc_mask(conn);
> +        type = conn->type;
>          update_children &= fsnotify_conn_watches_children(conn);
>          spin_unlock(&conn->lock);
> +        fsnotify_drop_object(type, objp);
>          /*
>           * Set children's PARENT_WATCHED flags only if parent started watching.
>           * When parent stops watching, we clear false positive PARENT_WATCHED
> -- 
> 2.20.1
>
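
For readers who want to see the pattern outside the kernel, here is a minimal
userspace sketch of "compute under the lock, drop the reference after
unlocking". It is not the kernel code: every name in it (struct object,
struct connector, recalc_locked(), recalc_leaky(), recalc_fixed(),
drop_object()) is hypothetical, a pthread mutex stands in for conn->lock, and
a plain int stands in for the inode reference count. It only models the
has-ref true -> false transition described in the commit message.

```c
/* Userspace model only; all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct object {
	int refcount;		/* stands in for the inode reference count */
};

struct connector {
	pthread_mutex_t lock;	/* stands in for conn->lock */
	struct object *obj;
	bool has_ref;		/* stands in for the HAS_IREF flag */
	int pinning_marks;	/* attached marks that want the object pinned */
};

/*
 * Caller must hold conn->lock. Returns the object whose reference the
 * caller has to drop after unlocking, or NULL if nothing needs dropping.
 */
static struct object *recalc_locked(struct connector *conn)
{
	bool want_ref = conn->pinning_marks > 0;

	if (want_ref && !conn->has_ref) {
		conn->obj->refcount++;
		conn->has_ref = true;
	} else if (!want_ref && conn->has_ref) {
		/* Flag goes true -> false; hand the reference to the caller. */
		conn->has_ref = false;
		return conn->obj;
	}
	return NULL;
}

/* Drop one reference; a NULL object means there is nothing to drop. */
static void drop_object(struct object *obj)
{
	if (!obj)
		return;
	assert(obj->refcount > 0);
	obj->refcount--;
}

/* Models the bug: the returned object is discarded, so the ref leaks. */
static void recalc_leaky(struct connector *conn)
{
	pthread_mutex_lock(&conn->lock);
	(void)recalc_locked(conn);
	pthread_mutex_unlock(&conn->lock);
}

/* Models the fix: capture the return value, drop it after unlocking. */
static void recalc_fixed(struct connector *conn)
{
	struct object *to_drop;

	pthread_mutex_lock(&conn->lock);
	to_drop = recalc_locked(conn);
	pthread_mutex_unlock(&conn->lock);
	drop_object(to_drop);
}
```

With a connector that holds a reference but has no pinning marks left,
recalc_leaky() clears the flag and leaves the count unchanged, while
recalc_fixed() clears the flag and releases the reference. The drop is done
outside the lock in the sketch to mirror the ordering in the patch; the model
itself is single-threaded and makes no claim about which kernel paths may
sleep.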