Re: [PATCH] fs/resctrl: Fix deadlock for errors during mount

From: Reinette Chatre

Date: Fri May 01 2026 - 19:17:40 EST

Hi Tony,

On 5/1/26 11:56 AM, Tony Luck wrote:
> Sashiko noticed[1] a deadlock in the resctrl mount code.
>
> rdt_get_tree() acquires rdtgroup_mutex before calling kernfs_get_tree(). If
> superblock setup fails inside kernfs_get_tree(), the VFS calls kill_sb on
> the same thread before the call returns. rdt_kill_sb() unconditionally
> attempts to acquire rdtgroup_mutex and deadlock occurs.

Thank you for addressing this.
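
For reference, the failure described above can be modeled in userspace roughly
as below. This is only a sketch and not kernel code: every model_* name is
hypothetical, the toy lock reports -EDEADLK where a real mutex_lock() would
block forever, and -ENOMEM is an arbitrary stand-in for whatever error
superblock setup returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy non-recursive lock (hypothetical). A second acquire while held
 * returns -EDEADLK where a real mutex_lock() would block forever.
 */
static bool model_mutex_held;		/* stands in for rdtgroup_mutex */
static int model_kill_sb_lock_ret;	/* result of the lock attempt in kill_sb */

static int model_mutex_lock(void)
{
	if (model_mutex_held)
		return -EDEADLK;
	model_mutex_held = true;
	return 0;
}

static void model_mutex_unlock(void)
{
	assert(model_mutex_held);
	model_mutex_held = false;
}

/* Models rdt_kill_sb() before the patch: takes the mutex unconditionally. */
static void model_kill_sb(void)
{
	model_kill_sb_lock_ret = model_mutex_lock();
	if (model_kill_sb_lock_ret)
		return;		/* the deadlock point in the real code */
	model_mutex_unlock();
}

/* Models kernfs_get_tree(): on failure, kill_sb runs before it returns. */
static int model_kernfs_get_tree(bool fail_sb_setup)
{
	if (fail_sb_setup) {
		model_kill_sb();
		return -ENOMEM;	/* arbitrary error chosen for the model */
	}
	return 0;
}

/* Models rdt_get_tree(): holds the mutex across kernfs_get_tree(). */
static int model_get_tree(bool fail_sb_setup)
{
	int ret;

	model_kill_sb_lock_ret = 0;
	ret = model_mutex_lock();
	if (ret)
		return ret;
	ret = model_kernfs_get_tree(fail_sb_setup);
	model_mutex_unlock();
	return ret;
}
```

Calling model_get_tree(true) leaves -EDEADLK in model_kill_sb_lock_ret, which
corresponds to the point where the real mount would hang.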

>
> Add a boolean rdt_kill_sb_locked flag. Set it for the duration of
> kernfs_get_tree() and check in rdt_kill_sb() to determine if locks
> are already held.
>

...

> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 5dfdaa6f9d8f..8544020ef420 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2782,6 +2782,9 @@ static void schemata_list_destroy(void)
> }
> }
>
> +/* Protected by the serialized mount path (rdtgroup_mutex + resctrl_mounted). */

I interpret the above to mean that every access to rdt_kill_sb_locked is expected
to be made with rdtgroup_mutex held ...

> +static bool rdt_kill_sb_locked;
> +
> static int rdt_get_tree(struct fs_context *fc)
> {
> struct rdt_fs_context *ctx = rdt_fc2context(fc);
> @@ -2855,7 +2858,9 @@ static int rdt_get_tree(struct fs_context *fc)
> if (ret)
> goto out_mondata;
>
> + rdt_kill_sb_locked = true;
> ret = kernfs_get_tree(fc);
> + rdt_kill_sb_locked = false;
> if (ret < 0)
> goto out_psl;
>
> @@ -3173,8 +3178,10 @@ static void rdt_kill_sb(struct super_block *sb)
> {
> struct rdt_resource *r;
>
> - cpus_read_lock();
> - mutex_lock(&rdtgroup_mutex);
> + if (!rdt_kill_sb_locked) {
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);

... but here rdt_kill_sb_locked is clearly accessed without rdtgroup_mutex held.

It appears that while this change claims that rdt_kill_sb_locked is protected, the
implementation actually amounts to "this works for the scenarios cared about
here", which I understand to be based on how the filesystem code interacts with
resctrl callbacks _today_.
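
To illustrate the concern, here is a small single-threaded model. It is a
sketch only: the model_* names and the CTX_OTHER caller are hypothetical, and I
am not claiming that today's VFS can produce this interleaving. It only shows
that the flag is read outside the mutex, so nothing in resctrl itself prevents
a second context from skipping the lock.

```c
#include <assert.h>
#include <stdbool.h>

enum model_ctx { CTX_NONE, CTX_MOUNT, CTX_OTHER };

static enum model_ctx model_mutex_owner;	/* stands in for rdtgroup_mutex */
static bool model_kill_sb_locked;		/* stands in for rdt_kill_sb_locked */
static int model_unlocked_accesses;		/* teardown done without owning the mutex */

/* Models the patched rdt_kill_sb(): the flag is read before any lock is taken. */
static void model_patched_kill_sb(enum model_ctx caller)
{
	if (!model_kill_sb_locked) {
		assert(model_mutex_owner == CTX_NONE);	/* model never contends here */
		model_mutex_owner = caller;
	}

	/* Teardown touches state that the mutex is meant to protect. */
	if (model_mutex_owner != caller)
		model_unlocked_accesses++;

	if (!model_kill_sb_locked)
		model_mutex_owner = CTX_NONE;
}

/* Intended case: kill_sb runs on the mounting context that holds the mutex. */
static int model_same_context_case(void)
{
	model_unlocked_accesses = 0;
	model_mutex_owner = CTX_MOUNT;
	model_kill_sb_locked = true;

	model_patched_kill_sb(CTX_MOUNT);

	model_kill_sb_locked = false;
	model_mutex_owner = CTX_NONE;
	return model_unlocked_accesses;
}

/* Hypothetical case: a different context runs kill_sb while the flag is set. */
static int model_other_context_case(void)
{
	model_unlocked_accesses = 0;
	model_mutex_owner = CTX_MOUNT;
	model_kill_sb_locked = true;

	model_patched_kill_sb(CTX_OTHER);	/* sees the flag, skips the lock */

	model_kill_sb_locked = false;
	model_mutex_owner = CTX_NONE;
	return model_unlocked_accesses;
}
```

In this model the same-context case records no unprotected access, while the
hypothetical other-context case records one.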

> + }
>
> rdt_disable_ctx();
>
> @@ -3189,8 +3196,10 @@ static void rdt_kill_sb(struct super_block *sb)
> resctrl_arch_disable_mon();
> resctrl_mounted = false;
> kernfs_kill_sb(sb);
> - mutex_unlock(&rdtgroup_mutex);
> - cpus_read_unlock();
> + if (!rdt_kill_sb_locked) {
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> + }
> }
>
> static struct file_system_type rdt_fs_type = {

Did you or your AI assistant consider running kernfs_get_tree() without rdtgroup_mutex
and the CPU hotplug lock held? Consider, for example:

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 36d21652616e..9ee6295d6521 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2892,10 +2892,6 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret)
goto out_mondata;

- ret = kernfs_get_tree(fc);
- if (ret < 0)
- goto out_psl;
-
if (resctrl_arch_alloc_capable())
resctrl_arch_enable_alloc();
if (resctrl_arch_mon_capable())
@@ -2911,10 +2907,10 @@ static int rdt_get_tree(struct fs_context *fc)
RESCTRL_PICK_ANY_CPU);
}

- goto out;
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+ return kernfs_get_tree(fc);

-out_psl:
- rdt_pseudo_lock_release();
out_mondata:
if (resctrl_arch_mon_capable())
kernfs_remove(kn_mondata);

This seems simpler because it:
* avoids introducing additional state (rdt_kill_sb_locked) with unclear protection,
* avoids double cleanup on failure (rdt_kill_sb() being called, followed by all of
rdt_get_tree()'s failure path),
* maintains symmetry with rdt_kill_sb() by providing it the state it
expects to be called with (i.e. resctrl_mounted = true).
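
The ordering suggested above can be sketched in userspace as follows. This is a
simplified model under stated assumptions, not the kernel implementation: all
model_* names are hypothetical, the toy lock counts re-acquire attempts instead
of blocking, and -ENOMEM is an arbitrary error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool model_mutex_held;		/* stands in for rdtgroup_mutex */
static bool model_mounted;		/* stands in for resctrl_mounted */
static int model_relock_attempts;	/* would-be self-deadlocks */
static int model_teardown_calls;	/* how often teardown ran */

static void model_lock(void)
{
	if (model_mutex_held)
		model_relock_attempts++;	/* a real mutex would hang here */
	model_mutex_held = true;
}

static void model_unlock(void)
{
	model_mutex_held = false;
}

/* Models rdt_kill_sb(): unchanged, always takes the lock itself. */
static void model_kill_sb(void)
{
	model_lock();
	model_teardown_calls++;
	model_mounted = false;
	model_unlock();
}

/* Models kernfs_get_tree(): on failure, kill_sb runs before it returns. */
static int model_kernfs_get_tree(bool fail_sb_setup)
{
	if (fail_sb_setup) {
		model_kill_sb();
		return -ENOMEM;	/* arbitrary error chosen for the model */
	}
	return 0;
}

/* Models the reordered rdt_get_tree(): locks dropped before the kernfs call. */
static int model_get_tree(bool fail_sb_setup)
{
	model_lock();
	/* ... setup that needs the lock would go here ... */
	model_mounted = true;
	model_unlock();

	/* Any failure from here on is cleaned up by kill_sb alone. */
	return model_kernfs_get_tree(fail_sb_setup);
}
```

With this ordering the failing call returns the error, teardown runs exactly
once, and the lock is never re-acquired while held.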

From what I can tell it is safe to call kernfs_kill_sb() on failure of kernfs_get_tree(),
but this needs to have been considered as part of this submission anyway.

Oh, maybe there is a new lock ordering issue with this that I am missing?

Reinette