Re: [PATCH v8 04/21] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point

From: Reinette Chatre
Date: Tue Apr 15 2025 - 20:26:02 EST

Next message: Maxim Levitsky: "[PATCH 0/3] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough"
Previous message: Gustavo A. R. Silva: "Re: [PATCH v2][next] watchdog: cros-ec: Avoid -Wflex-array-member-not-at-end warning"
In reply to: James Morse: "[PATCH v8 04/21] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point"
Next in thread: James Morse: "[PATCH v8 07/21] x86/resctrl: Add end-marker to the resctrl_event_id enum"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi James,

On 4/11/25 9:42 AM, James Morse wrote:
> resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
> resctrl can't be built as a module, and the kernfs helpers are not exported
> so this is unlikely to change. MPAM has an error interrupt which indicates
> the MPAM driver has gone haywire. Should this occur tasks could run with
> the wrong control values, leading to bad performance for important tasks.
> In this scenario the MPAM driver will reset the hardware, but it needs
> a way to tell resctrl that no further configuration should be attempted.
>
> In particular, moving tasks between control or monitor groups does not
> interact with the architecture code, so there is no opportunity for the
> arch code to indicate that the hardware is no-longer functioning.
>
> Using resctrl_exit() for this leaves the system in a funny state as
> resctrl is still mounted, but cannot be un-mounted because the sysfs
> directory that is typically used has been removed. Dave Martin suggests
> this may cause systemd trouble in the future as not all filesystems
> can be unmounted.
>
> Add calls to remove all the files and directories in resctrl, and
> remove the sysfs_remove_mount_point() call that leaves the system
> in a funny state. When triggered, this causes all the resctrl files
> to disappear. resctrl can be unmounted, but not mounted again.
>

The caveat here is that resctrl pretends to be mounted (resctrl_mounted == true)
but there is nothing there. The undocumented part of this is that for this
to work resctrl fs depends (a lot) on the architecture's callbacks to know
if they are being called after a resctrl_exit() call so that they return data
that will direct resctrl fs behavior to safest exit for those
resctrl fs flows that are still possible after a resctrl_exit(). Not ideal
layering.

I understand from a previous comment [1] that one of the Arm "tricks" is to
offline all domains. This seems to be a good "catch all" to ensure that at least
current flows of concern are not running anymore. Considering this,
what if there is a new resctrl_error_exit() that does something like below?

void resctrl_error_exit(void)
{
mutex_lock(&rdtgroup_mutex);
WARN_ON_ONCE(resctrl_new_function_returns_true_if_any_resource_has_a_control_or_monitor_domain());
resctrl_fs_teardown();
mutex_unlock(&rdtgroup_mutex);
resctrl_exit();
}

I do not see this as requiring anything new from architecture but instead
making what Arm already does a requirement and keeping existing behavior?

This leaves proc_resctrl_show() that relies on resctrl_mounted but as I see
the resctrl_fs_cleanup() will remove all resource groups that should result
in the output being as it will be if resctrl is not mounted. No dependence
on architecture callbacks returning resctrl_exit() aware data here.

> Signed-off-by: James Morse <james.morse@xxxxxxx>
> Tested-by: Carl Worth <carl@xxxxxxxxxxxxxxxxxxxxxx> # arm64
> Tested-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
> Tested-by: Peter Newman <peternewman@xxxxxxxxxx>
> Tested-by: Amit Singh Tomar <amitsinght@xxxxxxxxxxx> # arm64
> Tested-by: Shanker Donthineni <sdonthineni@xxxxxxxxxx> # arm64
> Tested-by: Babu Moger <babu.moger@xxxxxxx>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
> Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---

...

> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 42 +++++++++++++++++++++-----
> 1 file changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index fdf2616c7ca0..3f9c37637d7e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3074,6 +3074,22 @@ static void rmdir_all_sub(void)
> kernfs_remove(kn_mondata);
> }
>
> +static void resctrl_fs_teardown(void)
> +{
> + lockdep_assert_held(&rdtgroup_mutex);
> +
> + /* Cleared by rdtgroup_destroy_root() */
> + if (!rdtgroup_default.kn)
> + return;
> +
> + rmdir_all_sub();
> + rdt_pseudo_lock_release();
> + rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> + closid_exit();
> + schemata_list_destroy();
> + rdtgroup_destroy_root();
> +}
> +
> static void rdt_kill_sb(struct super_block *sb)
> {
> struct rdt_resource *r;
> @@ -3087,12 +3103,7 @@ static void rdt_kill_sb(struct super_block *sb)
> for_each_alloc_capable_rdt_resource(r)
> resctrl_arch_reset_all_ctrls(r);
>
> - rmdir_all_sub();
> - rdt_pseudo_lock_release();
> - rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> - closid_exit();
> - schemata_list_destroy();
> - rdtgroup_destroy_root();
> + resctrl_fs_teardown();
> if (resctrl_arch_alloc_capable())
> resctrl_arch_disable_alloc();
> if (resctrl_arch_mon_capable())
> @@ -4123,6 +4134,8 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
>
> static void rdtgroup_destroy_root(void)
> {
> + lockdep_assert_held(&rdtgroup_mutex);
> +
> kernfs_destroy_root(rdt_root);
> rdtgroup_default.kn = NULL;
> }
> @@ -4416,11 +4429,26 @@ int __init resctrl_init(void)
> return ret;
> }
>
> +/**
> + * resctrl_exit() - Remove the resctrl filesystem and free resources.
> + *
> + * Called by the architecture code in response to a fatal error.
> + * Resctrl files and structures are removed from kernfs to prevent further
> + * configuration.

Please write with imperative tone. For example, "Remove resctrl files and structures ..."

> + */
> void __exit resctrl_exit(void)
> {
> + mutex_lock(&rdtgroup_mutex);
> + resctrl_fs_teardown();
> + mutex_unlock(&rdtgroup_mutex);
> +
> debugfs_remove_recursive(debugfs_resctrl);

Is it possible for the fatal error handling to trigger multiple calls here?
To protect against multiple calls causing issues debugfs_resctrl can be set to NULL here.

> unregister_filesystem(&rdt_fs_type);

unregister_filesystem() seems to handle an already-unregistered filesystem.

> - sysfs_remove_mount_point(fs_kobj, "resctrl");
> +
> + /*
> + * The sysfs mount point added by resctrl_init() is not removed so that
> + * it can be used to umount resctrl.
> + */

(needs imperative)

>
> resctrl_mon_resource_exit();
> }

Reinette

[1] https://lore.kernel.org/lkml/5ab63ad9-f7f5-4d3d-b0cb-fd229aa269db@xxxxxxx/

Next message: Maxim Levitsky: "[PATCH 0/3] KVM: x86: allow DEBUGCTL.DEBUGCTLMSR_FREEZE_IN_SMM passthrough"
Previous message: Gustavo A. R. Silva: "Re: [PATCH v2][next] watchdog: cros-ec: Avoid -Wflex-array-member-not-at-end warning"
In reply to: James Morse: "[PATCH v8 04/21] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point"
Next in thread: James Morse: "[PATCH v8 07/21] x86/resctrl: Add end-marker to the resctrl_event_id enum"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]