Re: [PATCH v9 03/27] x86/resctrl: Check all domains are offline in resctrl_exit()

From: Reinette Chatre
Date: Thu May 01 2025 - 13:06:10 EST


Hi James,

On 4/25/25 10:37 AM, James Morse wrote:
> resctrl_exit() removes things like the resctrl mount point directory
> and unregisters the filesystem prior to freeing data structures that
> were allocated during resctrl_init().
>
> This assumes that there are no online domains when resctrl_exit() is
> called. If any domain were online, the limbo or overflow handler could
> be scheduled to run.
>
> Add a check for any online control or monitor domains, and document that
> the architecture code is required to do this.

nit: It may not be obvious at this point what "this" means. The above could be:

Add a check for any online control or monitor domains, and document that
the architecture code is required to offline all monitor and control
domains before calling resctrl_exit().

>
> Suggested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Signed-off-by: James Morse <james.morse@xxxxxxx>
> ---
> Changes since v8:
> * This patch is new.

Thank you for adding this.

> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 88197afbbb8a..f617ac97758b 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -4420,8 +4420,32 @@ int __init resctrl_init(void)
>  	return ret;
> }
>
> +static bool __exit resctrl_online_domains_exist(void)
> +{
> +	struct rdt_resource *r;
> +
> +	for_each_rdt_resource(r) {
> +		if (!list_empty(&r->ctrl_domains) || !list_empty(&r->mon_domains))

list_empty() only behaves as intended on an initialized list. A list within
an uninitialized or kzalloc()'ed struct will not be considered empty.
resctrl_arch_get_resource(), as used by for_each_rdt_resource(), already establishes
that if an architecture does not support a particular resource then it can (should?)
return a "dummy/not-capable" resource. I do not think resctrl should additionally
require things like initializing the lists of a dummy/not-capable resource just
to support a loop like this one.

Considering this, could this be made more specific? For example,

	for_each_alloc_capable_rdt_resource(r) {
		if (!list_empty(&r->ctrl_domains))
			return true;
	}

	for_each_mon_capable_rdt_resource(r) {
		if (!list_empty(&r->mon_domains))
			return true;
	}

> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * resctrl_exit() - Remove the resctrl filesystem and free resources.
> + *
> + * When called by the architecture code, all CPUs and resctrl domains must be
> + * offline. This ensures the limbo and overflow handlers are not scheduled to
> + * run, meaning the data structures they access can be freed by
> + * resctrl_mon_resource_exit().
> + */
> void __exit resctrl_exit(void)
> {
> +	cpus_read_lock();
> +	WARN_ON_ONCE(resctrl_online_domains_exist());
> +	cpus_read_unlock();
> +
>  	debugfs_remove_recursive(debugfs_resctrl);
>  	unregister_filesystem(&rdt_fs_type);
>  	sysfs_remove_mount_point(fs_kobj, "resctrl");

Thank you.

Reinette