Re: [PATCH v9 20/26] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
From: Moger, Babu
Date: Wed Dec 04 2024 - 12:55:43 EST
Hi Fenghua,
On 12/4/24 11:14, Moger, Babu wrote:
> Hi Fenghua,
>
> Thanks for the review.
>
> On 12/3/24 22:16, Fenghua Yu wrote:
>> Hi, Babu,
>>
>> On 10/29/24 16:21, Babu Moger wrote:
>>> Assign/unassign counters on resctrl group creation/deletion. Two counters
>>> are required per group, one for MBM total event and one for MBM local
>>> event.
>>>
>>> There are a limited number of counters available for assignment. If these
>>> counters are exhausted, the kernel will display the error message: "Out of
>>> MBM assignable counters". However, it is not necessary to fail the
>>> creation of a group due to assignment failures. Users have the flexibility
>>> to modify the assignments at a later time.
>>>
>>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>>> ---
>>> v9: Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to
>>> return void.
>>> Updated couple of rdtgroup_unassign_cntrs() calls properly.
>>> Updated function comments.
>>>
>>> v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
>>> Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
>>> Fixed the problem with unassigning the child MON groups of CTRL_MON
>>> group.
>>>
>>> v7: Reworded the commit message.
>>> Removed the reference of ABMC with mbm_cntr_assign.
>>> Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.
>>>
>>> v6: Removed the redundant comments on all the calls of
>>> rdtgroup_assign_cntrs. Updated the commit message.
>>> Dropped printing error message on every call of rdtgroup_assign_cntrs.
>>>
>>> v5: Removed the code to enable/disable ABMC during the mount.
>>> That will be another patch.
>>> Added arch callers to get the arch specific data.
>>> Renamed fuctions to match the other abmc function.
>>> Added code comments for assignment failures.
>>>
>>> v4: Few name changes based on the upstream discussion.
>>> Commit message update.
>>>
>>> v3: This is a new patch. Patch addresses the upstream comment to enable
>>> ABMC feature by default if the feature is available.
>>> ---
>>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 61 +++++++++++++++++++++++++-
>>> 1 file changed, 60 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index b0cce3dfd062..a8d21b0b2054 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -2932,6 +2932,46 @@ static void schemata_list_destroy(void)
>>> }
>>> }
>>> +/*
>>> + * Called when a new group is created. If "mbm_cntr_assign" mode is
>>> enabled,
>>> + * counters are automatically assigned. Each group can accommodate two
>>> counters:
>>> + * one for the total event and one for the local event. Assignments may
>>> fail
>>> + * due to the limited number of counters. However, it is not necessary
>>> to fail
>>> + * the group creation and thus no failure is returned. Users have the
>>> option
>>> + * to modify the counter assignments after the group has been created.
>>> + */
>>> +static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>> +{
>>> + struct rdt_resource *r =
>>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> +
>>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>> + return;
>>> +
>>> + if (is_mbm_total_enabled())
>>> + rdtgroup_assign_cntr_event(r, rdtgrp, NULL,
>>> QOS_L3_MBM_TOTAL_EVENT_ID);
>>
>> In this code path,
>> resctrl_mkdir()->resctrl_mkdir_ctrl_mon()->rdtgroup_assign_cntrs()->rdtgroup_assign_cntr_event()
>>
>> CPUs are not protected by read lock while rdtgroup_assign_cntr_event()
>> walks r->mon_domains and run assing counters code on CPUs in the domains.
>> Without CPU protection, r->mon_domains may race with CPU hotplug.
>>
>> In another patch (i.e. rdt_get_tree()), rdtgroup_assign_cntrs() is
>> protected by cpus_read_lock()/unlock().
>>
>> So maybe define two helpers:
>>
>> // Called when caller takes cpus_read_lock()
>> rdtgroup_assign_cntrs_locked()
>> {
>> lockdep_assert_cpus_held();
>>
>> then the current rdtgroup_assign_cntrs() code
>> }
>>
>> // Called when caller doesn't take cpus_read_lock()
>> rdtgroup_assign_cntrs()
>> {
>> cpus_read_lock();
>> rdtgroup_assign_cntrs_locked();
>> cpus_read_unlock();
>> }
>>
>
> Good observation. Agree. There is a problem.
> Some of this code will change with earlier comments.
>
> We know couple of paths are affected here. Why not just add the lock
> before calling in affected paths instead of adding new helpers?
>
> /*
> * Walking r->domains in rdtgroup_assign_cntrs, ensure it can't race
> * with cpuhp
> */
> cpus_read_lock();
> rdtgroup_assign_cntrs()
> cpus_read_unlock();
>
Oh no. Looks like we are good here.
Looks at Reinette's response. Thanks Reinette.
https://lore.kernel.org/lkml/4032a5a5-dd0a-49ae-94b6-dc4fac4c190d@xxxxxxxxx/
>
>
>>> +
>>> + if (is_mbm_local_enabled())
>>> + rdtgroup_assign_cntr_event(r, rdtgrp, NULL,
>>> QOS_L3_MBM_LOCAL_EVENT_ID);
>>> +}
>>> +
>>> +/*
>>> + * Called when a group is deleted. Counters are unassigned if it was in
>>> + * assigned state.
>>> + */
>>> +static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
>>> +{
>>> + struct rdt_resource *r =
>>> &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> +
>>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>>> + return;
>>> +
>>> + if (is_mbm_total_enabled())
>>> + rdtgroup_unassign_cntr_event(r, rdtgrp, NULL,
>>> QOS_L3_MBM_TOTAL_EVENT_ID);
>>> +
>>> + if (is_mbm_local_enabled())
>>> + rdtgroup_unassign_cntr_event(r, rdtgrp, NULL,
>>> QOS_L3_MBM_LOCAL_EVENT_ID);
>>> +}
>>> +
>>
>> Seems rdtgroup_unassign_cntrs() is always protected by
>> cpus_read_lock()/unlock(). So it's good.
>
> ok
>
>>
>>> static int rdt_get_tree(struct fs_context *fc)
>>> {
>>> struct rdt_fs_context *ctx = rdt_fc2context(fc);
>>> @@ -2991,6 +3031,8 @@ static int rdt_get_tree(struct fs_context *fc)
>>> if (ret < 0)
>>> goto out_mongrp;
>>> rdtgroup_default.mon.mon_data_kn = kn_mondata;
>>> +
>>> + rdtgroup_assign_cntrs(&rdtgroup_default);
>>
>> In this case, cpus_read_lock() was called earlier. Change to
>> rdtgroup_assign_cntrs_locked().
>>
>>> }
>>> ret = rdt_pseudo_lock_init();
>>> @@ -3021,8 +3063,10 @@ static int rdt_get_tree(struct fs_context *fc)
>>> out_psl:
>>> rdt_pseudo_lock_release();
>>> out_mondata:
>>> - if (resctrl_arch_mon_capable())
>>> + if (resctrl_arch_mon_capable()) {
>>> + rdtgroup_unassign_cntrs(&rdtgroup_default);
>>> kernfs_remove(kn_mondata);
>>> + }
>>> out_mongrp:
>>> if (resctrl_arch_mon_capable())
>>> kernfs_remove(kn_mongrp);
>>> @@ -3201,6 +3245,7 @@ static void free_all_child_rdtgrp(struct rdtgroup
>>> *rdtgrp)
>>> head = &rdtgrp->mon.crdtgrp_list;
>>> list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
>>> + rdtgroup_unassign_cntrs(sentry);
>>> free_rmid(sentry->closid, sentry->mon.rmid);
>>> list_del(&sentry->mon.crdtgrp_list);
>>> @@ -3241,6 +3286,8 @@ static void rmdir_all_sub(void)
>>> cpumask_or(&rdtgroup_default.cpu_mask,
>>> &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>>> + rdtgroup_unassign_cntrs(rdtgrp);
>>> +
>>> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>> kernfs_remove(rdtgrp->kn);
>>> @@ -3272,6 +3319,7 @@ static void rdt_kill_sb(struct super_block *sb)
>>> for_each_alloc_capable_rdt_resource(r)
>>> reset_all_ctrls(r);
>>> rmdir_all_sub();
>>> + rdtgroup_unassign_cntrs(&rdtgroup_default);
>>> rdt_pseudo_lock_release();
>>> rdtgroup_default.mode = RDT_MODE_SHAREABLE;
>>> schemata_list_destroy();
>>> @@ -3280,6 +3328,7 @@ static void rdt_kill_sb(struct super_block *sb)
>>> resctrl_arch_disable_alloc();
>>> if (resctrl_arch_mon_capable())
>>> resctrl_arch_disable_mon();
>>> +
>>
>> Unnecessary change.
>
> ok.
>
>>
>>> resctrl_mounted = false;
>>> kernfs_kill_sb(sb);
>>> mutex_unlock(&rdtgroup_mutex);
>>> @@ -3871,6 +3920,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node
>>> *parent_kn,
>>> goto out_unlock;
>>> }
>>> + rdtgroup_assign_cntrs(rdtgrp);
>>> + > kernfs_activate(rdtgrp->kn);
>>> /*
>>> @@ -3915,6 +3966,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct
>>> kernfs_node *parent_kn,
>>> if (ret)
>>> goto out_closid_free;
>>> + rdtgroup_assign_cntrs(rdtgrp);
>>> +
>>> kernfs_activate(rdtgrp->kn);
>>> ret = rdtgroup_init_alloc(rdtgrp);
>>> @@ -3940,6 +3993,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct
>>> kernfs_node *parent_kn,
>>> out_del_list:
>>> list_del(&rdtgrp->rdtgroup_list);
>>> out_rmid_free:
>>> + rdtgroup_unassign_cntrs(rdtgrp);
>>> mkdir_rdt_prepare_rmid_free(rdtgrp);
>>> out_closid_free:
>>> closid_free(closid);
>>> @@ -4010,6 +4064,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup
>>> *rdtgrp, cpumask_var_t tmpmask)
>>> update_closid_rmid(tmpmask, NULL);
>>> rdtgrp->flags = RDT_DELETED;
>>> +
>>> + rdtgroup_unassign_cntrs(rdtgrp);
>>> +
>>> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>> /*
>>> @@ -4056,6 +4113,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup
>>> *rdtgrp, cpumask_var_t tmpmask)
>>> cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>>> update_closid_rmid(tmpmask, NULL);
>>> + rdtgroup_unassign_cntrs(rdtgrp);
>>> +
>>> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>> closid_free(rdtgrp->closid);
>>>
>>
>> Thanks.
>>
>> -Fenghua
>>
>
--
Thanks
Babu Moger