Re: [PATCH v11 17/23] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled
From: Reinette Chatre
Date: Mon Feb 10 2025 - 13:41:11 EST
Hi Babu,
On 2/10/25 9:27 AM, Moger, Babu wrote:
> On 2/6/25 12:03, Reinette Chatre wrote:
>> On 1/22/25 12:20 PM, Babu Moger wrote:
>>
>>> + * of hardware counter is not considered as an overflow in the
>>> + * next update.
>>> + */
>>> + if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>>> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>> + memset(dom->cntr_cfg, 0,
>>> + sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
>>> + if (is_mbm_total_enabled())
>>> + memset(dom->mbm_total, 0,
>>> + sizeof(struct mbm_state) * idx_limit);
>>> + if (is_mbm_local_enabled())
>>> + memset(dom->mbm_local, 0,
>>> + sizeof(struct mbm_state) * idx_limit);
>>> + resctrl_arch_reset_rmid_all(r, dom);
>>> + }
>>> + }
>>> +}
>>
>> I looked back at the previous versions to better understand how this function
>> came about and I do not think it actually solves the problem it aims to solve.
>>
>> rdtgroup_unassign_cntrs() can fail and when it does the counter is not free'd. That
>> leaves a monitoring domain's array with an entry that points to a resource group
>> that no longer exists (unless it is the default resource group) since
>> rdtgroup_unassign_cntrs() does not check the return and proceeds to remove the
>> resource group. mbm_cntr_reset() is called on umount of resctrl but
>> rdtgroup_unassign_cntrs() is called on every group remove and those scenarios
>> are not handled.
>>
>> To address this I believe that I need to go back on a previous request to have
>> resctrl_arch_config_cntr() return an error code. AMD does not need this and
>> it is difficult to predict what will work for MPAM. I originally wanted to be
>> flexible here but this appears to be impractical. With a new requirement that
>> resctrl_arch_config_cntr() always succeeds the counter will in turn always
>> be free'd and not leave dangling pointers. I believe doing so eliminates
>> the need for mbm_cntr_reset() as used in this patch. My apologies for the
>> misdirection. We can re-evaluate these flows if MPAM needs anything different.
>
> So, new requirement is to free the counter even if the
> resctrl_arch_config_cntr() call fails. That way after calling
No. Quoting from above: "new requirement that resctrl_arch_config_cntr() always succeeds".
As I see it, this eliminates a lot of error checking on the calling path. It does
not mean ignoring errors.
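
To make the distinction concrete, here is a rough user-space sketch, not kernel
code. Every toy_* name, the fixed counter count, and the hw_writes field are
made up for illustration; only the idea that the arch configuration call
returns void comes from the discussion above. Because the configuration step
cannot fail, the unassign path always clears the entry, so no entry is left
pointing at a removed group.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NUM_CNTRS 4	/* hypothetical; the real count is reported by hardware */

/* Hypothetical stand-in for a resource group. */
struct toy_group {
	int closid;
	int rmid;
};

/* One entry per hardware counter: who owns it and which event it tracks. */
struct toy_cntr_cfg {
	struct toy_group *grp;	/* NULL means the counter is free */
	int evtid;
};

struct toy_domain {
	struct toy_cntr_cfg cntr_cfg[TOY_NUM_CNTRS];
	int hw_writes;		/* counts simulated hardware configurations */
};

/*
 * Stand-in for resctrl_arch_config_cntr(). Returning void encodes the
 * "always succeeds" requirement: callers have no error to check.
 */
static void toy_arch_config_cntr(struct toy_domain *d, int cntr_id, int assign)
{
	assert(cntr_id >= 0 && cntr_id < TOY_NUM_CNTRS);
	(void)assign;		/* a real implementation would program hardware */
	d->hw_writes++;
}

/*
 * Claim the first free counter for @g. Running out of counters is still
 * reported (-1); only the configuration step is assumed infallible.
 */
static int toy_assign_cntr(struct toy_domain *d, struct toy_group *g, int evtid)
{
	for (int i = 0; i < TOY_NUM_CNTRS; i++) {
		if (!d->cntr_cfg[i].grp) {
			d->cntr_cfg[i].grp = g;
			d->cntr_cfg[i].evtid = evtid;
			toy_arch_config_cntr(d, i, 1);
			return i;
		}
	}
	return -1;
}

/* Release every counter owned by @g. No failure path, so nothing dangles. */
static void toy_unassign_cntrs(struct toy_domain *d, struct toy_group *g)
{
	for (int i = 0; i < TOY_NUM_CNTRS; i++) {
		if (d->cntr_cfg[i].grp == g) {
			toy_arch_config_cntr(d, i, 0);
			memset(&d->cntr_cfg[i], 0, sizeof(d->cntr_cfg[i]));
		}
	}
}

/* Count entries in @d that still reference @g. */
static int toy_count_refs(const struct toy_domain *d, const struct toy_group *g)
{
	int n = 0;

	for (int i = 0; i < TOY_NUM_CNTRS; i++)
		if (d->cntr_cfg[i].grp == g)
			n++;
	return n;
}
```

With this shape the group removal path can call the unassign helper and then
remove the group without inspecting a return value, which is the reduction in
error checking I am referring to.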
Reinette