Re: [PATCH v11 06/23] x86/resctrl: Add support to enable/disable AMD ABMC feature

From: Reinette Chatre
Date: Thu Feb 06 2025 - 13:43:33 EST


Hi Babu,

On 2/6/25 8:15 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 2/5/2025 4:49 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 1/22/25 12:20 PM, Babu Moger wrote:
>>> Add the functionality to enable/disable the AMD ABMC feature.
>>>
>>> The AMD ABMC feature is enabled by setting the enable bit (bit 0) in MSR
>>> L3_QOS_EXT_CFG. When the state of ABMC is changed, the MSR needs
>>> to be updated on all the logical processors in the QOS Domain.
>>>
>>> Hardware counters will reset when the ABMC state is changed.
>>
>> I find that the state management in this series is organized better
>> and is easier to understand. I do think that it can be simplified further.
>> A hint to this is that the reset is mentioned here, yet it is not done
>> by the code introduced here; instead it is required of the caller.
>> Would it not be simpler to reset the architectural state at the same
>> time as ABMC is enabled/disabled?
>
> Right now, it is done from mbm_cntr_reset(). It does both arch and non-arch state reset for all the RMIDs in all the domains. It is called in two places.
>
> 1 rdtgroup.c resctrl_mbm_assign_mode_write -> mbm_cntr_reset();
Please see my response to this usage in the related patch:
https://lore.kernel.org/lkml/b60b4f72-6245-46db-a126-428fb13b6310@xxxxxxxxx/
In summary, I find mbm_cntr_reset() ended up being a catch-all for random
cleanup and creates confusion with the other mbm_cntr_*() calls.

> 2 rdtgroup.c rdt_kill_sb()-> mbm_cntr_reset();
Please see my response to this usage in the related patch:
https://lore.kernel.org/lkml/8d04f824-d1cc-461c-9c57-0f26c6aa96e0@xxxxxxxxx/
In summary, it does not solve the problem it originally set out to solve
and it can be eliminated.

>
> I will have to introduce another function to reset RMIDs in all the domains, and make sure it is called from both of these places.
>
> list_for_each_entry(dom, &r->mon_domains, hdr.list)
>             resctrl_arch_reset_rmid_all(r, dom);

I do not see a need for new functions, except the one I mention in
https://lore.kernel.org/lkml/b60b4f72-6245-46db-a126-428fb13b6310@xxxxxxxxx/
where I suggest a new helper to reset architectural state. That helper
does not exist yet, so the reset ends up being open coded in two places
in this series.

With only one place (resctrl_mbm_assign_mode_write()) remaining that needs
a full state reset, I think it will be easier to understand if the reset
is open coded within it, replacing mbm_cntr_reset() with:

list_for_each_entry(dom, &r->mon_domains, hdr.list) {
	mbm_cntr_free_all();
	resctrl_arch_reset_rmid_all(r, dom); // Just for architectural state
}
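To make the open-coded reset concrete, here is a small user-space C model of that loop. All types, sizes and model_* names are hypothetical: model_cntr_free_all() stands in for mbm_cntr_free_all() (non-architectural counter assignment) and model_arch_reset_rmid_all() for the architectural-only per-RMID reset. It is a sketch of the control flow under those assumptions, not code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model sizes, not taken from the patch series. */
#define NUM_DOMAINS 2
#define NUM_RMIDS   4
#define NUM_CNTRS   8

/* Hypothetical stand-in for struct rdt_mon_domain. */
struct model_domain {
	bool cntr_in_use[NUM_CNTRS];       /* non-architectural: counter assignment */
	uint64_t arch_prev_msr[NUM_RMIDS]; /* architectural: last value read per RMID */
};

/* Hypothetical stand-in for struct rdt_resource and its mon_domains list. */
struct model_resource {
	struct model_domain mon_domains[NUM_DOMAINS];
};

/* Plays the role of mbm_cntr_free_all(): release every counter in @d. */
static void model_cntr_free_all(struct model_domain *d)
{
	memset(d->cntr_in_use, 0, sizeof(d->cntr_in_use));
}

/* Plays the role of the architectural-only per-RMID reset for @d. */
static void model_arch_reset_rmid_all(struct model_domain *d)
{
	memset(d->arch_prev_msr, 0, sizeof(d->arch_prev_msr));
}

/*
 * Open-coded reset as it could appear in resctrl_mbm_assign_mode_write():
 * a single pass over the monitor domains resets both kinds of state.
 */
static void model_assign_mode_reset(struct model_resource *r)
{
	assert(r != NULL);

	for (int i = 0; i < NUM_DOMAINS; i++) {
		struct model_domain *dom = &r->mon_domains[i];

		model_cntr_free_all(dom);
		model_arch_reset_rmid_all(dom);
	}
}
```

Keeping both calls in one visible loop means the reader of the mode-write path sees exactly which state is reset, instead of it being hidden behind a catch-all helper.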

I would not insist on resetting architectural state within the
architectural helper. I find that it is best for the architecture to
maintain its own state, but I also see that there is much precedent for
resctrl explicitly managing that state.

> I feel the current code is much cleaner. What do you think?

It is better than previous versions, yes.

>
>>
>>>
>>> The ABMC feature details are documented in APM listed below [1].
>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>> Monitoring (ABMC).
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>>> ---
>>
>> ...
>>

...

>>> + */
>>> +static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
>>> +{
>>> +    struct rdt_mon_domain *d;
>>> +
>>> +    list_for_each_entry(d, &r->mon_domains, hdr.list)
>>> +        on_each_cpu_mask(&d->hdr.cpu_mask,
>>> +                 resctrl_abmc_set_one_amd, &enable, 1);
>>> +}
>>> +
>>> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
>>> +{
>>> +    struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>> +
>>> +    if (r->mon.mbm_cntr_assignable &&
>>> +        hw_res->mbm_cntr_assign_enabled != enable) {
>>> +        _resctrl_abmc_enable(r, enable);
>>> +        hw_res->mbm_cntr_assign_enabled = enable;
>>
>> An added benefit of resetting architectural state within this if statement
>> (perhaps simpler to do within _resctrl_abmc_enable()) is that the reset will
>> not be done unnecessarily if ABMC is already in the requested state.
>
> It will be
>       list_for_each_entry(dom, &r->mon_domains, hdr.list)
>             resctrl_arch_reset_rmid_all(r, dom);

I am not sure if you are actually planning a new loop here ... As
I suggested above, this can be added to _resctrl_abmc_enable(), where
there is already a loop over all monitor domains; all that is
needed is to add a call to resctrl_arch_reset_rmid_all(r, dom) there.
Even so, as I mentioned above, if after fixing automatic counter
unassignment you still prefer to reset architectural and
non-architectural state together, then we can go with that to match
the other flows (e.g. mbm_config_write_domain()).
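As a sketch of folding the architectural reset into the existing per-domain loop of _resctrl_abmc_enable(), here is a user-space C model. Assumptions: L3_QOS_EXT_CFG is modelled as a per-CPU uint64_t with the enable bit at bit 0 as the commit message describes, on_each_cpu_mask() is replaced by a plain loop over a fixed number of CPUs per domain, and every abmc_* name, size, and the constant return value are hypothetical rather than taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model sizes. */
#define NUM_DOMAINS     2
#define CPUS_PER_DOMAIN 2
#define NUM_RMIDS       4
/* Per the commit message: ABMC is enabled via bit 0 of L3_QOS_EXT_CFG. */
#define ABMC_ENABLE_BIT (1ULL << 0)

struct abmc_domain {
	uint64_t l3_qos_ext_cfg[CPUS_PER_DOMAIN]; /* per-CPU copy of the MSR */
	uint64_t arch_prev_msr[NUM_RMIDS];        /* architectural per-RMID state */
};

struct abmc_hw_resource {
	bool mbm_cntr_assignable;
	bool mbm_cntr_assign_enabled;
	struct abmc_domain mon_domains[NUM_DOMAINS];
};

/* Models resctrl_abmc_set_one_amd(): read-modify-write of bit 0 on one CPU. */
static void abmc_set_one(uint64_t *msr, bool enable)
{
	if (enable)
		*msr |= ABMC_ENABLE_BIT;
	else
		*msr &= ~ABMC_ENABLE_BIT;
}

/* Returns the domain's ABMC state; all CPUs in a domain must agree. */
static bool abmc_domain_enabled(const struct abmc_domain *d)
{
	bool on = (d->l3_qos_ext_cfg[0] & ABMC_ENABLE_BIT) != 0;

	for (int cpu = 1; cpu < CPUS_PER_DOMAIN; cpu++)
		assert(((d->l3_qos_ext_cfg[cpu] & ABMC_ENABLE_BIT) != 0) == on);
	return on;
}

/*
 * Models _resctrl_abmc_enable() with the suggested change: the loop
 * already visits every monitor domain, so reset architectural state here.
 */
static void abmc_enable(struct abmc_hw_resource *hw, bool enable)
{
	for (int i = 0; i < NUM_DOMAINS; i++) {
		struct abmc_domain *d = &hw->mon_domains[i];

		/* Stands in for on_each_cpu_mask() over the domain's CPUs. */
		for (int cpu = 0; cpu < CPUS_PER_DOMAIN; cpu++)
			abmc_set_one(&d->l3_qos_ext_cfg[cpu], enable);

		/*
		 * Hardware counters reset on an ABMC state change, so the
		 * cached architectural state is stale: clear it.
		 */
		memset(d->arch_prev_msr, 0, sizeof(d->arch_prev_msr));
	}
}

/* Models resctrl_arch_mbm_cntr_assign_set(); always returns 0 here. */
static int abmc_cntr_assign_set(struct abmc_hw_resource *hw, bool enable)
{
	if (hw->mbm_cntr_assignable && hw->mbm_cntr_assign_enabled != enable) {
		abmc_enable(hw, enable);
		hw->mbm_cntr_assign_enabled = enable;
	}
	return 0;
}
```

Because the reset sits inside the state-change check, requesting the state ABMC is already in leaves the cached architectural state untouched, which is the benefit noted earlier in this thread.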

Reinette