Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
From: Ben Horgan
Date: Wed Mar 18 2026 - 08:00:38 EST
Hi Reinette,
On 3/17/26 18:09, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/17/26 3:25 AM, Ben Horgan wrote:
>> On 3/16/26 18:18, Reinette Chatre wrote:
>>> On 3/16/26 10:44 AM, Ben Horgan wrote:
>>>> On 3/2/26 18:46, Reinette Chatre wrote:
>>> ...
>>>> One related issue I've just noticed is that when ABMC and mbm_assign_on_mkdir are
>>>> enabled the creation of MON/CTRL_MON directories may succeed but an error message
>>>> is written to last_cmd_status. E.g.
>>>>
>>>> /sys/fs/resctrl# mkdir mon_groups/new5
>>>> /sys/fs/resctrl# cat info/last_cmd_status
>>>> Failed to allocate counter for mbm_total_bytes in domain 2
>>>>
>>>> The failure is ignored, as expected, in rdt_assign_cntrs() but the last_cmd_status
>>>> is never cleared. I think this could be fixed by:
>>>>
>>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>>> index 62edb464410a..396f17ed72c6 100644
>>>> --- a/fs/resctrl/monitor.c
>>>> +++ b/fs/resctrl/monitor.c
>>>> @@ -1260,6 +1260,8 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>>> if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>>> rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>>> &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>>> +
>>>> + rdt_last_cmd_clear();
>>>> }
>>>>
>>>> Is this right thing to do? Let me know if you want a proper patch.
>>>
>>> Letting group be created without any counters assigned while writing the error
>>> to last_cmd_status is the intended behavior. If the last_cmd_status buffer is cleared
>>> at this point then user space will never have the opportunity to see the message that
>>> contains the details.
>>>
>>> It did not seem appropriate to let resource group creation fail when no counters
>>> are available. I see that the documentation is not clear on this. What do you think
>>> of an update to documentation instead? Would something like below help clarify behavior?
>>>
>>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>>> index ba609f8d4de5..20dc58d281cf 100644
>>> --- a/Documentation/filesystems/resctrl.rst
>>> +++ b/Documentation/filesystems/resctrl.rst
>>> @@ -478,6 +478,12 @@ with the following files:
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>>> 0
>>>
>>> + Automatic counter assignment is done with best effort. If auto assignment
>>> + is enabled but there are not enough available counters then monitor group
>>> + creation could succeed while one or more events belonging to the group may
>>> + not have a counter assigned. Consult last_cmd_status for details during
>>> + such scenario.
>>> +
>>
>> This does the improve the situation but the multiple domain failure behaviour depends
>> on the order the domains are iterated through. This is stable as the list is sorted but
>> does seem a bit complicated.
>> I.e. if you have two domains, with ids 2 and 3, with no counters remaining on domain 2 but
>> some on domain 3 then rdtgroup_assign_cntr_event() will fail early and the counter won't
>> be assigned for domain 3 but the last_cmd_status will only be about domain 2. The user
>> either needs to know a failure at one domain means all higher domains will not be
>> considered for that counter or look at the new mbm_L3_assignments to understand what's happened.
>> In this case we have:
>>
>> /sys/fs/resctrl# cat info/L3_MON/mbm_assign_on_mkdir
>> 1
>> /sys/fs/resctrl# cat info/L3_MON/available_mbm_cntrs
>> 2=0;3=1
>> /sys/fs/resctrl# mkdir mon_groups/new
>> /sys/fs/resctrl# cat info/last_cmd_status
>> Failed to allocate counter for mbm_total_bytes in domain 2
>> /sys/fs/resctrl# cat mon_groups/new/mbm_L3_assignments
>> mbm_total_bytes:2=_;3=_
>
> Good point.
>
>>
>> Would it be better for each domain to be considered even if a previous failure occurred or
>> is this now a fixed behaviour? For illustration:
>
> I do not believe this needs to be fixed.
>
>>
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 62edb464410a..8e061bce9742 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -1248,18 +1248,25 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
>> void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> {
>> struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> + struct rdt_l3_mon_domain *d;
>>
>> if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r) ||
>> !r->mon.mbm_assign_on_mkdir)
>> return;
>>
>> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
>> - &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + rdtgroup_assign_cntr_event(d, rdtgrp,
>> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>> + }
>> + }
>>
>> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
>> - &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + rdtgroup_assign_cntr_event(d, rdtgrp,
>> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>> + }
>> + }
>> }
>
> With a solution like this last_cmd_status could potentially contain multiple lines, one
> line per domain that failed. last_cmd_status is 512 bytes so if this is a system with
> many domains there is a risk of overflow and user space not seeing all failures.
> That may be ok?
Probably but maybe we can do better. I wonder that given that we know we are trying to allocate
counters in all domains the information in last_cmd_status could summarize which
domains failed. 512 bytes isn't that large so a hex encoded bit map or something else information
dense would be needed. This does make last_cmd_status a bit less human readable so may not
be the way to go.
>
> I think this can be simplified within rdt_assign_cntr_event() though. Consider:
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 49f3f6b846b2..a6a791a15e30 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1209,12 +1209,13 @@ static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_
> * NULL; otherwise, assign the counter to the specified domain @d.
> *
> * If all counters in a domain are already in use, rdtgroup_alloc_assign_cntr()
> - * will fail. The assignment process will abort at the first failure encountered
> - * during domain traversal, which may result in the event being only partially
> - * assigned.
> + * will fail. Ignore errors when attempting to assign a counter to all domains
> + * since only some domains may have counters available and goal is to assign
> + * counters where possible. Only caller providing @d of NULL is
> + * rdtgroup_assign_cntrs() that ignores errors.
> *
> * Return:
> - * 0 on success, < 0 on failure.
> + * 0 on success when @d is specified, 0 always when @d is NULL, < 0 on failure.
> */
> static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp,
> struct mon_evt *mevt)
> @@ -1223,11 +1224,8 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
> int ret = 0;
>
> if (!d) {
> - list_for_each_entry(d, &r->mon_domains, hdr.list) {
> - ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
> - if (ret)
> - return ret;
> - }
> + list_for_each_entry(d, &r->mon_domains, hdr.list)
> + rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
> } else {
> ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
> }
That does the job but is it simpler to delete rdtgroup_assign_cntr_event()
and call rdtgroup_alloc_assign_cntr() directly from rdtgroup_assign_cntrs()
and rdtgroup_modify_assign_state() rather than special casing it further?
Thanks,
Ben
>
>
> Reinette
>