Re: [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage
From: Reinette Chatre
Date: Tue Jun 09 2026 - 19:02:47 EST
Hi Tony,
On 6/9/26 9:51 AM, Luck, Tony wrote:
> On Mon, Jun 08, 2026 at 04:16:58PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 6/1/26 12:56 PM, Tony Luck wrote:
>>> Drop the force_off assignment from all_regions_have_sufficient_rmid().
>>
>> Apart from this being obvious from the patch, this is not how an x86 changelog
>> should be structured. Please see "Changelog" in
>> Documentation/process/maintainer-tip.rst
>
> Revised version:
> ---
> Subject: x86/resctrl: Fix usage of event_group::force_off in AET
>
> The kernel command line option "rdt=" is used to force enable or disable
> individual resctrl features.
>
> There are two problems with usage in the AET (Application Energy
> Telemetry) feature:
Above seems incomplete. Could it be "two problems with usage of the rdt= kernel
command line ..." or "two problems with the "rdt=" usage ..." or ...?
> 1) If the user specifies both enabling and disabling of the same feature
> (e.g. "rdt=energy,!energy") the request to enable should override the
> request to disable (for consistency with other features).
> 2) event_group::force_off is set true in all_regions_have_sufficient_rmid()
> which will cause problems when AET features are enumerated on each mount
> of the resctrl file system
Two comments:
- In general we should aim to make the changelog as specific as possible to
prevent reader needing to do any deciphering. This makes the review easier
and faster. So, please avoid general things like "will cause problems" that
requires reader to figure out the problems (plural!) on their own. Instead
just be specific about what the problems are.
- Is (2) still a problem when (1) is fixed? An underlying question is: if an
event group is successfully enumerated during resctrl mount, will it always
enumerate with identical properties on every subsequent mount?
>
> Fix the first issue by checking event_group::force_on in enable_events().
"the first issue" seems vague to me while the rest describes the code that can be
seen from the patch. How about something like below to help describe what the code
change accomplishes:
Enumerate the event group if user space overrides any disable of the event group.
>
> Fix the other issue by dropping the assignment to event_group::force_off.
"the other issue" is very vague. Here it will also help to be specific. Although
per earlier comment the need for this fix is no longer clear to me?
>
> ---
>
>>
>> Please note that, after the preparatory fixes that are expected to land separately,
>> this will be the first patch of this series ... having this first patch just
>> kick off with "what changes" without any context is a difficult way for a
>> reviewer to start considering this piece of work.
>>
>>> This preserves current single-enumeration behaviour while preparing for
>>> the upcoming per-mount enumeration, where latching force_off would
>>> incorrectly suppress re-enumeration on subsequent mounts - even when the
>>> user explicitly requested the feature via "rdt={feature}".
>>
>> I believe that user space should still expect that rdt= options behave
>> consistently. So if user space for some reason provides: rdt=!mbm,mbm,!energy,energy
>> on a system that supports both then it should not be the case that one is enabled and the
>> other not.
>>
>> I this think that, for example, enable_events() should start with:
>>
>> if (e->force_off && !e->force_on)
>> return false;
>
> OK.
>
>>
>>> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
>>> ---
>>> arch/x86/kernel/cpu/resctrl/intel_aet.c | 8 +++-----
>>> 1 file changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
>>> index 89b8b619d5d5..e2af700bca04 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
>>> @@ -60,8 +60,8 @@ struct pmt_event {
>>> * data for all telemetry regions of type @pfname.
>>> * Valid if the system supports the event group,
>>> * NULL otherwise.
>>> - * @force_off: True when "rdt" command line or architecture code disables
>>> - * this event group due to insufficient RMIDs.
>>> + * @force_off: True when "rdt" command line disables this event group
>>> + * to avoid system limitations due to insufficient RMIDs.
>>
>> >From what I understand it is only architecture that can disable an event group
>> "to avoid system limitations due to insufficient RMIDs" and this capability is removed
>> in this patch. Above implies that this capability now needs to be implemented
>> by userspace, which I do not think is accurate?
>
> Should I just drop the second line? The user may have various other reasons
> why they want to disable this event group.
Now that enable_events() starts with a check of event_group::force_on it seems that
the existing behavior of event_group::force_off to also reflect architecture
disable can be maintained?
> Combined with the code change you suggest above, this would describe the
> use of these two fields. "force_off" disables, "force_on" enables (and
> overrides any "force_off").
I believe the "force_on" description below already accurately describes how it
overrides an earlier disable.
>
>>
>>> * @force_on: True when "rdt" command line overrides disable of this
>>> * event group.
>>> * @guid: Unique number per XML description file.
>>> @@ -214,10 +214,8 @@ static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_f
>>> if (!p->regions[i].addr)
>>> continue;
>>> tr = &p->regions[i];
>>> - if (tr->num_rmids < e->num_rmid) {
>>> - e->force_off = true;
>>> + if (tr->num_rmids < e->num_rmid)
>>> return false;
>>> - }
>>> }
>>>
>>> return true;
Reinette