Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled in mbm_event mode
From: Babu Moger
Date: Tue Oct 07 2025 - 13:36:23 EST
Hi Reinette,
On 10/6/25 20:23, Reinette Chatre wrote:
Hi Babu,
On 10/6/25 1:38 PM, Moger, Babu wrote:
Hi Reinette,
On 10/6/25 12:56, Reinette Chatre wrote:
Hi Babu,
On 9/30/25 1:26 PM, Babu Moger wrote:
resctrl features can be enabled or disabled using boot-time kernel
parameters. To turn off the memory bandwidth events (mbmtotal and
mbmlocal), users need to pass the following parameter to the kernel:
"rdt=!mbmtotal,!mbmlocal".
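To make the option syntax concrete, here is a minimal user-space sketch of how a comma-separated list like the one above could be turned into per-feature flags. It is not the kernel's parser. The struct feature_opt type and parse_feature_opts() are hypothetical names used only for illustration, and the assumption is that a leading '!' means "force off" while a bare name means "force on".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-feature override, modelled in user space. */
struct feature_opt {
	const char *name;
	bool force_off;
	bool force_on;
};

/*
 * Parse a comma-separated list such as "!mbmtotal,!mbmlocal".
 * A leading '!' marks the feature as forced off; a bare name marks it
 * as forced on. Unknown or empty tokens are ignored. Returns the
 * number of tokens that matched an entry in opts.
 */
static int parse_feature_opts(const char *str, struct feature_opt *opts,
			      size_t nopts)
{
	int matched = 0;

	assert(str != NULL && opts != NULL);

	while (*str) {
		size_t len = strcspn(str, ",");
		bool off = (len > 0 && str[0] == '!');
		const char *tok = off ? str + 1 : str;
		size_t toklen = off ? len - 1 : len;

		for (size_t i = 0; i < nopts; i++) {
			if (strlen(opts[i].name) == toklen &&
			    strncmp(opts[i].name, tok, toklen) == 0) {
				if (off)
					opts[i].force_off = true;
				else
					opts[i].force_on = true;
				matched++;
				break;
			}
		}

		/* Step over the token and the separator, if any. */
		str += len;
		if (*str == ',')
			str++;
	}
	return matched;
}
```

Feeding it "!mbmtotal,!mbmlocal" with a table that contains both names sets force_off on both entries and leaves every other entry untouched.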
ah, indeed ... although, the intention behind the mbmtotal and mbmlocal kernel
parameters was to connect them to the actual hardware features identified
by X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL respectively.
However, the memory bandwidth events (mbmtotal and mbmlocal) cannot be
disabled when mbm_event mode is enabled: resctrl_mon_resource_init()
unconditionally enables these events without checking if the underlying
hardware supports them.
Technically this is correct since if hardware supports ABMC then the
hardware is no longer required to support X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL in order to provide mbm_total_bytes
and mbm_local_bytes.
I can see how this may be confusing to user space though ...
Remove the unconditional enablement of MBM features in
resctrl_mon_resource_init() to fix the problem. The hardware support
verification is already done in get_rdt_mon_resources().
I believe by "hardware support" you mean hardware support for
X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL. Wouldn't a fix like
this then require any system that supports ABMC to also support
X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL to be able to
support mbm_total_bytes and mbm_local_bytes?
Yes. That is correct. Right now, ABMC and X86_FEATURE_CQM_MBM_TOTAL/
X86_FEATURE_CQM_MBM_LOCAL are kind of tightly coupled. We have not clearly
separated them.
Are you speaking from the resctrl side? From what I understand these are
independent features on the hardware side.
They are independent from the hardware side. I meant that we still use the legacy events from "default" mode.
This problem seems to be similar to the one solved by [1] since
by supporting ABMC there is no "hardware does not support mbmtotal/mbmlocal"
but instead there only needs to be a check if the feature has been disabled
by command line. That is, add an rdt_is_feature_enabled() check to the
existing "!resctrl_is_mon_event_enabled()" check?
Enabling or disabling needs to be done in get_rdt_mon_resources(). It needs
to happen early in initialization, before calling domain_add_cpu(), where the
event data structures (mbm_states, arch_mbm_states) are allocated.
Good point. My mistake to suggest the event should be enabled by
resctrl fs.
How about adding another check in get_rdt_mon_resources()?
    if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
        rdt_is_feature_enabled(mbmtotal)) {
            resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
            ret = true;
    }
I need to take Tony's patch for this.
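As a rough illustration of the shape of that check, the following user-space sketch enables an event when either input is true. The two booleans stand in for the results of rdt_cpu_has() and rdt_is_feature_enabled(). What the latter returns for a feature that is not mentioned on the command line is not stated in this thread, so it is left to the caller here. All sketch_* names are hypothetical and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event IDs for this sketch; not the kernel's enum values. */
enum sketch_event {
	SKETCH_MBM_TOTAL,
	SKETCH_MBM_LOCAL,
	SKETCH_NUM_EVENTS
};

/* Stand-in for the per-event "enabled" state kept by resctrl. */
struct sketch_mon_state {
	bool enabled[SKETCH_NUM_EVENTS];
};

/*
 * Mirror the shape of the proposed condition: enable the event when
 * either input is true. hw_has stands in for rdt_cpu_has() and
 * opt_enabled for rdt_is_feature_enabled(); both are plain inputs here.
 * Returns true if the event was enabled by this call.
 */
static bool sketch_enable_if_allowed(struct sketch_mon_state *s,
				     enum sketch_event evt,
				     bool hw_has, bool opt_enabled)
{
	assert(s != NULL);
	assert(evt == SKETCH_MBM_TOTAL || evt == SKETCH_MBM_LOCAL);

	if (hw_has || opt_enabled) {
		s->enabled[evt] = true;
		return true;
	}
	return false;
}
```

Because the decision is taken once, before any domain is set up, later code only has to consult the enabled[] state when it allocates per-event data.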
But wait ... I think there may be a bigger problem when considering systems
that support ABMC but not X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL.
Shouldn't resctrl prevent such a system from switching to "default"
mbm_assign_mode? Otherwise resctrl will happily let such a system switch
to default mode and when user attempts to read an event file resctrl will
attempt to read it via MSRs that are not supported.
Looks like ABMC may need something similar to CONFIG_RESCTRL_ASSIGN_FIXED
to handle this case in show() while preventing user space from switching to
"default" mode on write()?
This may not be an issue right now. When X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL are not supported, the mon_data files for these
events are not created.
By "right now" I assume you mean the current implementation? I think your statement
assumes that no CPUs come or go after resctrl_mon_resource_init() enables the MBM events?
The current implementation will enable the MBM events if ABMC is supported. When the
first CPU of a domain comes online after that, resctrl will create the mon_data
files. These files will remain if a user then switches to default mode, and if
the user then attempts to read one of these counters I expect problems.
Yes. It will be a problem in that case.
I am not clear on using the config option you mentioned above.
What about using the check resctrl_is_mon_event_enabled() in
resctrl_mbm_assign_mode_show() and resctrl_mbm_assign_mode_write()?
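To illustrate the concern being discussed (not either proposed implementation), here is a small user-space model of a mode switch that refuses "default" mode when an enabled event has no legacy hardware counter behind it. Every sketch_* name is hypothetical. The legacy_hw[] flags stand in for X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL, and the rule that such a switch should fail with -EINVAL is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_assign_mode {
	SKETCH_MODE_MBM_EVENT,
	SKETCH_MODE_DEFAULT
};

enum { SKETCH_TOTAL, SKETCH_LOCAL, SKETCH_NUM_EVENTS };

/* Hypothetical summary of what the platform offers. */
struct sketch_platform {
	bool has_abmc;                     /* counter assignment available */
	bool legacy_hw[SKETCH_NUM_EVENTS]; /* legacy counter exists per event */
	bool enabled[SKETCH_NUM_EVENTS];   /* event enabled, files exist */
	enum sketch_assign_mode mode;
};

/*
 * Switch assignment mode. Returns 0 on success or -EINVAL when the
 * requested mode cannot be served by this platform; the current mode
 * is left unchanged on failure.
 */
static int sketch_set_mode(struct sketch_platform *p,
			   enum sketch_assign_mode want)
{
	assert(p != NULL);

	if (want == SKETCH_MODE_MBM_EVENT && !p->has_abmc)
		return -EINVAL;

	if (want == SKETCH_MODE_DEFAULT) {
		/* Default mode reads legacy counters for every enabled event. */
		for (size_t i = 0; i < SKETCH_NUM_EVENTS; i++) {
			if (p->enabled[i] && !p->legacy_hw[i])
				return -EINVAL;
		}
	}

	p->mode = want;
	return 0;
}
```

On a platform with ABMC but neither legacy feature, a request for default mode is rejected and the mode stays at mbm_event, so the existing event files are never read through counters that are not there.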
Thanks
Babu