Re: [PATCH v4] x86/resctrl: Add mount option to pick total MBM event

From: Tony Luck
Date: Fri Dec 01 2023 - 15:45:34 EST


On Wed, Nov 29, 2023 at 03:48:37PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 11/28/2023 3:14 PM, Tony Luck wrote:
> > Add a "total" mount option to be used in conjunction with "mba_MBps"
> > to request use of the total memory bandwidth event as the feedback
> > input to the control loop.
>
> "total" is very generic. It is also not clear to me why users
> would need to use two mount options. What if the new mount option
> is "mba_MBps_total" instead, without user needing to also provide
> "mba_MBps"?

I wasn't attached to "total". I tried to change the type of the existing
"mba_MBps" option to fsparam_string_emtpy() in the hope that existing users
would be able to keep using "mba_MBps", and users who want to use total
bandwidth could use "mba_MBps=total". But that type requires the "="
in the string provided by the user for the "empty" case.

So now I'm experimenting with adding a new option:

fsparam_string("mba_MBps_event", Opt_mba_mbps_event)

so a user would specify "mba_MBps_event=total" instead of "mba_MBps".
My code also allow for "mba_MBps_event=local" if a user want to ensure
they use the local bandwidth event (failing the mount if it is not
available).

Existing code using the legacy "mba_MBps" option will get local by
default, failing over to total if needed.

> >
> > Also fall back to using the total event if the local event is not
> > supported by the CPU.
> >
> > Update the once-per-second polling code to use the event (local
> > or total memory bandwidth).
>
> Please take care to describe why this change is needed, not just
> what it does. This is required by x86. For confirmation:
> https://lore.kernel.org/lkml/20231009172517.GRZSQ3fT05LGgpcW35@fat_crate.local/

Yes. I had some justification in earlier versions, but lost it along the
way. I will include in next version.

> >
> > Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> > ---
> >
> > Changes since v3:
> >
> > Reinette suggested that users might like the option to use the total
> > memory bandwidth event. I tried out some code to make the event runtime
> > selectable via a r/w file in the resctrl/info directories. But that
> > got complicated because of the amount of state that needs to be updated
> > when switching events. Since there isn't a firm use case for user
> > selectable event, this latest version falls back to the far simpler
> > case of using a mount option.
>
> (I did not realize that that discussion was over.)

I'd like to avoid too much feature creep in this series. I'd like to
finish solving the original problem (systems without local bandwidth
monitoring should have a way to use total bandwidth) before tackling
additional issues in a separate patch series. Taking on a simple way
for users to request total bandwidth isn't much extra code. Making
it possible to switch events at runtime isn't. Fixing the corner cases
where the feedback loop may get stuck is also a separate issue.

> >
> > Documentation/arch/x86/resctrl.rst | 3 +++
> > arch/x86/kernel/cpu/resctrl/internal.h | 3 +++
> > arch/x86/kernel/cpu/resctrl/monitor.c | 20 +++++++++-----------
> > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 15 ++++++++++++++-
> > 4 files changed, 29 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> > index a6279df64a9d..29c3e7137eb8 100644
> > --- a/Documentation/arch/x86/resctrl.rst
> > +++ b/Documentation/arch/x86/resctrl.rst
> > @@ -46,6 +46,9 @@ mount options are:
> > "mba_MBps":
> > Enable the MBA Software Controller(mba_sc) to specify MBA
> > bandwidth in MBps
> > +"total":
> > + Use total instead of local memory bandwidth to drive the
> > + MBA Software Controller
> > "debug":
> > Make debug files accessible. Available debug files are annotated with
> > "Available only with debug option".
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> > index a4f1aa15f0a2..f98fc9adc2da 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -59,6 +59,7 @@ struct rdt_fs_context {
> > bool enable_cdpl2;
> > bool enable_cdpl3;
> > bool enable_mba_mbps;
> > + bool use_mbm_total;
> > bool enable_debug;
> > };
>
> Why did you choose new member to not follow existing custom of having
> an enable_ prefix?

That does look awful. Next version will do this:

- bool enable_mba_mbps;
+ bool enable_mba_mbps_local;
+ bool enable_mba_mbps_total;

>
> >
> > @@ -428,6 +429,8 @@ extern struct rdt_hw_resource rdt_resources_all[];
> > extern struct rdtgroup rdtgroup_default;
> > DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
> >
> > +extern enum resctrl_event_id mba_mbps_evt_id;
> > +
>
> This global seems unnecessary. struct resctrl_membw.mba_sc indicates if
> the software controller is enabled. Creating this global fragments
> related information.

Global is now gone.

> One option could be to change the type of struct resctrl_membw.mba_sc to
> enum resctrl_event_id. I assume that 0 would never be a valid event ID and
> can thus be used to know if the software controller is disabled. If this
> is done then enum resctrl_event_id's documentation should be updated
> with this assumption/usage.

But I'm not fond of this overloading the meaning of resctrl_membw.mba_sc
by making that assumption that "0" will never be a valid event.

I've left the mba_sc field as a boolean that indicates enabled and added
a new field for the event:

@@ -138,6 +139,7 @@ struct resctrl_membw {
bool arch_needs_linear;
enum membw_throttle_mode throttle_mode;
bool mba_sc;
+ enum resctrl_event_id mba_mbps_event;
u32 *mb_map;
};

New version will be posted soon.

-Tony