Re: [PATCH 4/4] selftests/resctrl: Adjust SNC support messages
From: Peter Newman
Date: Tue Mar 19 2024 - 17:01:22 EST
Hi Tony,
On Mon, Mar 18, 2024 at 3:05 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> > Could you please help me understand the details by answering my first
> > question: What is the use case for needing to expose the individual cluster
> > counts?
> >
> > This is a model specific feature so if this is something needed for just a
> > couple of systems I think we should be less inclined to make changes to
> > resctrl interface. I am starting to be concerned about something similar
> > becoming architectural later and then we need to wrangle this model specific
> > resctrl support (which has then become ABI) again to support whatever that
> > may look like.
>
> Reinette,
>
> Model specific. But present in multiple consecutive generations (Sapphire Rapids,
> Emerald Rapids, Granite Rapids, Sierra Forest).
>
> Adding Peter Newman for a resctrl user perspective on SNC, rather than me
> continue to speculate on possible ways this might be used.
>
> Peter: You will need to dig back a few messages on lore.kernel.org to
> get context.

Our main concern with supporting SNC in resctrl is that all monitoring
groups continue to record memory bandwidth from all CPUs, regardless of
which RMIDs they are assigned.

I would prefer that we don't complicate the model of resctrl
monitoring domains for this feature. On ARM SoCs, there will be a
plethora of technologies influencing the layout of resources, so we
shouldn't start cluttering the model with special cases for each.

I think it's valid for the number of domains in the L3 resource to
increase or stay the same when the system is configured for SNC. I
don't think the details of how the domains came about are relevant at
the resctrl interface level, so long as the user has enough information
to deduce what each domain refers to from their knowledge of the
system configuration.

I would prefer per-cluster counts, as the extra detail could prove
useful in some future investigation, but if you feel the data is
misleading, reporting the clusters combined is also fine. Whichever
you choose, I would prefer that the choice remain consistent from this
point forward on any particular implementation, to avoid breaking
existing controller software developed for that implementation.

In our main use case, we sum mon_data/*/mbm_total_bytes to determine a
group's total bandwidth, so please don't cause this logic to produce
the wrong answer.
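
The summation described above can be sketched in userspace C roughly as
follows. This is a minimal illustration, not the controller software in
question: sum_mbm_total_bytes() is a hypothetical helper, the group
directory argument (e.g. /sys/fs/resctrl/<group>) is an assumption, and
it assumes each mbm_total_bytes file holds one decimal byte count. Any
non-numeric content (e.g. "Unavailable") is treated as an error. Because
mbm_total_bytes is a running byte count, a group's bandwidth would come
from differencing two such sums taken a known interval apart. The sum is
only meaningful if every byte is attributed to exactly one domain, which
is the property being asked to hold under SNC whether domains are
reported per-cluster or combined.

```c
#include <glob.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (a sketch, not resctrl selftest code): sums
// mbm_total_bytes over every monitoring domain directory under
// <group_dir>/mon_data. Returns 0 and stores the sum in *total on
// success; returns -1 if no domain file matches or any file cannot be
// opened or parsed as an unsigned decimal number.
static int sum_mbm_total_bytes(const char *group_dir, uint64_t *total)
{
	char pattern[4096];
	glob_t g;
	uint64_t sum = 0;
	int ret = 0;

	snprintf(pattern, sizeof(pattern), "%s/mon_data/*/mbm_total_bytes",
		 group_dir);
	if (glob(pattern, 0, NULL, &g) != 0) {
		globfree(&g);	// no match, or glob() itself failed
		return -1;
	}

	for (size_t i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "r");
		uint64_t val = 0;

		if (!f) {
			ret = -1;
			break;
		}
		// A non-numeric value such as "Unavailable" fails the scan.
		if (fscanf(f, "%" SCNu64, &val) != 1)
			ret = -1;
		fclose(f);
		if (ret)
			break;
		sum += val;	// one domain's cumulative byte count
	}

	globfree(&g);
	if (ret == 0)
		*total = sum;
	return ret;
}
```

A caller would sample the sum twice, for example one second apart, and
divide the difference by the elapsed time to obtain the group's total
bandwidth in bytes per second.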
Thanks!
-Peter