Re: [PATCH v18 15/17] x86/resctrl: Fix RMID reading sanity check for Sub-NUMA (SNC) mode

From: Reinette Chatre
Date: Thu May 23 2024 - 13:03:53 EST

In reply to: Tony Luck: "Re: [PATCH v18 15/17] x86/resctrl: Fix RMID reading sanity check for Sub-NUMA (SNC) mode"

Hi Tony,

On 5/22/24 4:47 PM, Tony Luck wrote:

On Wed, May 22, 2024 at 02:25:23PM -0700, Reinette Chatre wrote:

+	/*
+	 * SNC: OK to read events on any CPU sharing same L3
+	 * cache instance.
+	 */
+	if (d->display_id != get_cpu_cacheinfo_id(smp_processor_id(),
+						  r->mon_display_scope))

By hardcoding that mon_display_scope is a cache instead of using get_domain_id_from_scope()
it seems that all pretending about being generic has just been abandoned at this point.

Yes. It now seems like a futile quest to make this look
like something generic. All this code is operating on the

I did not consider the generic solution impossible. The implementation seemed generally ok to me when viewed as a generic solution whose implementation is optimized to avoid sending IPIs unnecessarily, since the only user is SNC.

rdt_resources_all[RDT_RESOURCE_L3] resource (which by its very name is

Yes, good point.

"L3" scoped). In the SNC case the L3 has been divided (in some senses,
but not all) into nodes.

Given that pretending isn't working ... just be explicit?

Some "thinking aloud" follows ...

Sure, will consider with you ...

struct rdt_resource:
In order to track monitor events, resctrl must build a domain list based
on the smallest measurement scope. So with SNC enabled, that is the
node. With it disabled it is L3 cache scope (which on existing systems
is the same as node scope).

Maybe keep .mon_scope with the existing name, but define it to be the
minimum measurement scope and use it to build domains. So it
defaults to RESCTRL_L3_CACHE but SNC detection will rewrite it to
RESCTRL_L3_NODE.

Above has been agreed on for a while now, no? The only change is that the name of the new scope will change from RESCTRL_NODE to RESCTRL_L3_NODE?

Drop the .mon_display_scope field. By definition it must always have
the value RESCTRL_L3_CACHE. So replace checks that compare the values
of .mon_scope & .mon_display_scope in rdt_resources_all[RDT_RESOURCE_L3]
with:

	if (r->mon_scope != RESCTRL_L3_CACHE)
		// SNC stuff
	else
		// regular stuff

This seems reasonable considering what you reminded me of earlier: that
everything related to monitoring is hardcoded to RDT_RESOURCE_L3. Perhaps
that test can be a macro with an elaborate comment describing the SNC view
of the world? I also think that a specific test may be easier to understand
("if (r->mon_scope == RESCTRL_L3_NODE) /* SNC */"), since that makes it
easier to follow the code to figure out where RESCTRL_L3_NODE is assigned,
as opposed to trying to find flows where mon_scope is _not_ RESCTRL_L3_CACHE.
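For illustration only, a minimal sketch of what such a specific test could look like as a helper. The enum and struct below are reduced stand-ins so the snippet compiles on its own, and the helper name resctrl_snc_enabled() is hypothetical and not part of this series.

```c
#include <stdbool.h>

/* Reduced stand-ins for the kernel definitions (assumption, not the real ones). */
enum resctrl_scope {
	RESCTRL_L2_CACHE = 2,
	RESCTRL_L3_CACHE = 3,
	RESCTRL_L3_NODE,	/* scope name proposed in this thread */
};

struct rdt_resource {
	enum resctrl_scope mon_scope;	/* minimum measurement scope */
};

/*
 * SNC view of the world: the L3 is divided into nodes, so monitor
 * domains are built at node scope, while the mon_L3_XX directories
 * still represent whole L3 cache instances.
 *
 * Hypothetical helper: testing for RESCTRL_L3_NODE explicitly lets a
 * reader find the one place where that value is assigned.
 */
static inline bool resctrl_snc_enabled(const struct rdt_resource *r)
{
	return r->mon_scope == RESCTRL_L3_NODE;
}
```

A caller would then read "if (resctrl_snc_enabled(r))" instead of comparing against RESCTRL_L3_CACHE.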

struct rdt_mon_domain:
In the rdt_mon_domain rename the display_id field with the more
honest name "l3_cache_id". In addition save a pointer to the
.shared_cpu_map of the L3 cache. When SNC is off, this will be the

Sounds good. If already saving a pointer, could that be simplified, while also making code easier to understand, with a pointer to the cache's struct cacheinfo instead? That will give access to cache ID as well as shared_cpu_map.

same as the d->hdr.cpu_mask for the domain. For SNC on it will be
a superset (encompassing all the bits from cpu_masks in all domains
that share an L3 instance).

May need to take care when considering scenarios where CPUs can be offlined. For example, when SNC is enabled and all CPUs associated with all but one NUMA domain are offlined, the one remaining monitoring domain may have the same CPU mask as the L3 cache even though SNC is enabled?
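A toy model of that scenario, as a sketch only. It uses plain 64-bit masks instead of struct cpumask and assumes an offlined CPU is dropped from both the domain mask and the L3 mask. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: bit N set means CPU N is online. */
struct toy_domain {
	uint64_t node_cpus;	/* stands in for d->hdr.cpu_mask */
	uint64_t l3_cpus;	/* stands in for the L3 .shared_cpu_map */
};

/* Assumption: offlining removes the CPU from both masks. */
static void toy_cpu_offline(struct toy_domain *d, unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	d->node_cpus &= ~bit;
	d->l3_cpus &= ~bit;
}

/* A mask comparison used as an "is SNC enabled?" test. */
static bool toy_mask_check_says_snc(const struct toy_domain *d)
{
	return d->node_cpus != d->l3_cpus;
}
```

With CPUs 0-7 sharing one L3 and the domain covering CPUs 0-3, the check reports SNC. After CPUs 4-7 are offlined both masks cover CPUs 0-3 and the check no longer reports SNC, although the system configuration has not changed.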

Where SNC specific code is required, the check becomes:

	if (d->hdr.id != d->l3_cache_id)
		// SNC stuff
	else
		// regular stuff

I am not sure about these tests and will need more context on where they will be used. For example, when SNC is enabled, NUMA node #0 belongs to cache ID #0, so the test would not capture that SNC is enabled for monitoring domain #0?
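A small sketch of that concern, assuming two SNC nodes per L3 and that node IDs and cache IDs are both numbered from zero (nodes 0 and 1 on cache 0, nodes 2 and 3 on cache 1). The struct is a reduced, hypothetical stand-in for struct rdt_mon_domain.

```c
#include <stdbool.h>

/* Reduced stand-in for a monitor domain (hypothetical). */
struct toy_mon_domain {
	int id;			/* stands in for d->hdr.id (node ID under SNC) */
	int l3_cache_id;	/* ID of the L3 instance the node belongs to */
};

/* The ID comparison used as an "is this SNC?" test. */
static bool toy_id_check_says_snc(const struct toy_mon_domain *d)
{
	return d->id != d->l3_cache_id;
}
```

Under these assumptions the check fires for nodes 1, 2 and 3 but not for node 0, where both IDs are 0.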

The l3_cache_id can be used in mkdir code to make the mon_L3_XX
directories. The L3 .shared_cpu_map in picking a CPU to read
the counters for the "sum" files. l3_cache_id also indicates
which domains should be summed.

Using the L3 .shared_cpu_map to pick CPU sounds good. It really makes
it obvious what is going on.
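To make the intended flow concrete, a rough sketch under simplifying assumptions: CPU masks are plain 64-bit values, each domain carries one already-read counter value, and every name below is hypothetical rather than taken from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for a monitor domain (hypothetical). */
struct toy_mon_domain {
	int id;			/* node ID under SNC */
	int l3_cache_id;	/* L3 instance this domain belongs to */
	uint64_t l3_cpus;	/* stands in for the L3 .shared_cpu_map */
	uint64_t count;		/* counter value read for this domain */
};

/* Pick the lowest-numbered CPU in the L3 mask, or -1 if the mask is empty. */
static int toy_pick_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}

/* "sum" file: add up every domain that shares the given L3 instance. */
static uint64_t toy_sum_l3(const struct toy_mon_domain *doms, size_t n,
			   int l3_cache_id)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (doms[i].l3_cache_id == l3_cache_id)
			sum += doms[i].count;
	}
	return sum;
}
```

Here l3_cache_id selects which domains contribute to a sum, and the L3 mask, not the per-node mask, decides which CPU may do the read.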

Does this look like a useful direction to pursue?

As I understand it, this will make the code obviously specific to SNC but not change the flow of the implementation in this series. I do continue to believe that many of the flows to support SNC are not intuitive (to me), so I would like to keep my request that the SNC portions have clear comments that explain why the code does what it does, and not just leave the reader with the impression of "if (SNC specific check) /* quirks */".
This will help future changes to these areas.

Reinette
