Re: [PATCH v8 08/24] x86/resctrl: Track the number of dirty RMID a CLOSID has

From: James Morse
Date: Mon Jan 22 2024 - 13:32:24 EST


Hi Peter,

On 04/01/2024 19:13, Peter Newman wrote:
> On Fri, Dec 15, 2023 at 9:44 AM James Morse <james.morse@xxxxxxx> wrote:
>> void free_rmid(u32 closid, u32 rmid)
>> @@ -792,13 +813,33 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
>> static int dom_data_init(struct rdt_resource *r)
>> {
>> u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>> + u32 num_closid = resctrl_arch_get_num_closid(r);

> Which resource is this again? Surely the one with the smallest number
> of CLOSIDs?

Today it's implicitly L3, because that is the only resource resctrl supports monitoring on.


> It's not much harm if the array is bigger than it needs to be, but

Heh, this use of the variable is behind those IS_ENABLED() checks, which means it is compiled
out unless you are building for an MPAM system. MPAM always has to sanitise these fields, as
not all of the hardware is exposed to resctrl.
(e.g. L3 and MB might support 16 CLOSIDs, but if there is an invisible system cache in
between them that only supports 8 CLOSIDs, the system-wide value has to be 8, regardless of
what L3 and MB support.)
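
A minimal userspace sketch of that rule, assuming the system-wide value is simply the minimum
over every component on the path, whether or not resctrl can see it. `struct closid_component`
and `system_num_closid()` are hypothetical names for illustration, not the MPAM driver's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one piece of partitioning-capable hardware. */
struct closid_component {
	const char *name;
	unsigned int num_closid;	/* CLOSIDs this component implements */
	bool visible_to_resctrl;	/* false for e.g. a hidden system cache */
};

/*
 * Hypothetical helper: the system-wide CLOSID count is the minimum over
 * every component, including ones resctrl never exposes, because every
 * component on the path must be able to handle every CLOSID in use.
 */
static unsigned int system_num_closid(const struct closid_component *c, size_t n)
{
	unsigned int min;
	size_t i;

	assert(n > 0);
	min = c[0].num_closid;
	for (i = 1; i < n; i++) {
		/* visible_to_resctrl is deliberately not consulted here. */
		if (c[i].num_closid < min)
			min = c[i].num_closid;
	}
	return min;
}
```

With L3 and MB at 16 and a hidden system cache at 8, this returns 8, matching the example above.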

The MPAM driver finds the system wide value here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/drivers/platform/mpam/mpam_devices.c?h=mpam/snapshot/v6.7-rc2#n757

And regardless of which resource you select, returns that value here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/drivers/platform/mpam/mpam_resctrl.c?h=mpam/snapshot/v6.7-rc2#n128

On x86, the helper returns the hardware's number of CLOSIDs, so that resctrl's own
sanitisation does the right thing.

I'll add a comment that this may over-allocate if the architecture isn't pre-sanitising
this field:
| /*
| * If the architecture hasn't provided a sanitised value here,
| * this may result in larger arrays than necessary. Resctrl will
| * use a smaller system wide value based on the resources in
| * use.
| */
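
A rough userspace model of why over-allocating is harmless, assuming a per-CLOSID dirty-RMID
counter array sized from whatever the architecture reports. `struct dirty_rmid_tracker` and the
`tracker_*()` helpers are hypothetical names, and calloc() stands in for the kernel allocator:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: one dirty-RMID counter per CLOSID. */
struct dirty_rmid_tracker {
	uint32_t *num_dirty_rmid;	/* indexed by CLOSID */
	uint32_t num_closid;		/* entries allocated (arch value) */
};

/* Returns 0 on success, -1 if the allocation fails. */
static int tracker_init(struct dirty_rmid_tracker *t, uint32_t arch_num_closid)
{
	/* calloc() zeroes the counters: no RMID is dirty yet. */
	t->num_dirty_rmid = calloc(arch_num_closid, sizeof(*t->num_dirty_rmid));
	if (!t->num_dirty_rmid)
		return -1;
	t->num_closid = arch_num_closid;
	return 0;
}

/* An RMID owned by @closid was freed but is still dirty (in limbo). */
static void tracker_mark_dirty(struct dirty_rmid_tracker *t, uint32_t closid)
{
	assert(closid < t->num_closid);
	t->num_dirty_rmid[closid]++;
}

/* A previously dirty RMID owned by @closid has become clean. */
static void tracker_mark_clean(struct dirty_rmid_tracker *t, uint32_t closid)
{
	assert(closid < t->num_closid && t->num_dirty_rmid[closid] > 0);
	t->num_dirty_rmid[closid]--;
}

static void tracker_free(struct dirty_rmid_tracker *t)
{
	free(t->num_dirty_rmid);
	t->num_dirty_rmid = NULL;
	t->num_closid = 0;
}
```

If the architecture reports 16 CLOSIDs but resctrl only hands out 8 system-wide, entries 8 to 15
are simply never touched: the cost is a few unused counters, not incorrect accounting.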


> I've become curious about how The Monitoring Resource is used in the
> code when there are later changes[1] which would cause this function
> to be called on RDT_RESOURCE_L3, RDT_RESOURCE_MBA, or both.

I need to digest Tony's series. Today the event names all have L3 in them; the MPAM
driver ignores both this and the resources, and relies on heuristics to pick
something to back these counters with. Something is better than nothing.
I agree it can be improved as resctrl allows more things to be exposed.


> Given that we have hardware with event counters residing at different
> levels of the topology and possibly being associated with different
> rdt_resources, more attention needs to be paid to how these parameters
> are used in code related to monitoring.

There is certainly likely to be some weirdness in what the MPAM driver picks here. Those
patches are marked untested for a reason! I have nothing I can test the bandwidth counters on.

My intention here is that 'things that look like a Xeon' should behave equivalently as far
as resctrl can see. That gets any existing software working. Beyond that we can talk about
extending what we have to better cover the hardware people have built.

I'm coming to the conclusion that results vary depending on whether a counter sits at the
ingress or egress of the L3, SLC, memory-side cache or memory controller, even when only one of
these is implemented, and that hiding this in resctrl isn't helpful. Using perf's
platform-specific JSON files to identify counters may be a better approach.


Thanks,

James