Re: [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode

From: Reinette Chatre
Date: Thu Dec 19 2024 - 19:00:03 EST


Hi Babu,

On 12/12/24 12:15 PM, Babu Moger wrote:
> In mbm_cntr_assign mode, the hardware counter should be assigned to read
> the MBM events.
>
> Report 'Unassigned' in case the user attempts to read the events without
> assigning the counter.
>
> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
> ---

..

> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index c075fcee96b7..3ec14c314606 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -430,6 +430,16 @@ When monitoring is enabled all MON groups will also contain:
> for the L3 cache they occupy). These are named "mon_sub_L3_YY"
> where "YY" is the node number.
>
> + When supported the mbm_cntr_assign mode allows users to assign a

"When supported" -> "When enabled"? Or perhaps just drop that and start with
"mbm_cntr_assign mode allows users ..."


> + counter to mon_hw_id, event pair enabling bandwidth monitoring for
> + as long as the counter remains assigned. The hardware will continue
> + tracking the assigned mon_hw_id until the user manually unassigns
> + it, ensuring that counters are not reset during this period. With
> + a limited number of counters, the system may run out of assignable
> + counters. In that case, MBM event counters will return 'Unassigned'
> + when the event is read. Users must manually assign a counter to read
> + the events.
> +
> "mon_hw_id":
> Available only with debug option. The identifier used by hardware
> for the monitor group. On x86 this is the RMID.
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 200d89a64027..8e265a86e524 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -527,6 +527,12 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
> /* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
> lockdep_assert_cpus_held();
>
> + if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid) &&
> + !mbm_cntr_assigned(r, d, rdtgrp, evtid)) {
> + rr->err = -ENOENT;
> + return;
> + }
> +

Hmm ... d can be NULL here since SNC support was added. The file that needs a
sum is essentially software backed, so I do not think assigning counters would
apply to it (although it may theoretically apply to the domains it consists of).
I think it may be safer to just move this check into rdtgroup_mondata_show(),
where it reads data for a single domain.
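
To illustrate the shape of what I mean, here is a rough standalone mock, not
kernel code and not tested against this series. Every type and function name in
it (mock_domain, mock_read, mock_event_read(), mock_mondata_show()) is a
made-up stand-in. The only point is that the "counter assigned?" check runs in
the caller, on the path where the domain pointer is known to be valid, so the
reader never dereferences a NULL d.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a monitoring domain; not the kernel's struct. */
struct mock_domain {
	bool cntr_assigned;		/* counter assigned for this group/event? */
	unsigned long long bytes;	/* pretend MBM count */
};

/* Hypothetical stand-in for struct rmid_read. */
struct mock_read {
	int err;
	unsigned long long val;
};

/*
 * Stand-in for the event reader. As with SNC, d == NULL means
 * "sum over all domains", so nothing here may dereference d blindly.
 */
static void mock_event_read(struct mock_read *rr, const struct mock_domain *d,
			    const struct mock_domain *all, size_t n)
{
	rr->err = 0;
	rr->val = 0;

	if (d) {
		rr->val = d->bytes;
		return;
	}

	for (size_t i = 0; i < n; i++)
		rr->val += all[i].bytes;
}

/*
 * Stand-in for the show function. The assignment check lives here and
 * only on the single-domain path. What the sum path should report when
 * a domain has no counter is deliberately left open in this mock.
 */
static int mock_mondata_show(char *buf, size_t len, bool assign_mode,
			     const struct mock_domain *d,
			     const struct mock_domain *all, size_t n)
{
	struct mock_read rr = { 0 };

	if (d && assign_mode && !d->cntr_assigned) {
		rr.err = -ENOENT;
		goto out;
	}

	mock_event_read(&rr, d, all, n);

out:
	if (rr.err == -ENOENT)
		return snprintf(buf, len, "Unassigned\n");
	if (rr.err)
		return snprintf(buf, len, "Unavailable\n");
	return snprintf(buf, len, "%llu\n", rr.val);
}
```

In the mock, a single-domain read of a domain without a counter prints
"Unassigned", while the sum path (d == NULL) is untouched by the new check.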

I am not sure if we need to change the documentation because of this. One option
could be a rewording to "MBM event counters may return 'Unassigned' or
'Unavailable' when the event is read".

Reinette