Re: [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config

From: Reinette Chatre
Date: Thu Dec 15 2022 - 12:41:23 EST


Hi Babu,

I would like to second James's suggestion to replace sysfs with resctrl
or just remove it. I am concerned that you mentioned in recent message
that you only plan changes to patch 10 while James highlighted that this
needs to be addressed in entire series. Could you please ensure that
you check all the patches?

On 12/1/2022 7:36 AM, Babu Moger wrote:
> The current event configuration can be viewed by the user by reading

What "current" means is not clear and the term could just be removed.

> the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
>
> Following are the types of events supported:
> ==== ===========================================================
> Bits Description
> ==== ===========================================================
> 6 Dirty Victims from the QOS domain to all types of memory
> 5 Reads to slow memory in the non-local NUMA domain
> 4 Reads to slow memory in the local NUMA domain
> 3 Non-temporal writes to non-local NUMA domain
> 2 Non-temporal writes to local NUMA domain
> 1 Reads to memory in the non-local NUMA domain
> 0 Reads to memory in the local NUMA domain
> ==== ===========================================================
>
> By default, the mbm_total_bytes_config is set to 0x7f to count all the
> event types.
>
> For example:
> $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x7f;1=0x7f;2=0x7f;3=0x7f
>
> In this case, the event mbm_total_bytes is currently configured
> with 0x7f on domains 0 to 3.

"currently" can be removed since it already starts with "In this case".


> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
> ---
> arch/x86/include/asm/msr-index.h | 1
> arch/x86/kernel/cpu/resctrl/internal.h | 24 ++++++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 4 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 99 ++++++++++++++++++++++++++++++++
> 4 files changed, 127 insertions(+), 1 deletion(-)
>

...

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 8342feb54a7f..e93b1c206116 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1423,6 +1423,90 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
> return ret;
> }
>
> +struct mon_config_info {
> + u32 evtid;
> + u32 mon_config;
> +};
> +
> +#define INVALID_CONFIG_INDEX UINT_MAX
> +
> +/**
> + * mon_event_config_index_get - get the index for the configurable event

Could you say "get the hardware index" to help clarify what the index is
for?

> + * @evtid: event id.
> + *
> + * Return: 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
> + * INVALID_CONFIG_INDEX for invalid evtid
> + */
> +static inline unsigned int mon_event_config_index_get(u32 evtid)
> +{
> + switch (evtid) {
> + case QOS_L3_MBM_TOTAL_EVENT_ID:
> + return 0;
> + case QOS_L3_MBM_LOCAL_EVENT_ID:
> + return 1;
> + default:
> + /* WARN */
> + return INVALID_CONFIG_INDEX;
> + }
> +}

I see that you copied my sample code here. My intention was that the
/* WARN */ comment be replaced with an actual warning. As a comment
it does not add value. Since the caller now prints a subtler warning it
could just be:

/* Should never reach here */
return INVALID_CONFIG_INDEX;

> +
> +static void mon_event_config_read(void *info)
> +{
> + struct mon_config_info *mon_info = info;
> + u32 h, index;

index can be "unsigned int" as returned by mon_event_config_index_get()

> +
> + index = mon_event_config_index_get(mon_info->evtid);
> + if (index == INVALID_CONFIG_INDEX) {
> + pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> + return;
> + }
> + rdmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, h);
> +
> + /* Report only the valid event configuration bits */
> + mon_info->mon_config &= MAX_EVT_CONFIG_BITS;
> +}
> +
> +static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
> +{
> + smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
> +}
> +
> +static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
> +{
> + struct mon_config_info mon_info = {0};
> + struct rdt_domain *dom;
> + bool sep = false;
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + list_for_each_entry(dom, &r->domains, list) {
> + if (sep)
> + seq_puts(s, ";");
> +
> + mon_info.evtid = evtid;
> + mondata_config_read(dom, &mon_info);
> +

For robustness, please reset mon_config before calling mondata_config_read(). Since
mon_event_config_read() may (yes this is very unlikely) exit early then mon_config
will contain the data from the previous domain.

> + seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
> + sep = true;
> + }
> + seq_puts(s, "\n");
> +
> + mutex_unlock(&rdtgroup_mutex);
> +
> + return 0;
> +}
> +

...

Reinette