Re: [PATCH v3 2/2] x86/resctrl: Remove hard-coded memory bandwidth event configuration

From: Reinette Chatre
Date: Fri Jan 05 2024 - 16:18:36 EST


Hi Babu,

On 1/4/2024 1:21 PM, Babu Moger wrote:
> If the BMEC (Bandwidth Monitoring Event Configuration) feature is
> supported, the bandwidth events can be configured. The maximum supported
> bandwidth bitmask can be determined by following CPUID command.
>
> CPUID_Fn80000020_ECX_x03 [Platform QoS Monitoring Bandwidth Event
> Configuration] Read-only. Reset: 0000_007Fh.
> Bits Description
> 31:7 Reserved
> 6:0 Identifies the bandwidth sources that can be tracked.
>
> The bandwidth sources can change with the processor generations.
> Remove the hard-coded value and detect using CPUID command. Also,

I do not think "Remove the hard-coded value" is accurate anymore.

> print the valid bitmask when the user tries to configure invalid value.
>
> The CPUID details are documentation in the PPR listed below [1].

"are documentation" -> "are documented"

> [1] Processor Programming Reference (PPR) Vol 1.1 for AMD Family 19h Model
> 11h B1 - 55901 Rev 0.25.
>
> Fixes: dc2a3e857981 ("x86/resctrl: Add interface to read mbm_total_bytes_config")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>

Same comment about "Link:" as for patch 1/2.

> ---
> v3: Changed the event_mask name to mbm_cfg_mask. Added comments about the field.
> Reverted the masking of event configuration to original code.
> Few minor comment changes.
>
> v2: Earlier sent as a part of ABMC feature.
> https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@xxxxxxx/
> But this is not related to ABMC. Sending it separate now.
> Removed the global resctrl_max_evt_bitmask. Added event_mask as part of
> the resource.
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 3 +++
> arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ++++--
> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index d2979748fae4..e3dc35a00a19 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -394,6 +394,8 @@ struct rdt_parse_data {
> * @msr_update: Function pointer to update QOS MSRs
> * @mon_scale: cqm counter * mon_scale = occupancy in bytes
> * @mbm_width: Monitor width, to detect and correct for overflow.
> + * @mbm_cfg_mask: Bandwidth sources that can be tracked when Bandwidth
> + * Monitoring Event Configuration (BMEC) is supported.
> * @cdp_enabled: CDP state of this resource
> *
> * Members of this structure are either private to the architecture
> @@ -408,6 +410,7 @@ struct rdt_hw_resource {
> struct rdt_resource *r);
> unsigned int mon_scale;
> unsigned int mbm_width;
> + unsigned int mbm_cfg_mask;
> bool cdp_enabled;
> };
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index f136ac046851..acca577e2b06 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -813,6 +813,12 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> return ret;
>
> if (rdt_cpu_has(X86_FEATURE_BMEC)) {
> + u32 eax, ebx, ecx, edx;
> +
> + /* Detect list of bandwidth sources that can be tracked */
> + cpuid_count(0x80000020, 3, &eax, &ebx, &ecx, &edx);
> + hw_res->mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
> +
> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
> mbm_total_event.configurable = true;
> mbm_config_rftype_init("mbm_total_bytes_config");
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 69a1de92384a..5b5a8f0ffb2f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1617,12 +1617,14 @@ static void mon_event_config_write(void *info)
> static int mbm_config_write_domain(struct rdt_resource *r,
> struct rdt_domain *d, u32 evtid, u32 val)

Not specific to this patch, but since the valid mask is per resource I do not think
it is necessary to check user provided value for every domain. The user provided value
can be checked earlier and only once in mon_config_write() before iterating over all
domains to write the value.

> {
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> struct mon_config_info mon_info = {0};
> int ret = 0;
>
> /* mon_config cannot be more than the supported set of events */
> - if (val > MAX_EVT_CONFIG_BITS) {
> - rdt_last_cmd_puts("Invalid event configuration\n");
> + if ((val & hw_res->mbm_cfg_mask) != val) {
> + rdt_last_cmd_printf("Invalid input: The maximum valid bitmask is 0x%02x\n",
> + hw_res->mbm_cfg_mask);

I think keeping "Invalid event configuration" is useful to create a detailed message of:
"Invalid event configuration: maximum valid bitmask is 0x%02x"

Reinette