Re: [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment

From: Reinette Chatre
Date: Fri Jul 12 2024 - 18:13:26 EST


Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as the counter
is assigned. The bandwidth events will be tracked by the hardware until
the user changes the configuration. Each resctrl group can configure
maximum two counters, one for total event and one for local event.

The counters are configured by writing to MSR L3_QOS_ABMC_CFG.
Configuration is done by setting the counter id, bandwidth source (RMID)
and bandwidth configuration supported by BMEC(Bandwidth Monitoring Event
Configuration). Reading L3_QOS_ABMC_DSC returns the configuration of the
counter id specified in L3_QOS_ABMC_CFG.

Attempts to read or write these MSRs when ABMC is not enabled will result
in a #GP(0) exception.

Introduce data structures and definitions for ABMC assignments.

MSR L3_QOS_ABMC_CFG (0xC000_03FDh) and L3_QOS_ABMC_DSC (0xC000_03FEh)
details.
=========================================================================
Bits Mnemonic Description Access Reset
Type Value
=========================================================================
63 CfgEn Configuration Enable R/W 0

62 CtrEn Enable/disable Tracking R/W 0

61:53 – Reserved MBZ 0

52:48 CtrID Counter Identifier R/W 0

47 IsCOS BwSrc field is a CLOSID R/W 0
(not an RMID)

46:44 – Reserved MBZ 0

43:32 BwSrc Bandwidth Source R/W 0
(RMID or CLOSID)

31:0 BwType Bandwidth configuration R/W 0
to track for this counter
==========================================================================

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).

The changelog only describes the hardware interface yet the patch contains
part hardware interface part new driver support for hardware interface.


Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v5: Moved assignment flags here (path 10/19 of v4).
Added MON_CNTR_UNSET definition to initialize cntr_id's.
More details in commit log.
Renamed few fields in l3_qos_abmc_cfg for readability.

v4: Added more descriptions.
Changed the name abmc_ctr_id to ctr_id.
Added L3_QOS_ABMC_DSC. Used for reading the configuration.

v3: No changes.

v2: No changes.
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 40 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 ++++++++++++
3 files changed, 60 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 263b2d9d00ed..5e44ff91f459 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1175,6 +1175,8 @@
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
+#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
+#define MSR_IA32_L3_QOS_ABMC_DSC 0xc00003fe
/* MSR_IA32_VMX_MISC bits */
#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4cb1a5d014a3..6925c947682d 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -100,6 +100,18 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
#define ABMC_ENABLE BIT(0)
+/*
+ * Assignment flags for ABMC feature
+ */
+#define ASSIGN_NONE 0
+#define ASSIGN_TOTAL BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
+#define ASSIGN_LOCAL BIT(QOS_L3_MBM_LOCAL_EVENT_ID)

These flags do not appear to be part of hardware interface and there
is no explanation for what they mean or how they will be used. They are
also not used in this patch. It is thus not possible to understand if
they belong in this patch or is appropriate in this work.

+
+#define MON_CNTR_UNSET U32_MAX
+
+/* Maximum assignable counters per resctrl group */
+#define MAX_CNTRS 2
+
struct rdt_fs_context {
struct kernfs_fs_context kfc;
bool enable_cdpl2;
@@ -228,12 +240,14 @@ enum rdtgrp_mode {
* @parent: parent rdtgrp
* @crdtgrp_list: child rdtgroup node list
* @rmid: rmid for this rdtgroup
+ * @cntr_id: ABMC counter ids assigned to this group

struct mongroup is private to resctrl fs so it cannot contain an
architecture specific feature. Having it contain a generic "cntr_id"
may be ok at this point, but it should not be termed "ABMC counter".

*/
struct mongroup {
struct kernfs_node *mon_data_kn;
struct rdtgroup *parent;
struct list_head crdtgrp_list;
u32 rmid;
+ u32 cntr_id[MAX_CNTRS];

This is a significant addition yet is silently included as part of a patch
that just introduces hardware interface. This is how resctrl will manage
the hardware counters. It is significant since this is what dictates that it
is resctrl fs that will manage the counters, which makes it important which
interfaces are made available and from where it is called. Through
this series I have also not come across a description of this architecture.
With this introduction counters are maintained per monitor group, yet
the new interface supports assigining counters per domain. There
is no high level explanation of this architecture and the reader is forced
to decipher it from the implementation making this work harder to review
that necessary.

Would it be possible to present the fs and architecture code
separately? I think doing so will make it easier to understand.

};
/**
@@ -607,6 +621,32 @@ union cpuid_0x10_x_edx {
unsigned int full;
};
+/*
+ * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
+ * @bw_type : Bandwidth configuration(supported by BMEC)
+ * to track this counter id.

Does "to track this counter id" mean "tracked by @cntr_id"?

+ * @bw_src : Bandwidth Source (RMID or CLOSID).

Please do not capitalize words mid sentence, like "Source"
above, "Identifier", and "Enable" in two instances below.

+ * @reserved1 : Reserved.
+ * @is_clos : BwSrc field is a CLOSID (not an RMID).

Just stick to @bw_src.

+ * @cntr_id : Counter Identifier.
+ * @reserved : Reserved.
+ * @cntr_en : Tracking Enable bit.

Can this be more detailed about what happens when this bit is set/clear?

+ * @cfg_en : Configuration Enable bit.

What is difference between "configuration enable" and "tracking enable"?
What is relationship, if any, to @bw_type that is the bandwidth configuration?

+ */
+union l3_qos_abmc_cfg {
+ struct {
+ unsigned long bw_type :32,
+ bw_src :12,
+ reserved1: 3,
+ is_clos : 1,
+ cntr_id : 5,
+ reserved : 9,
+ cntr_en : 1,
+ cfg_en : 1;
+ } split;

Please check the spacing in this data structure. Tabs are used inconsistently
and the members are not lining up either.

+ unsigned long full;
+};
+
void rdt_last_cmd_clear(void);
void rdt_last_cmd_puts(const char *s);
__printf(1, 2)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 91c5d45ac367..d2663f1345b7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2505,6 +2505,7 @@ static void resctrl_abmc_set_one_amd(void *arg)
static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
{
+ struct rdtgroup *prgrp, *crgrp;
struct rdt_mon_domain *d;
/*
@@ -2513,6 +2514,17 @@ static int _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
*/
mbm_cntrs_init();
+ /* Reset the cntr_id's for all the monitor groups */
+ list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+ prgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+ prgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+ list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ crgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+ crgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
+ }
+ }
+

No. The counters are in the monitor group that is a structure that is private
to the fs. The architecture code should not be accessing it. This should only be
done by fs code.

/*
* Hardware counters will reset after switching the monitor mode.
* Reset the architectural state so that reading of hardware
@@ -3573,6 +3585,8 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
return ret;
}
rdtgrp->mon.rmid = ret;
+ rdtgrp->mon.cntr_id[0] = MON_CNTR_UNSET;
+ rdtgrp->mon.cntr_id[1] = MON_CNTR_UNSET;
ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
if (ret) {
@@ -4128,6 +4142,10 @@ static void __init rdtgroup_setup_default(void)
rdtgroup_default.closid = RESCTRL_RESERVED_CLOSID;
rdtgroup_default.mon.rmid = RESCTRL_RESERVED_RMID;
rdtgroup_default.type = RDTCTRL_GROUP;
+
+ rdtgroup_default.mon.cntr_id[0] = MON_CNTR_UNSET;
+ rdtgroup_default.mon.cntr_id[1] = MON_CNTR_UNSET;
+
INIT_LIST_HEAD(&rdtgroup_default.mon.crdtgrp_list);
list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);

Reinette