Re: [PATCH] x86/resctrl: Fix AMD L3 QOS CDP enable/disable

From: Reinette Chatre
Date: Wed Nov 18 2020 - 17:18:23 EST


Hi Babu,

On 11/6/2020 12:14 PM, Babu Moger wrote:
When the AMD QoS feature CDP(code and data prioritization) is enabled
or disabled, the CDP bit in MSR 0000_0C81 is written on one of the
cpus in L3 domain(core complex). That is not correct. The CDP bit needs
to be updated all the logical cpus in the domain.

Could you please use CPU instead of cpu throughout, in commit message as well as the new code comments?


This was not spelled out clearly in the spec earlier. The specification
has been updated. The updated specification, "AMD64 Technology Platform
Quality of Service Extensions Publication # 56375 Revision: 1.02 Issue
Date: October 2020" is available now. Refer the section: Code and Data
Prioritization.

Fix the issue by adding a new flag arch_needs_update_all in rdt_cache
data structure.

I understand that naming is hard and could be a sticky point. Even so, I am concerned that this name is too generic. For example, there are other cache settings that are successfully set on a single CPU in the L3 domain (the bitmasks for example). This new name and its description in the code comments below does not make it clear which cache settings it applies to.

I interpret this change to mean that the L[23]_QOS_CFG MSR has CPU scope while the other L3 QoS configuration registers have the same scope as the L3 cache. Could this new variable thus perhaps be named "arch_has_per_cpu_cfg"? I considered "arch_has_per_cpu_cdp" but when a new field is added to that register it may cause confusion.

The documentation can be obtained at the links below:
https://developer.amd.com/wp-content/resources/56375.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")

Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
---
arch/x86/kernel/cpu/resctrl/core.c | 3 +++
arch/x86/kernel/cpu/resctrl/internal.h | 3 +++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 9 +++++++--
3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index e5f4ee8f4c3b..142c92a12254 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -570,6 +570,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
+ if (r->cache.arch_needs_update_all)
+ rdt_domain_reconfigure_cdp(r);
return;
}
@@ -943,6 +945,7 @@ static __init void rdt_init_res_defs_amd(void)
r->rid == RDT_RESOURCE_L2CODE) {
r->cache.arch_has_sparse_bitmaps = true;
r->cache.arch_has_empty_bitmaps = true;
+ r->cache.arch_needs_update_all = true;
} else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_BW_BASE;
r->msr_update = mba_wrmsr_amd;

The current pattern is to set these flags on all the architectures. Could you thus please set the flag within rdt_init_defs_intel()? I confirmed that the scope is the same as the cache domain in Intel RDT so the flag should be false.

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 80fa997fae60..d23262d59a51 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -360,6 +360,8 @@ struct msr_param {
* executing entities
* @arch_has_sparse_bitmaps: True if a bitmap like f00f is valid.
* @arch_has_empty_bitmaps: True if the '0' bitmap is valid.
+ * @arch_needs_update_all: True if arch needs to update the cache
+ * settings on all the cpus in the domain.

Please do update this to make it clear what "cache settings" are referred to. Since this is in struct rdt_cache perhaps something like "QOS_CFG register for this cache level has CPU scope."

*/
struct rdt_cache {
unsigned int cbm_len;
@@ -369,6 +371,7 @@ struct rdt_cache {
unsigned int shareable_bits;
bool arch_has_sparse_bitmaps;
bool arch_has_empty_bitmaps;
+ bool arch_needs_update_all;
};
/**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index af323e2e3100..a005e90b373a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1905,8 +1905,13 @@ static int set_cache_qos_cfg(int level, bool enable)
r_l = &rdt_resources_all[level];
list_for_each_entry(d, &r_l->domains, list) {
- /* Pick one CPU from each domain instance to update MSR */
- cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+ if (r_l->cache.arch_needs_update_all)
+ /* Pick all the cpus in the domain instance */
+ for_each_cpu(cpu, &d->cpu_mask)
+ cpumask_set_cpu(cpu, cpu_mask);
+ else
+ /* Pick one CPU from each domain instance to update MSR */
+ cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
}
cpu = get_cpu();
/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */


The solution looks good to me, thank you very much.

Reinette