Re: [PATCH 1/3] x86/resctrl: Get max rmid and occupancy scale directly from CPUID instead of cpuinfo_x86

From: Fenghua Yu
Date: Sun Jun 16 2019 - 23:32:29 EST


On Sun, Jun 16, 2019 at 10:24:13PM +0200, Thomas Gleixner wrote:
> On Sun, 16 Jun 2019, Fenghua Yu wrote:
>
> > Although x86_cache_max_rmid and x86_cache_occ_scale are only read once
> > during resctrl initialization, they are always stored in cpuinfo_x86 for
> > each CPU during run time without usage. Even if resctrl is not
> > configured, they still occupy space in cpuinfo_x86.
>
> And that's a problem because?
>
> > To save cpuinfo_x86 space and make CPU and resctrl initialization simpler,
> > remove the two fields from cpuinfo_x86 and get max rmid and occupancy
>
> What is simpler? The fact that more code fiddles with CPUID? That's exactly
> the wrong direction.
>
> The storage size of struct cpuinfo is largely uninteresting especially as
> long as we keep num_online_cpus copies of the same information around.
>
> Just grep for places which invoke CPUID and then look how many of them do
> it over and over and even the code in arch/x86/kernel/cpu/ is an
> unpenetrable mess for exactly this reason.
>
> The right thing to do is to have one instance of the CPUID information
> which is
>
> - a proper data structure with named fields and named bits
>
> - a single master instance which can be mapped to all CPUs
>
> This data structure is filled in in one go by reading out all leaves and
> not by the maze we have today which puts together selected parts piecewise
> and never exposes a full and consistent picture.
>
> This allows to remove all these custom copies of CPUID leaf readouts and
> allows proper filtering/disabling at a central place.
>
> Making it a proper data structure with fields and bits gets rid of all that
> hex masking/shifting nonsense which is used to decode parts of those
> fields.
>
> That's not a performance issue because all performance critical code should
> use static_cpu_has() anyway. For non critical code boot_cpu_has() is
> sufficient.
>
> Upcoming secondary CPUs would do a sanity check on their CPUID content to
> check whether everything is symmetric. We should do that actually today
> because not detecting asymetric features early leads to exactly the issue
> which was fixed recently with loading the micro code earlier than perf.
> But we can't because the information retrieval is done in a gazillion of
> places.
>
> Now you might argue that the upcoming asymetric processors (SIGH!) will
> require per CPU instances of the feature leafs. Sure that needs some
> thought, but it needs thought even with the current code and I'm absolutely
> not interested to duct tape that stuff into the current code.
>
> The solution for this with the above scheme is to utilize the feature
> mismatch detection and have a proper classification which features are
> allowed to deviate and which are not. For those which can deviate, we can
> provide separate storage as this information needs to be propagated to
> other entities (fault handlers, placement code, xsave variants etc.). But
> that's a limited amount of information and the bulk will still be the same.
>
> This mismatch detection is essential for dealing with future asymetric
> CPUs proper. When the kernel detects it, it can disable mismatching
> features which are not yet handled by code which has to be aware of them.
>
> And disabling them means that with that scheme we can actually trap CPUID
> in userspace and provide it consistent and filtered information instead
> of having the mismatch between kernel view and user space view.
>
> Borislav has experimented with that already, but thanks to the marvelous
> security features built into Intel (and other) CPUs this is still mostly a
> drawing board exercise.
>
> Just for the record. Before this cleanup takes place, I'm not even looking
> at any patches which attempt to support asymetric processors. The current
> supply of duct tape engineering horrors is sufficient for bad mood. No need
> for more of that.

I see. Then this patch #1 doesn't make sense.

So in the next version, can I remove the patch #1, change the patch #2 as
follows, and keep the patch #3?

It's a waste for the four X86_FEATURE_CQM_* features to occupy two
pure feature bits words. To better utilize feature words, re-define
word 11 to host scattered features and move the four X86_FEATURE_CQM_*
features into Linux defined word 11. More scattered features can be added
in word 11 in the future.

Leaf 11 in cpuid_leafs is renamed as CPUID_LNX_4 to reflect it's a Linux
defined leaf.

Although word 12 doesn't have any feature now, leaf 12 in cpuid_leafs
still needs to be kept because cpuid_leafs must have NCAPINTS leafs.
Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful name
in the next patch when CPUID.7.1:EAX occupies world 12.

KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.

Signed-off-by: Fenghua Yu <fenghua.yu@xxxxxxxxx>
---
arch/x86/include/asm/cpufeature.h | 4 +--
arch/x86/include/asm/cpufeatures.h | 17 ++++++----
arch/x86/kernel/cpu/common.c | 53 +++++++++++++++---------------
arch/x86/kernel/cpu/cpuid-deps.c | 3 ++
arch/x86/kernel/cpu/scattered.c | 4 +++
arch/x86/kvm/cpuid.h | 2 --
6 files changed, 45 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1d337c51f7e6..403f70c2e431 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -22,8 +22,8 @@ enum cpuid_leafs
CPUID_LNX_3,
CPUID_7_0_EBX,
CPUID_D_1_EAX,
- CPUID_F_0_EDX,
- CPUID_F_1_EDX,
+ CPUID_LNX_4,
+ CPUID_DUMMY,
CPUID_8000_0008_EBX,
CPUID_6_EAX,
CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..4f0a3d093794 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -269,13 +269,16 @@
#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 instruction */
#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS instructions */

-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */
-#define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */
-
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */
-#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring */
-#define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */
-#define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */
+/*
+ * Extended auxiliary flags: Linux defined - For features scattered in various
+ * CPUID levels like 0xf, word 11.
+ *
+ * Reuse free bits when adding new feature flags!
+ */
+#define X86_FEATURE_CQM_LLC (11*32+ 0) /* LLC QoS if 1 */
+#define X86_FEATURE_CQM_OCCUP_LLC (11*32+ 1) /* LLC occupancy monitoring */
+#define X86_FEATURE_CQM_MBM_TOTAL (11*32+ 2) /* LLC Total MBM monitoring */
+#define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */

/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2c57fffebf9b..f080be35da41 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -801,6 +801,31 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
}
}

+static void get_cqm_info(struct cpuinfo_x86 *c)
+{
+ if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
+ u32 eax, ebx, ecx, edx;
+
+ /* QoS sub-leaf, EAX=0Fh, ECX=0 */
+ cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
+ /* will be overridden if occupancy monitoring exists */
+ c->x86_cache_max_rmid = ebx;
+
+ if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC) ||
+ cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL) ||
+ cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)) {
+ /* QoS sub-leaf, EAX=0Fh, ECX=1 */
+ cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
+
+ c->x86_cache_max_rmid = ecx;
+ c->x86_cache_occ_scale = ebx;
+ }
+ } else {
+ c->x86_cache_max_rmid = -1;
+ c->x86_cache_occ_scale = -1;
+ }
+}
+
void get_cpu_cap(struct cpuinfo_x86 *c)
{
u32 eax, ebx, ecx, edx;
@@ -832,33 +857,6 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
c->x86_capability[CPUID_D_1_EAX] = eax;
}

- /* Additional Intel-defined flags: level 0x0000000F */
- if (c->cpuid_level >= 0x0000000F) {
-
- /* QoS sub-leaf, EAX=0Fh, ECX=0 */
- cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
- c->x86_capability[CPUID_F_0_EDX] = edx;
-
- if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
- /* will be overridden if occupancy monitoring exists */
- c->x86_cache_max_rmid = ebx;
-
- /* QoS sub-leaf, EAX=0Fh, ECX=1 */
- cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
- c->x86_capability[CPUID_F_1_EDX] = edx;
-
- if ((cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) ||
- ((cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL)) ||
- (cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)))) {
- c->x86_cache_max_rmid = ecx;
- c->x86_cache_occ_scale = ebx;
- }
- } else {
- c->x86_cache_max_rmid = -1;
- c->x86_cache_occ_scale = -1;
- }
- }
-
/* AMD-defined flags: level 0x80000001 */
eax = cpuid_eax(0x80000000);
c->extended_cpuid_level = eax;
@@ -889,6 +887,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)

init_scattered_cpuid_features(c);
init_speculation_control(c);
+ get_cqm_info(c);

/*
* Clear/Set all flags overridden by options, after probe.
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..fa07a224e7b9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -59,6 +59,9 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F },
+ { X86_FEATURE_CQM_OCCUP_LLC, X86_FEATURE_CQM_LLC },
+ { X86_FEATURE_CQM_MBM_TOTAL, X86_FEATURE_CQM_LLC },
+ { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
{}
};

diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 94aa1c72ca98..adf9b71386ef 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -26,6 +26,10 @@ struct cpuid_bit {
static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
+ { X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
+ { X86_FEATURE_CQM_OCCUP_LLC, CPUID_EDX, 0, 0x0000000f, 1 },
+ { X86_FEATURE_CQM_MBM_TOTAL, CPUID_EDX, 1, 0x0000000f, 1 },
+ { X86_FEATURE_CQM_MBM_LOCAL, CPUID_EDX, 2, 0x0000000f, 1 },
{ X86_FEATURE_CAT_L3, CPUID_EBX, 1, 0x00000010, 0 },
{ X86_FEATURE_CAT_L2, CPUID_EBX, 2, 0x00000010, 0 },
{ X86_FEATURE_CDP_L3, CPUID_ECX, 2, 0x00000010, 1 },
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 9a327d5b6d1f..d78a61408243 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -47,8 +47,6 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_0001_ECX] = {0x80000001, 0, CPUID_ECX},
[CPUID_7_0_EBX] = { 7, 0, CPUID_EBX},
[CPUID_D_1_EAX] = { 0xd, 1, CPUID_EAX},
- [CPUID_F_0_EDX] = { 0xf, 0, CPUID_EDX},
- [CPUID_F_1_EDX] = { 0xf, 1, CPUID_EDX},
[CPUID_8000_0008_EBX] = {0x80000008, 0, CPUID_EBX},
[CPUID_6_EAX] = { 6, 0, CPUID_EAX},
[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
--
2.19.1


Thanks.

-Fenghua