Re: [PATCH v2 3/4] Add support for AMD Core Perf Extension in guest

From: Natarajan, Janakarajan
Date: Wed Nov 15 2017 - 14:04:48 EST


On 11/9/2017 12:34 PM, Borislav Petkov wrote:
Subject: Re: [PATCH v2 3/4] Add support for AMD Core Perf Extension in guest
Btw, your subjects need a prefix:

"x86/kvm: Add guest support for the AMD core performance counters"

for example.

Okay.

On Mon, Nov 06, 2017 at 11:44:25AM -0600, Janakarajan Natarajan wrote:
This patch adds support for AMD Core Performance counters in the guest.
Never say "This patch" in the commit message of a patch. It is
tautologically useless.

Okay.

The base event select and counter MSRs are changed. In addition, with
the core extension, there are 2 extra counters available for performance
measurements for a total of 6.

With the new MSRs, the logic to map them to the gp_counters[] is changed.
New functions are introduced to get the right base MSRs and to check the
validity of the get/set MSRs.

If the guest has vcpus of either family 16h or a generation < 15h, it
You're only talking about families here, the "generation" thing is
confusing.

I'll change that.

falls back to using K7 MSRs and the number of counters the guest can
access is set to 4.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@xxxxxxx>
---
arch/x86/kvm/pmu_amd.c | 133 +++++++++++++++++++++++++++++++++++++++++++------
arch/x86/kvm/x86.c | 1 +
2 files changed, 120 insertions(+), 14 deletions(-)
...

+static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
+ enum pmu_type type)
+{
+ unsigned int base = get_msr_base(pmu, type);
+
+ if (base == MSR_F15H_PERF_CTL) {
+ switch (msr) {
+ case MSR_F15H_PERF_CTL0:
+ case MSR_F15H_PERF_CTL1:
+ case MSR_F15H_PERF_CTL2:
+ case MSR_F15H_PERF_CTL3:
+ case MSR_F15H_PERF_CTL4:
+ case MSR_F15H_PERF_CTL5:
+ /*
+ * AMD Perf Extension MSRs are not continuous.
+ *
+ * E.g. MSR_F15H_PERF_CTR0 -> 0xc0010201
+ * MSR_F15H_PERF_CTR1 -> 0xc0010203
+ *
+ * These are mapped to work with gp_counters[].
+ * The index into the array is calculated by
+ * dividing the difference between the requested
+ * msr and the msr base by 2.
+ *
+ * E.g. MSR_F15H_PERF_CTR1 uses
+ * ->gp_counters[(0xc0010203-0xc0010201)/2]
+ * ->gp_counters[1]
+ */
+ return &pmu->gp_counters[(msr - base) >> 1];
Ok, it took me a bit of staring to understand what you're doing here.
And frankly, this scheme is silly and fragile. You're relying on the
fact that you can do math with the MSR numbers to get you the GP counter
number. The moment that changes in future families, you are going to
have to devise a new scheme for the new family.

And instead of doing that, you're much better off producing a simple
MSR -> counter mapping for each family which is a simple switch-case.
No need to do get_msr_base() and whatnot - you simply feed in the MSR
number and the function spits out a gp_counters index. You only need to
check the family of the vcpu.

Then lookers will be able to understand the code at a quick glance too.

...

I'll put out a v3 with those changes.

@@ -153,8 +251,15 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ int family, nr_counters;
+
+ family = guest_cpuid_family(vcpu);
+ if (family == 0x15 || family == 0x17)
+ nr_counters = AMD64_NUM_COUNTERS_CORE;
+ else
+ nr_counters = AMD64_NUM_COUNTERS;
- pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
+ pmu->nr_arch_gp_counters = nr_counters;
pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
pmu->reserved_bits = 0xffffffff00200000ull;
/* not applicable to AMD; but clean them to prevent any fall out */
@@ -169,7 +274,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
int i;
- for (i = 0; i < AMD64_NUM_COUNTERS ; i++) {
+ for (i = 0; i < AMD64_NUM_COUNTERS_CORE ; i++) {
This all works because INTEL_PMC_MAX_GENERIC is bigger than the AMD num
counters but you need to check all that.

Also, the finding out of the nr_counters you do in amd_pmu_refresh()
should happen here, in the init function so that you have
pmu->nr_arch_gp_counters properly set and then when you iterate over
counters in the remaining functions, you do:

for (i = 0; i < pmu->nr_arch_gp_counters ; i++) {

instead of using those defines which are not always correct, depending
on the family.

So, when the amd_pmu_init is called, a query to guest_cpuid_family() gives a value of -1.
If this is the case where qemu sets the family details later on with a kvm ioctl, it would make sense
to just initialize the maximum number of gp_counters that will be used and set the nr_arch_gp_counters
based on the family during the amd_pmu_refresh().

Thanks,
Janakarajan