Re: [PATCH 1/7] KVM: selftests: Test Intel PMU architectural events on gp counters

From: Sean Christopherson
Date: Wed May 24 2023 - 18:33:20 EST


On Thu, Mar 23, 2023, Like Xu wrote:
> From: Like Xu <likexu@xxxxxxxxxxx>
>
> Add test cases to check if different Architectural events are available
> after it's marked as unavailable via CPUID. It covers vPMU event filtering
> logic based on Intel CPUID, which is a complement to pmu_event_filter.
>
> According to Intel SDM, the number of architectural events is reported
> through CPUID.0AH:EAX[31:24] and the architectural event x is
> supported if EBX[x]=0 && EAX[31:24]>x.
>
> Co-developed-by: Jinrong Liang <cloudliang@xxxxxxxxxxx>
> Signed-off-by: Jinrong Liang <cloudliang@xxxxxxxxxxx>
> Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
> ---
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/x86_64/pmu_cpuid_test.c | 202 ++++++++++++++++++

"cpuid" is a rather confusing name, e.g. I was half expecting tests where the guest
is executing CPUID. IIUC, this is really just a basic functionality test, for a
somewhat loose definition of "basic".

Maybe something like pmu_basic_functionality_test? Or pmu_functional_test?

> 2 files changed, 203 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_cpuid_test.c

...

> +#define EVENTSEL_OS BIT_ULL(17)
> +#define EVENTSEL_EN BIT_ULL(22)
> +#define PMU_CAP_FW_WRITES BIT_ULL(13)
> +#define EVENTS_MASK GENMASK_ULL(7, 0)
> +#define PMU_VERSION_MASK GENMASK_ULL(7, 0)
> +#define GP_CTR_NUM_OFS_BIT 8
> +#define GP_CTR_NUM_MASK GENMASK_ULL(15, GP_CTR_NUM_OFS_BIT)
> +#define EVT_LEN_OFS_BIT 24
> +#define EVT_LEN_MASK GENMASK_ULL(31, EVT_LEN_OFS_BIT)
> +
> +#define ARCH_EVENT(select, umask) (((select) & 0xff) | ((umask) & 0xff) << 8)
> +
> +/*
> + * Intel Pre-defined Architectural Performance Events. Note some events
> + * are skipped for testing due to difficulties in stable reproduction.
> + */
> +static const uint64_t arch_events[] = {
> + [0] = ARCH_EVENT(0x3c, 0x0),
> + [1] = ARCH_EVENT(0xc0, 0x0),
> + [2] = ARCH_EVENT(0x3c, 0x1),
> + [3] = ARCH_EVENT(0x2e, 0x4f), /* LLC Reference */
> + [4] = ARCH_EVENT(0x2e, 0x41), /* LLC Misses */
> + [5] = ARCH_EVENT(0xc4, 0x0),
> + [6] = ARCH_EVENT(0xc5, 0x0), /* Branch Misses Retired */
> + [7] = ARCH_EVENT(0xa4, 0x1), /* Topdown Slots */
> +};

All of these definitions belong in library code. Probably something like
tools/testing/selftests/kvm/include/x86_64/pmu.h, or maybe intel_pmu.h?

> +static struct kvm_vcpu *new_vcpu(void *guest_code)

Assuming the next version is sent before the aforementioned cleanup, how about
defining this as

static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
void *guest_code)

so that the cleanup can be a more straightforward replacement. That should also
make the free_vcpu() wrapper completely unnecessary.

> +{
> + struct kvm_vm *vm;
> + struct kvm_vcpu *vcpu;
> +
> + vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> + vm_init_descriptor_tables(vm);
> + vcpu_init_descriptor_tables(vcpu);

Ugh, I really need to put together that series to clean up the descriptor table
mess.

https://lore.kernel.org/all/Y8hCBOndYMD9zsDL@xxxxxxxxxx

> +
> + return vcpu;
> +}
> +
> +static void free_vcpu(struct kvm_vcpu *vcpu)
> +{
> + kvm_vm_free(vcpu->vm);

Just call kvm_vm_free(). Adding layers of unnecessary helpers just to provide
symmetry is not a good tradeoff. As above, tweak the name/prototyp of new_vcpu()
if you really want something approaching symmetry.

> +static void run_vcpu(struct kvm_vcpu *vcpu, const char *msg,
> + bool (*check_ucall)(struct ucall *uc, void *data),
> + void *expect_args)
> +{
> + struct ucall uc;
> +
> + for (;;) {
> + vcpu_run(vcpu);
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_SYNC:
> + TEST_ASSERT(check_ucall(&uc, expect_args), "%s", msg);

A callback is both overkill and debug hostile. If the assert fails, it will report
this common code and not the caller. Since all uses grab the first ucall param,
just "return" that from the run_vcpu() wrapper as out param. Callers will need
to define their own loop, but that's one line of code, and it has the advantage
of making the code much more self-documenting

E.g.

static uint64_t run_vcpu(struct kvm_vcpu *vcpu, uint64_t *ucall_arg)
{
struct ucall uc;

vcpu_run(vcpu);
switch (get_ucall(vcpu, &uc)) {
case UCALL_SYNC:
ucall_arg = uc->args[1];
break;
case UCALL_DONE:
break;
default:
TEST_ASSERT(false, "Unexpected exit: %s",
exit_reason_str(vcpu->run->exit_reason));
}
return uc.cmd;
}

and in intel_test_fixed_counters() as an example:

while (run_vcpu(vcpu, &val) != UCALL_DONE)
TEST_ASSERT(val, "One or more fixed counters not counting");

> +static void test_arch_events_setup(struct kvm_vcpu *vcpu, uint8_t evt_vector,
> + uint8_t unavl_mask, uint8_t idx)
> +{
> + struct kvm_cpuid_entry2 *entry;
> + uint32_t ctr_msr = MSR_IA32_PERFCTR0;
> + bool is_supported;
> +
> + entry = vcpu_get_cpuid_entry(vcpu, 0xa);
> + entry->eax = (entry->eax & ~EVT_LEN_MASK) |
> + (evt_vector << EVT_LEN_OFS_BIT);
> + entry->ebx = (entry->ebx & ~EVENTS_MASK) | unavl_mask;
> + vcpu_set_cpuid(vcpu);
> +
> + if (vcpu_get_msr(vcpu, MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
> + ctr_msr = MSR_IA32_PMC0;
> +
> + /* Arch event x is supported if EBX[x]=0 && EAX[31:24]>x */
> + is_supported = !(entry->ebx & BIT_ULL(idx)) &&
> + (((entry->eax & EVT_LEN_MASK) >> EVT_LEN_OFS_BIT) > idx);
> +
> + vcpu_args_set(vcpu, 5, entry->eax & PMU_VERSION_MASK,
> + (entry->eax & GP_CTR_NUM_MASK) >> GP_CTR_NUM_OFS_BIT,
> + is_supported, ctr_msr, arch_events[idx]);

Why pass all this stuff in as params? Wouldn't it be easier to query CPUID from
the guest side? And define KVM_X86_CPU_PROPERTY macros where possible to avoid
having to open code things in multiple places.

> +static void intel_check_arch_event_is_unavl(uint8_t idx)
> +{
> + const char *msg = "Unavailable arch event is counting.";
> + uint8_t eax_evt_vec, ebx_unavl_mask, i, j;
> + struct kvm_vcpu *vcpu;
> +
> + /*
> + * A brute force iteration of all combinations of values is likely to
> + * exhaust the limit of the single-threaded thread fd nums, so it's
> + * tested here by iterating through all valid values on a single bit.
> + */
> + for (i = 0; i < ARRAY_SIZE(arch_events); i++) {
> + eax_evt_vec = BIT_ULL(i);
> + for (j = 0; j < ARRAY_SIZE(arch_events); j++) {
> + ebx_unavl_mask = BIT_ULL(j);
> +
> + vcpu = new_vcpu(intel_guest_run_arch_event);
> + test_arch_events_setup(vcpu, eax_evt_vec,
> + ebx_unavl_mask, idx);
> + run_vcpu(vcpu, msg, first_uc_arg_non_zero, NULL);

Oof, this is super confusing. At first, second, and third glances, it looks like
this is asserting that the counter _is_ counting. It took me a while to find this
code in the guest:

GUEST_SYNC(supported == !!_rdpmc(RDPMC_FIXED_BASE | i));

Either have the guest do GUEST_ASSERT() directly and process the output in the
common run_vcpu(), or just have the guest return the counter value and do the
assert here. My vote would be the latter, e.g.

while (run_vcpu(vcpu, &counter_val) != UCALL_DONE)
TEST_ASSERT(is_supported == !!counter_val,
"blah blah blah");

That'd also likely avoid having to plumb the "supported" info into the guest.