Re: [RFC PATCH 02/41] perf: Support guest enter/exit interfaces
From: Sean Christopherson
Date: Thu Apr 18 2024 - 12:11:59 EST
On Wed, Apr 17, 2024, Xiong Y Zhang wrote:
> On 4/16/2024 8:48 PM, Liang, Kan wrote:
> >> x86_perf_get_mediated_pmu() is called at vm_create(),
> >> x86_perf_put_mediated_pmu() is called at vm_destroy(), then system-wide
> >> perf events without exclude_guest=1 cannot be created during the whole VM
> >> lifecycle (where nr_mediated_pmu_vms > 0 always). Do I understand and use
> >> the interface correctly?
> >
> > Right, but it only impacts the events of a PMU with the
> > PERF_PMU_CAP_MEDIATED_VPMU capability. For other PMUs, the event with
> > exclude_guest=1 can still be created. KVM should not touch the counters of a
> > PMU without PERF_PMU_CAP_MEDIATED_VPMU.
> >
> > BTW: I will also remove the prefix x86, since the functions are in the
> > generic code.
> >
> > Thanks,
> > Kan
> After the userspace VMM calls the vCPU SET_CPUID() ioctl, KVM knows whether the
> vPMU is enabled or not. If perf_get_mediated_pmu() is called at VM creation, that
> is too early.
Eh, if someone wants to create _only_ VMs without vPMUs, then they should load
KVM with enable_pmu=false. I can see people complaining about not being able to
create VMs if they don't want *any* vPMU usage, but I doubt anyone has a use
case where they want a mix of PMU-enabled and PMU-disabled VMs, *and* they are
ok with VM creation failing for some VMs but not others.
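(For reference, a minimal sketch of what I assume the refcounted perf-side
interface discussed above boils down to; mediated_pmu_in_use_by_host() is a
made-up placeholder for the "are there host-wide !exclude_guest events on a
mediated-capable PMU" check, and the actual patch may well differ.)

#include <linux/atomic.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(perf_mediated_pmu_mutex);
static atomic_t nr_mediated_pmu_vms;

/* Take a reference when a VM with a mediated vPMU is created. */
int perf_get_mediated_pmu(void)
{
	int ret = 0;

	mutex_lock(&perf_mediated_pmu_mutex);
	if (mediated_pmu_in_use_by_host())	/* hypothetical helper */
		ret = -EBUSY;
	else
		atomic_inc(&nr_mediated_pmu_vms);
	mutex_unlock(&perf_mediated_pmu_mutex);

	return ret;
}

/* Drop the reference at VM destruction. */
void perf_put_mediated_pmu(void)
{
	atomic_dec(&nr_mediated_pmu_vms);
}

While nr_mediated_pmu_vms > 0, perf_event_open() would then reject new
!exclude_guest events on PMUs with PERF_PMU_CAP_MEDIATED_VPMU, which is the
"whole VM lifecycle" behavior described in the quote above.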
> It would be better to let perf_get_mediated_pmu() track per-CPU PMU state,
> so that perf_get_mediated_pmu() can be called by KVM after vcpu_cpuid_set(). Note
> that the userspace VMM may call SET_CPUID() on one vCPU multiple times, so a
> refcount may not be suitable here.
Yeah, waiting until KVM_SET_CPUID2 would be unpleasant for both KVM and userspace.
E.g. failing KVM_SET_CPUID2 because KVM can't enable mediated PMU support would
be rather confusing for userspace.
> What's a better solution?
If doing the checks at VM creation is a sticking point for folks, then the best
approach is probably to use KVM_CAP_PMU_CAPABILITY, i.e. require userspace to
explicitly opt in to enabling mediated PMU usage. Ha! We can even do that
without additional uAPI, because KVM interprets cap->args[0]==0 as "enable vPMU".
The big problem with this is that enabling mediated PMU support by default would
break userspace. Hmm, but that's arguably the case no matter what, as a setup
that worked before would suddenly start failing if the host was configured to use
the PMU-based NMI watchdog.
E.g. this, if we're ok committing to never enabling mediated PMU by default.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47d9f03b7778..01d9ee2114c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6664,9 +6664,21 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			break;
 
 		mutex_lock(&kvm->lock);
-		if (!kvm->created_vcpus) {
-			kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
-			r = 0;
+		/*
+		 * To keep PMU configuration "simple", setting vPMU support is
+		 * disallowed if vCPUs are created, or if mediated PMU support
+		 * was already enabled for the VM.
+		 */
+		if (!kvm->created_vcpus &&
+		    (!enable_mediated_pmu || !kvm->arch.enable_pmu)) {
+			if (enable_mediated_pmu &&
+			    !(cap->args[0] & KVM_PMU_CAP_DISABLE))
+				r = x86_perf_get_mediated_pmu();
+			else
+				r = 0;
+
+			if (!r)
+				kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
 		}
 		mutex_unlock(&kvm->lock);
 		break;
@@ -12563,7 +12575,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
 	kvm->arch.guest_can_read_msr_platform_info = true;
-	kvm->arch.enable_pmu = enable_pmu;
+
+	/* PMU virtualization is opt-in when mediated PMU support is enabled. */
+	kvm->arch.enable_pmu = enable_pmu && !enable_mediated_pmu;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
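
With something like the above, a userspace VMM that wants the mediated vPMU
would opt in via the existing capability, no new uAPI needed. Rough sketch of
the userspace side (enable_vpmu() is my own name, error handling trimmed, and
it assumes a freshly created VM fd with no vCPUs yet):

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Opt the VM in to PMU virtualization *before* creating any vCPUs. */
static int enable_vpmu(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_PMU_CAPABILITY;
	cap.args[0] = 0;	/* 0 == "enable vPMU", i.e. !KVM_PMU_CAP_DISABLE */

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}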