Re: [RFC PATCH 02/41] perf: Support guest enter/exit interfaces

From: Zhang, Xiong Y
Date: Thu Apr 18 2024 - 21:38:07 EST




On 4/19/2024 12:11 AM, Sean Christopherson wrote:
> On Wed, Apr 17, 2024, Xiong Y Zhang wrote:
>> On 4/16/2024 8:48 PM, Liang, Kan wrote:
>>>> x86_perf_get_mediated_pmu() is called at vm_create(),
>>>> x86_perf_put_mediated_pmu() is called at vm_destroy(), then system wide
>>>> perf events without exclude_guest=1 can not be created during the whole vm
>>>> life cycle (where nr_mediated_pmu_vms > 0 always), do I understand and use
>>>> the interface correctly ?
>>>
>>> Right, but it only impacts the events of PMU with the
>>> PERF_PMU_CAP_MEDIATED_VPMU. For other PMUs, the event with exclude_guest=1
>>> can still be created. KVM should not touch the counters of the PMU without
>>> PERF_PMU_CAP_MEDIATED_VPMU.
>>>
>>> BTW: I will also remove the prefix x86, since the functions are in the
>>> generic code.
>>>
>>> Thanks,
>>> Kan
>> After userspace VMM call VCPU SET_CPUID() ioctl, KVM knows whether vPMU is
>> enabled or not. If perf_get_mediated_pmu() is called at vm create, it is too
>> early.
>
> Eh, if someone wants to create _only_ VMs without vPMUs, then they should load
> KVM with enable_pmu=false. I can see people complaining about not being able to
> create VMs if they don't want to use have *any* vPMU usage, but I doubt anyone
> has a use cases where they want a mix of PMU-enabled and PMU- disabled VMs, *and*
> they are ok with VM creation failing for some VMs but not others.
enable_mediated_pmu and PMU-based nmi_watchdog are enabled by default on my ubuntu system, some ubuntu services create vm during ubuntu bootup, these ubuntu services fail after I add perf_get_mediated_pmu() in kvm_arch_init_vm(). so do this checking at vm creation may break some bootup services.

>
>> it is better to let perf_get_mediated_pmu() track per cpu PMU state,
>> so perf_get_mediated_pmu() can be called by kvm after vcpu_cpuid_set(). Note
>> user space vmm may call SET_CPUID() on one vcpu multi times, then here
>> refcount maybe isn't suitable.
>
> Yeah, waiting until KVM_SET_CPUID2 would be unpleasant for both KVM and userspace.
> E.g. failing KVM_SET_CPUID2 because KVM can't enable mediated PMU support would
> be rather confusing for userspace.
>
>> what's a better solution ?
>
> If doing the checks at VM creation is a stick point for folks, then the best
> approach is probably to use KVM_CAP_PMU_CAPABILITY, i.e. require userspace to
> explicitly opt-in to enabling mediated PMU usage. Ha! We can even do that
> without additional uAPI, because KVM interprets cap->args[0]==0 as "enable vPMU".
>
QEMU doesn't use KVM_CAP_PMU_CAPABILITY to enable/disable pmu. enable_cap(KVM_CAP_PMU_CAPABILITY) will be added into QEMU for mediated PMU.
With old QEMU, guest PMU will always use emulated vPMU, mediated PMU won't be enabled, if emulated vPMU is replaced later, the old QEMU guest will be broken.
> The big problem with this is that enabling mediated PMU support by default would
> break userspace. Hmm, but that's arguably the case no matter what, as a setup
> that worked before would suddenly start failing if the host was configured to use
> the PMU-based NMI watchdog.
Based on perf_get_mediated_pmu() interface, admin need to disable all the system wide perf events before vm creation, no matter where the perf_get_mediated_pmu() is called in vm_create() or enable_cap(KVM_CAP_PMU_CAPABILITY) ioctl.
>
> E.g. this, if we're ok commiting to never enabling mediated PMU by defau
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 47d9f03b7778..01d9ee2114c8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6664,9 +6664,21 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> break;
>
> mutex_lock(&kvm->lock);
> - if (!kvm->created_vcpus) {
> - kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
> - r = 0;
> + /*
> + * To keep PMU configuration "simple", setting vPMU support is
> + * disallowed if vCPUs are created, or if mediated PMU support
> + * was already enabled for the VM.
> + */
> + if (!kvm->created_vcpus &&
> + (!enable_mediated_pmu || !kvm->arch.enable_pmu)) {
> + if (enable_mediated_pmu &&
> + !(cap->args[0] & KVM_PMU_CAP_DISABLE))
> + r = x86_perf_get_mediated_pmu();
> + else
> + r = 0;
> +
> + if (!r)
> + kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
> }
> mutex_unlock(&kvm->lock);
> break;
> @@ -12563,7 +12575,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>
> kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
> kvm->arch.guest_can_read_msr_platform_info = true;
> - kvm->arch.enable_pmu = enable_pmu;
> +
> + /* PMU virtualization is opt-in when mediated PMU support is enabled. */
> + kvm->arch.enable_pmu = enable_pmu && !enable_mediated_pmu;
>
> #if IS_ENABLED(CONFIG_HYPERV)
> spin_lock_init(&kvm->arch.hv_root_tdp_lock);
>
>