Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU state for Intel CPU

From: Mi, Dapeng
Date: Wed Apr 24 2024 - 23:56:18 EST



On 4/24/2024 11:00 PM, Sean Christopherson wrote:
On Wed, Apr 24, 2024, Dapeng Mi wrote:
On 4/24/2024 1:02 AM, Mingwei Zhang wrote:
Maybe (just maybe) it is possible to do the PMU context switch at the vcpu
boundary normally, but do it at the VM Enter/Exit boundary when the host is
profiling the KVM kernel module. So, dynamically adjusting the PMU context
switch location could be an option.
If two VMs both have the PMU enabled but the host PMU is not in use, the
PMU context switch should be done in the vcpu thread sched-out path.

If the host PMU is also in use, we can choose whether the PMU switch should
be done in the VM-exit path or the vcpu thread sched-out path.

The host PMU is always enabled, i.e., Linux currently does not support KVM
PMU running standalone. I guess what you mean is there are no active
perf_events on the host side. Allowing the PMU context switch to drift
from the vm-enter/exit boundary to the vcpu loop boundary by checking host
side events might be a good option. We can keep the discussion going, but I
won't propose that in v2.
I doubt this deferring is really doable. It would still make the host
lose most of its capability to profile KVM. Per my understanding, most of the
KVM overhead happens in the vcpu loop, specifically in VM-exit handling.
We have no idea when the host wants to create a perf event to profile KVM; it
could be at any time.
No, the idea is that KVM will load host PMU state asap, but only when host PMU
state actually needs to be loaded, i.e. only when there are relevant host events.

If there are no host perf events, KVM keeps guest PMU state loaded for the entire
KVM_RUN loop, i.e. provides optimal behavior for the guest. But if a host perf
event exists (or comes along), then KVM context switches the PMU at VM-Enter/VM-Exit,
i.e. lets the host profile almost all of KVM, at the cost of a degraded experience
for the guest while host perf events are active.
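
A minimal sketch of the policy described above, written as a standalone C model rather than kernel code. Every name here (pmu_model, pmu_load, model_vcpu_enter_exit, model_kvm_run) is hypothetical and only illustrates where the switches would land; it is not taken from the patch series.

```c
#include <assert.h>
#include <stdbool.h>

/* Who currently owns the (modelled) PMU hardware state. */
enum pmu_owner { PMU_OWNER_HOST, PMU_OWNER_GUEST };

/* Hypothetical model state; none of these fields exist in KVM. */
struct pmu_model {
    enum pmu_owner owner;      /* whose PMU state is loaded right now */
    bool host_events_active;   /* are there relevant host perf events? */
    unsigned int switches;     /* number of context switches performed */
};

/* Load one side's PMU state, counting only real ownership changes. */
static void pmu_load(struct pmu_model *m, enum pmu_owner who)
{
    if (m->owner != who) {
        m->owner = who;
        m->switches++;
    }
}

/* One VM-Enter / VM-Exit round trip inside the run loop. */
static void model_vcpu_enter_exit(struct pmu_model *m)
{
    /* Guest state has to be loaded before entering the guest. */
    pmu_load(m, PMU_OWNER_GUEST);

    /* ... guest runs, then a VM-Exit happens ... */

    /* Only hand the PMU back at VM-Exit if the host needs it. */
    if (m->host_events_active)
        pmu_load(m, PMU_OWNER_HOST);
}

/* Model of the whole KVM_RUN loop for a given number of round trips. */
static void model_kvm_run(struct pmu_model *m, unsigned int iterations)
{
    for (unsigned int i = 0; i < iterations; i++)
        model_vcpu_enter_exit(m);

    /* Leaving the run loop always restores host state. */
    pmu_load(m, PMU_OWNER_HOST);
    assert(m->owner == PMU_OWNER_HOST);
}
```

In this model, with no host events the guest state stays loaded across all round trips and only two switches occur per model_kvm_run() call; with host events active, every round trip costs two switches, which is the degraded guest experience mentioned above.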

I see. So KVM needs to provide a callback that is invoked from the IPI handler. The callback must switch the PMU state before perf actually enables the host event and touches the PMU MSRs. And only perf events with the exclude_guest attribute are allowed to be created on the host. Thanks.
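
To make the ordering concrete, here is a rough standalone C model of that flow. All names (vcpu_model, perf_model, kvm_put_guest_pmu, perf_model_enable_event) are hypothetical, and the -EINVAL return for events without exclude_guest is an assumption for illustration, not the error code used by any actual patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one perf attribute bit that matters here. */
struct host_event_attr {
    bool exclude_guest;
};

/* Hypothetical KVM-side state: is guest PMU state loaded on this CPU? */
struct vcpu_model {
    bool guest_pmu_loaded;
};

/* Hypothetical perf-side state, including the callback KVM registers. */
struct perf_model {
    void (*put_guest_pmu)(void *data);  /* provided by KVM */
    void *data;                         /* opaque cookie for the callback */
    unsigned int nr_host_events;        /* stands in for programmed counters */
};

/* KVM's callback: save guest PMU state so the host can use the hardware. */
static void kvm_put_guest_pmu(void *data)
{
    struct vcpu_model *vcpu = data;

    vcpu->guest_pmu_loaded = false;
}

/*
 * Model of the step that would run in the IPI handler on the target CPU.
 * Order matters: reject unsupported events, then let KVM switch PMU state,
 * and only then "touch the PMU MSRs" (modelled as a counter increment).
 */
static int perf_model_enable_event(struct perf_model *perf,
                                   const struct host_event_attr *attr)
{
    /* Assumption: only exclude_guest events may be created on the host. */
    if (!attr->exclude_guest)
        return -EINVAL;

    if (perf->put_guest_pmu != NULL)
        perf->put_guest_pmu(perf->data);

    perf->nr_host_events++;
    return 0;
}
```

The point of the sketch is only the sequencing: the guest state is switched out by the KVM callback before the host event is enabled, and a rejected event leaves the guest state untouched.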



My original sketch: https://lore.kernel.org/all/ZR3eNtP5IVAHeFNC@xxxxxxxxxx