On Mon, Apr 15, 2024, Sean Christopherson wrote:
> On Mon, Apr 15, 2024, Mingwei Zhang wrote:
> > On Mon, Apr 15, 2024 at 3:04 AM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> > > On 4/15/2024 2:06 PM, Mingwei Zhang wrote:
> > > > On Fri, Apr 12, 2024 at 9:25 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > > It's necessary to clear the EVENTSELx MSRs for both GP and
> > > > > > > > > fixed counters. Consider this case: the guest uses GP counter
> > > > > > > > > 2, but the host doesn't use it. If the EVENTSEL2 MSR is not
> > > > > > > > > cleared here, GP counter 2 would be enabled unexpectedly on
> > > > > > > > > the host later, since host perf always enables all valid bits
> > > > > > > > > in the PERF_GLOBAL_CTRL MSR. That would cause issues.
> > > > > > > > >
> > > > > > > > > Yeah, the clearing of the PMCx MSRs should be unnecessary.
> > > > > > > >
> > > > > > > > Why is clearing the PMCx MSRs unnecessary? Do we want to leak
> > > > > > > > counter values to the host? No. Not in cloud usage.
> > > > > > >
> > > > > > > No, this place is clearing the guest counter values instead of
> > > > > > > the host counter values. The host always has a way to see a
> > > > > > > normal VM's guest values if it wants to. I don't see the
> > > > > > > necessity; it's overkill and introduces extra overhead from
> > > > > > > writing the MSRs.
> > > > > >
> > > > > > I am curious how the perf subsystem solves this problem. Does the
> > > > > > perf subsystem on the host scrub only the selector, but not the
> > > > > > counter value, when doing a context switch?
> > > > >
> > > > > When a context switch happens, perf schedules out the old events and
> > > > > schedules in the new events. When scheduling out, the ENABLE bit of
> > > > > the EVENTSELx MSR is cleared; when scheduling in, the EVENTSELx and
> > > > > PMCx MSRs are overwritten with the new event's attr.config and
> > > > > sample_period, respectively. Of course, this is only for the case
> > > > > where new events are to be programmed on the PMC. If there are no
> > > > > new events, the PMCx MSR keeps its stale value and won't be cleared.
> > > > >
> > > > > Anyway, I don't see any reason that the PMCx MSRs must be cleared.
> > > >
> > > > I don't have a strong opinion on the upstream version. But since both
> > > > the mediated vPMU and perf are clients of the PMU hardware, leaving
> > > > PMC values uncleared when transitioning out of the vPMU boundary is,
> > > > technically, leaking information.
>
> I'm not objecting to ensuring guest PMCs can't be read by any entity that's
> not in the guest's TCB, which is what I would consider a true leak. I'm
> objecting to blindly clearing all PMCs, and more specifically objecting to
> *KVM* clearing PMCs when saving guest state without coordinating with perf
> in any way.

Agree. Blindly clearing PMCs is the baseline implementation. I am thinking
about what the coordination between perf and KVM should look like as well.

> I am ok if we start with (or default to) a "safe" implementation that zeroes
> all PMCs when switching to host context, but I want KVM and perf to work
> together to do the context switches, e.g. so that we don't end up with code
> where KVM writes to all PMC MSRs and then perf also immediately writes to
> all PMC MSRs.

Sure. Point taken.
> One of my biggest complaints with the current vPMU code is that the roles and
> responsibilities between KVM and perf are poorly defined, which leads to
> suboptimal and hard-to-maintain code.

Right.
> Case in point, I'm pretty sure leaving guest values in PMCs _would_ leak guest
> state to userspace processes that have RDPMC permissions, as the PMCs might
> not be dirty from perf's perspective (see perf_clear_dirty_counters()).

Ah, this is a good point.
switch_mm_irqs_off() =>
  cr4_update_pce_mm() =>
    /*
     * Clear the existing dirty counters to
     * prevent the leak for an RDPMC task.
     */
    perf_clear_dirty_counters()
So perf does clear dirty counter values on process context switch; that is
nice to know. Note that perf_clear_dirty_counters() clears the counter values
according to cpuc->dirty, except for the currently assigned counters.
> Blindly clearing PMCs in KVM "solves" that problem, but in doing so makes the
> overall code brittle because it's not clear whether KVM _needs_ to clear
> PMCs, or if KVM is just being paranoid.

There is a difference between KVM and the perf subsystem on PMU context
switch: the latter has the notion of "perf_events", while the former
currently does not. That makes it quite hard for KVM to know which counters
are really "in use".
Another point I want to raise is that the KVM PMU context switch and the perf
PMU context switch happen at different times:
- The former is a context switch between the guest and host state of the
  same process, happening at the VM-enter/exit boundary.
- The latter is a context switch between two host-level processes.
- The former happens before the latter.
- The current design has no PMC partitioning between host and guest, due to
  an architectural limitation.

From the above, I feel that it might be impossible to combine them or to add
coordination, unless we do the KVM PMU context switch at the vcpu loop
boundary...
Thanks.
-Mingwei