On Wed, Aug 09, 2023 at 08:25:07AM +0000, Oliver Upton wrote:Yes.
Hi Huang,Yep; IIUC the issue here is that when we take an IRQ from a guest and reprogram
On Wed, Aug 09, 2023 at 09:39:53AM +0800, Huang Shijie wrote:
2.) Root cause.This is a rather vague description of the problem. AFAICT, the
There is only 7 counters in my arm64 platform:
(one cycle counter) + (6 normal counters)
In 1.3 above, we will use 10 event counters.
Since we only have 7 counters, the perf core will trigger
event multiplexing in hrtimer:
merge_sched_in() -->perf_mux_hrtimer_restart() -->
perf_rotate_context().
In the perf_rotate_context(), it does not restore some PMU registers
as context_switch() does. In context_switch():
kvm_sched_in() --> kvm_vcpu_pmu_restore_guest()
kvm_sched_out() --> kvm_vcpu_pmu_restore_host()
So we got wrong result.
issue here is on VHE systems we wind up getting the EL0 count
enable/disable bits backwards when entering the guest, which is
corroborated by the data you have below.
the PMU in the IRQ handler, the IRQ handler will program the PMU with
appropriate host/guest/user/etc filters for a *host* context, and then we'll
return back into the guest without reconfigurign the event filtering for a
*guest* context.
That can happen for perf_rotate_context(), or when we install an event into a
running context, as that'll happen via an IPI.
The latter sounds reasonable to me.+void arch_perf_rotate_pmu_set(void)This sort of hook is rather nasty, and I'd strongly prefer a solution
+{
+ if (is_guest())
+ kvm_vcpu_pmu_restore_guest(NULL);
+ else
+ kvm_vcpu_pmu_restore_host(NULL);
+}
+
that's confined to KVM. I don't think the !is_guest() branch is
necessary at all. Regardless of how the pmu context is changed, we need
to go through vcpu_put() before getting back out to userspace.
We can check for a running vCPU (ick) from kvm_set_pmu_events() and either
do the EL0 bit flip there or make a request on the vCPU to call
kvm_vcpu_pmu_restore_guest() immediately before reentering the guest.
I'm slightly leaning towards the latter, unless anyone has a better idea
here.
I suspect we need to take special care here to make sure we leave *all* events
in a good state when re-entering the guest or if we get to kvm_sched_out()
after *removing* an event via an IPI -- it'd be easy to mess either case up and
leave some events in a bad state.
Thanks,
Mark.