RE: [RFC PATCH 0/2] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT

From: Wang, Wei W
Date: Thu Sep 15 2022 - 22:31:03 EST


On Thursday, September 15, 2022 11:43 PM, Liang, Kan wrote:
> On 2022-09-15 10:39 a.m., Wang, Wei W wrote:
> > On Thursday, September 15, 2022 9:55 PM Liang, Kan wrote:
> >> On 2022-09-14 10:46 p.m., Wang, Wei W wrote:
> >>> On Thursday, September 15, 2022 4:26 AM, Liang, Kan wrote:
> >>>> The perf_event_disable() eventually invokes the intel_pt_stop().
> >>>> We already expose the intel_pt_stop()/cpu_emergency_stop_pt() to
> >>>> other modules. I don't think we have to use the perf_event_disable().
> >>>> Also, the
> >>>> perf_event_disable() requires extra codes.
> >>>>
> >>>> I went through the discussions. I agree with Sean's suggestion.
> >>>> We should only put the logic in the KVM but all the MSR access
> >>>> details into the PT driver.
> >>>
> >>> Even the driver itself doesn’t drive the save/restore of the MSRs,
> >>> it is drived
> >> by perf.
> >>
> >> It through perf_event, not driven by perf_event. The perf_event
> >> generic code never knows when should invokes each driver to
> >> save/restore information. It should be driven by the other subsystem e.g.,
> scheduler.
> >
> > Yes. The cpu scheduler does this via the perf subsystem, though.
> >
> >>
> >> For this case, KVM should drive the save/restore, and the PT driver
> >> eventually does all the MSR access details.
> >>
> >>> 1. If we make KVM a user of perf, we should do this via
> >> perf_event_disable/enable_*.
> >>> 2. If we make KVM an alternative to perf (i.e. have direct control
> >>> over PMU HW), we can do this via driver interfaces like perf.
> >>> Per my experience, we should go for 1. Probably need Peter's
> >>> opinions on
> >> this.
> >>>
> >>
> >> For 1, the perf_event_disable/enable_* are not enough. They don't
> >> save/restore MSRs.
> >
> > perf_event_disable will go through perf to call pt_event_stop which saves
> the related MSRs, right?
>
> I don't think so. The pt_event_stop() doesn't save all the
> MSR_IA32_RTIT_* MSRs.

Not all the MSRs are required to be saved. In general, pt_event_stop should have
saved all the MSRs required for an event switching. Otherwise the general usages
of PT have been broken. To be more precise, the following MSRs are not saved by
pt_event_stop, but I don’t see they are required to be saved:

- MSR_IA32_RTIT_CR3_MATCH: I don’t see it is used by perf.
Seems like KVM saved an MSR that's not even used by the host.

- Address range MSRs (MSR_IA32_RTIT_ADDR0_A etc.): Those are provided by s/w and not updated by h/w.
So they're just set to MSRs when event gets scheduled in. There is no need to save.

>
> > (if so, what large changes did you mean?)
> >
> >> If we go to this way, we have to introduce a new generic interface to
> >> ask each driver to save/restore their MSRs when the guest is
> >> entering/exiting. We'd better combine the new interface with the
> >> existing
> >> perf_guest_get_msrs() of the core driver.
> >> I think that's an ideal solution, but requires big changes in the code.
> >>
> >> 2 is the current KVM implementation. See pt_save_msr()/pt_load_msr().
> >> I don't think it's a right way. We'd better fix it.
> >>
> >> The suggestion should be 3. The KVM notify the PT driver via the
> >> interface provided by PT. The PT driver save/restore all the registers.
> >> I think it's an acceptable solution with small code changes.
> >
> > This looks like we just relocate the save/restore functions to the PT driver
> and KVM still directly call them - still not going through perf's management.
> Imagine every user operates on the pmu h/w directly like this, things would be
> a mess.
> >
>
>
> The pt_event_stop() and the proposed interface still manipulate the PT event
> pt->handle.event. The event status is updated as well. It's still under control of
> the perf_event.

Did you mean to handle the PT event in the proposed driver API? Event status is just
one of the things. There are other things if we want to make it complete for this,
e.g. event->oncpu = -1, and eventually seems we will re-implement perf_event_disable_*.

Btw, Xiaoyao has made it work with perf_event_disable_local, and don’t have that many changes.
If necessary, we can post the 2nd version out to double check.

Thanks,
Wei