RE: [RFC PATCH v2 3/3] KVM: VMX: Stop/resume host PT before/after VMX transition when PT_MODE_HOST_GUEST
From: Wang, Wei W
Date: Thu Sep 22 2022 - 08:35:24 EST
On Thursday, September 22, 2022 12:45 AM, Li, Xiaoyao wrote:
> Current implementation in pt_guest_enter() has two issues when pt mode is
> PT_MODE_HOST_GUEST.
>
> 1. It relies on VM_ENTRY_LOAD_IA32_RTIT_CTL to disable host's Intel PT
> for the case that host enables PT while guest not.
>
> However, it causes VM entry failure due to violating the requirement
> stated in SDM "VM-Execution Control Fields"
>
> If the logical processor is operating with Intel PT enabled (if
> IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load
> IA32_RTIT_CTL" VM-entry control must be 0.
>
> 2. In the case both host and guest enable Intel PT, it disables host's
> Intel PT by manually clearing MSR_IA32_RTIT_CTL for the purpose to
> context switch host and guest's PT configurations.
>
> However, PT PMI can be delivered later and before VM entry. In the PT
> PMI handler, it will a) update the host PT MSRs which leads to what KVM
> stores in vmx->pt_desc.host becomes stale, and b) re-enable Intel PT
> which leads to VM entry failure as #1.
>
> To fix the above two issues, 1) grab and store host PT perf event and
> disable/enable host PT before vm-enter/ after vm-exit. 2) drop host pt_ctx and
> the logic to save/restore host PT MSRs since host PT driver doesn't rely on the
> previous value of PT MSR, i.e., the re-enabling of PT event after VM-exit
> re-initializes all the PT MSRs that it cares.
I would also re-write the commit:
Current KVM implementation directly modifies the hardware states when it
wants to stop or start a host PT event on VMX transitions. This is not proper
because:
1) host perf event is well managed by the perf subsystem. Getting it stopped/started
needs to go through the perf subsystem to update the related metadata (e.g. event
state).
2) Simply modifying the MSR_IA32_RTIT_CTL isn't a complete way to disable the
host PT event, as there may be special cases, for example, a dangling PT bit in the
interrupt status register after PT has been stopped may cause the PT interrupt handler
to enable PT while KVM assumes PT has been disabled after it clears MSR_IA32_RTIT_CTL
(for more details, please check SDM "VM-Execution Control Fields" section for PT related
requirements). Not properly handling such cases can result in VMEntry failures. Those have
already been properly handled by the PT driver (i.e. pt_event_stop) called from the perf core.
3) stop/start a host PT event needs to save/restore the related MSR states for an
event switching. This has also been properly supported by the PT driver. It is an extra
burden for KVM to maintain another version of function to do such save/restore.
For those reasons, change KVM to get the running host PT event from the PT driver,
and call perf_event_disable/enable_local to disable/enable the host PT event running on the CPU.
This will reuse perf and PT driver to switch in/out the host PT event, with proper management of the
perf event.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
> ---
> arch/x86/kvm/vmx/vmx.c | 31 +++++++++++++------------------
> arch/x86/kvm/vmx/vmx.h | 2 +-
> 2 files changed, 14 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> c9b49a09e6b5..df1a16264bb6 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1124,37 +1124,32 @@ static inline void pt_save_msr(struct pt_ctx *ctx,
> u32 addr_range)
>
> static void pt_guest_enter(struct vcpu_vmx *vmx) {
> + struct perf_event *event;
> +
> if (vmx_pt_mode_is_system())
> return;
>
> - /*
> - * GUEST_IA32_RTIT_CTL is already set in the VMCS.
> - * Save host state before VM entry.
> - */
> - rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
> - if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
> - wrmsrl(MSR_IA32_RTIT_CTL, 0);
> - pt_save_msr(&vmx->pt_desc.host,
> vmx->pt_desc.num_address_ranges);
> + event = pt_get_curr_event();
> + if (event)
> + perf_event_disable_local(event);
> + vmx->pt_desc.host_event = event;
> +
> + if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN)
> pt_load_msr(&vmx->pt_desc.guest,
> vmx->pt_desc.num_address_ranges);
> - }
> }
>
> static void pt_guest_exit(struct vcpu_vmx *vmx) {
> + struct perf_event *event = vmx->pt_desc.host_event;
> +
> if (vmx_pt_mode_is_system())
> return;
>
> - if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
> + if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN)
> pt_save_msr(&vmx->pt_desc.guest,
> vmx->pt_desc.num_address_ranges);
As they are only used for guest msrs now, probably we can
rename them to pt_save_guest_msrs. Also for the load side.