Re: [PATCH] kvm/x86: Fix PT "host mode"

From: Sean Christopherson
Date: Mon Aug 23 2021 - 12:17:01 EST


On Mon, Aug 23, 2021, Alexander Shishkin wrote:
> Regardless of the "pt_mode", the kvm driver installs its interrupt handler
> for Intel PT, which always overrides the native handler, causing data loss
> inside kvm guests, while we're expecting to trace them.
>
> Fix this by only installing kvm's perf_guest_cbs if pt_mode is set to
> guest tracing.

Uh, regardless of the correctness of such a change (spoiler alert), making an
enormous leap from "one thing is wrong" to "nuke it all!" needs way more
justification/explanation. Or more realistically, such a leap should be a good
indication that the proposed change is not correct.

> Signed-off-by: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Fixes: ff9d07a0e7ce7 ("KVM: Implement perf callbacks for guest sampling")

This should be another clue that the fix isn't correct. That patch is from 2010;
Intel PT was announced in 2013 and merged in 2019.

> Reported-by: Artem Kashkanov <artem.kashkanov@xxxxxxxxx>
> Tested-by: Artem Kashkanov <artem.kashkanov@xxxxxxxxx>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 6 ++++++
> arch/x86/kvm/x86.c | 10 ++++++++--
> 3 files changed, 15 insertions(+), 2 deletions(-)
>

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca616929..3ba0001e7388 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -268,6 +268,8 @@ static struct kmem_cache *x86_fpu_cache;
>
> static struct kmem_cache *x86_emulator_cache;
>
> +static int __read_mostly intel_pt_enabled;
> +
> /*
> * When called, it means the previous get/set msr reached an invalid msr.
> * Return true if we want to ignore/silent this failed msr access.
> @@ -8194,7 +8196,10 @@ int kvm_arch_init(void *opaque)
>
> kvm_timer_init();
>
> - perf_register_guest_info_callbacks(&kvm_guest_cbs);
> + if (ops->intel_pt_enabled && ops->intel_pt_enabled()) {

This is not remotely correct. vmx.c's "pt_mode", which is queried via this path,
is modified by hardware_setup(), a.k.a. kvm_x86_ops.hardware_setup(), which runs
_after_ this code. And as alluded to above, these are generic perf callbacks;
installing them if and only if Intel PT is enabled in a specific mode completely
breaks "regular" perf.
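
To make the ordering problem concrete, here is a minimal userspace sketch, not
kernel code. Every name in it (the model_* functions, the enum, the flag) is
hypothetical, and the "downgrade pt_mode when the hardware lacks support" rule
is an assumption about what hardware_setup() might do to pt_mode. It only
illustrates that a decision sampled in the init path goes stale once the later
setup step changes the value it was based on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two PT modes; not the kernel's definitions. */
enum pt_mode_model { PT_MODE_SYSTEM_MODEL, PT_MODE_HOST_GUEST_MODEL };

/* Requested mode, e.g. what a module parameter might ask for. */
static enum pt_mode_model pt_mode = PT_MODE_HOST_GUEST_MODEL;

/* Records the decision taken in the (modelled) init path. */
static bool callbacks_registered;

/* Models the query the patch adds: "is PT in guest-tracing mode?" */
static bool model_intel_pt_enabled(void)
{
    return pt_mode == PT_MODE_HOST_GUEST_MODEL;
}

/* Models the patched init path: samples pt_mode before hardware setup. */
static void model_arch_init(void)
{
    callbacks_registered = model_intel_pt_enabled();
}

/*
 * Models hardware setup running later. Assumption: it falls back to
 * system mode when the hardware cannot support guest tracing.
 */
static void model_hardware_setup(bool hw_supports_guest_pt)
{
    if (!hw_supports_guest_pt)
        pt_mode = PT_MODE_SYSTEM_MODEL;
}

/*
 * Runs the two steps in the problematic order and reports whether the
 * early decision disagrees with the final pt_mode.
 */
static bool model_stale_decision(bool hw_supports_guest_pt)
{
    pt_mode = PT_MODE_HOST_GUEST_MODEL;
    callbacks_registered = false;

    model_arch_init();                          /* decision made here ... */
    model_hardware_setup(hw_supports_guest_pt); /* ... mode finalized here */

    return callbacks_registered != model_intel_pt_enabled();
}
```

In this model, model_stale_decision(false) returns true: the callbacks were
registered based on a mode that was later changed. The sketch does not cover
the second objection, that the callbacks are generic perf callbacks and must
not be gated on PT at all.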

I'll post a small series; a bit of code massage is needed to fix this
properly. The PMI handler can also be optimized to avoid a retpoline when PT is
not exposed to the guest.

> + perf_register_guest_info_callbacks(&kvm_guest_cbs);
> + intel_pt_enabled = 1;
> + }
>
> if (boot_cpu_has(X86_FEATURE_XSAVE)) {
> host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> @@ -8229,7 +8234,8 @@ void kvm_arch_exit(void)
> clear_hv_tscchange_cb();
> #endif
> kvm_lapic_exit();
> - perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
> + if (intel_pt_enabled)
> + perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
>
> if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
> --
> 2.32.0
>