Re: [PATCH v5 3/3] KVM: x86: add new nested vmexit tracepoints
From: Sean Christopherson
Date: Wed Dec 18 2024 - 16:14:35 EST
On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> Add 3 new tracepoints for nested VM exits which are intended
> to capture extra information to gain insights about the nested guest
> behavior.
>
> The new tracepoints are:
>
> - kvm_nested_msr
> - kvm_nested_hypercall
I 100% agree that not having register state in the exit tracepoints is obnoxious,
but I don't think we should add one-off tracepoints for the most annoying cases.
I would much prefer to figure out a way to capture register state in kvm_entry
and kvm_exit. E.g. I've lost track of the number of times I've observed an MSR
exit without having trace_kvm_msr enabled.
One idea would be to capture E{A,B,C,D}X, which would cover MSRs, CPUID, and
most hypercalls. And then we might even be able to drop the dedicated MSR and
CPUID tracepoints (not sure if that's a good idea).
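To illustrate why those four registers would be enough, here is a minimal userspace sketch (not kernel code) of how a trace consumer might decode an exit from E{A,B,C,D}X alone. Every name in it (sketch_exit_kind, sketch_exit_regs, sketch_describe_exit) is hypothetical and exists only for this example. The register roles follow the architectural RDMSR/WRMSR/CPUID conventions and KVM's own hypercall ABI; other hypercall ABIs use different registers, which is presumably why the coverage is "most" rather than "all".

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical exit classes; real exit reasons are vendor-specific codes. */
enum sketch_exit_kind {
	SKETCH_EXIT_RDMSR,
	SKETCH_EXIT_WRMSR,
	SKETCH_EXIT_CPUID,
	SKETCH_EXIT_KVM_HYPERCALL,
};

/* Hypothetical payload: the four registers captured at exit time. */
struct sketch_exit_regs {
	uint32_t eax, ebx, ecx, edx;
};

/* Format a human-readable description of the exit into buf. */
int sketch_describe_exit(enum sketch_exit_kind kind,
			 const struct sketch_exit_regs *r,
			 char *buf, size_t len)
{
	uint64_t val;

	switch (kind) {
	case SKETCH_EXIT_RDMSR:
		/* ECX selects the MSR; EDX:EAX are outputs, so not meaningful yet. */
		return snprintf(buf, len, "rdmsr index=0x%" PRIx32, r->ecx);
	case SKETCH_EXIT_WRMSR:
		/* ECX selects the MSR; EDX:EAX hold the 64-bit value to write. */
		val = ((uint64_t)r->edx << 32) | r->eax;
		return snprintf(buf, len,
				"wrmsr index=0x%" PRIx32 " value=0x%" PRIx64,
				r->ecx, val);
	case SKETCH_EXIT_CPUID:
		/* EAX is the leaf, ECX the subleaf. */
		return snprintf(buf, len,
				"cpuid leaf=0x%" PRIx32 " subleaf=0x%" PRIx32,
				r->eax, r->ecx);
	case SKETCH_EXIT_KVM_HYPERCALL:
		/*
		 * KVM's hypercall ABI: number in RAX, first arguments in
		 * RBX, RCX, RDX. Only the low 32 bits are visible here.
		 */
		return snprintf(buf, len,
				"hypercall nr=%" PRIu32 " a0=0x%" PRIx32
				" a1=0x%" PRIx32 " a2=0x%" PRIx32,
				r->eax, r->ebx, r->ecx, r->edx);
	}
	return -1;
}
```

A consumer would pair the exit reason already reported by kvm_exit with the captured registers and call sketch_describe_exit() once per event, so no dedicated MSR or CPUID event is needed to learn which MSR or leaf was accessed.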
Side topic, arch/s390/kvm/trace.h has the concept of COMMON information that is
captured for multiple tracepoints. I haven't looked closely, but I gotta imagine
we can/should use a similar approach for x86.
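As a rough illustration of what "COMMON information" could look like, the userspace sketch below shares one set of fields, one assignment helper, and one format prefix across two event types via macros. It is not taken from arch/s390/kvm/trace.h and does not use the kernel's TRACE_EVENT machinery; all SKETCH_* and sketch_* names are hypothetical, and the choice of common fields (vCPU id plus E{A,B,C,D}X) is an assumption made for the example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fields that every event records. */
#define SKETCH_COMMON_FIELDS		\
	uint32_t vcpu_id;		\
	uint32_t eax, ebx, ecx, edx;

/* Fill in the common fields of any event that embeds them. */
#define SKETCH_COMMON_ASSIGN(ev, id, a, b, c, d)	\
	do {						\
		(ev)->vcpu_id = (id);			\
		(ev)->eax = (a);			\
		(ev)->ebx = (b);			\
		(ev)->ecx = (c);			\
		(ev)->edx = (d);			\
	} while (0)

/* Shared format prefix and the matching argument list. */
#define SKETCH_COMMON_FMT					\
	"vcpu %" PRIu32 " eax=0x%" PRIx32 " ebx=0x%" PRIx32	\
	" ecx=0x%" PRIx32 " edx=0x%" PRIx32
#define SKETCH_COMMON_ARGS(ev)					\
	(ev)->vcpu_id, (ev)->eax, (ev)->ebx, (ev)->ecx, (ev)->edx

/* Two event types that differ only in their event-specific tail. */
struct sketch_entry_event {
	SKETCH_COMMON_FIELDS
	uint64_t rip;
};

struct sketch_exit_event {
	SKETCH_COMMON_FIELDS
	uint32_t exit_reason;
};

int sketch_format_entry(const struct sketch_entry_event *ev,
			char *buf, size_t len)
{
	return snprintf(buf, len, SKETCH_COMMON_FMT " rip=0x%" PRIx64,
			SKETCH_COMMON_ARGS(ev), ev->rip);
}

int sketch_format_exit(const struct sketch_exit_event *ev,
		       char *buf, size_t len)
{
	return snprintf(buf, len, SKETCH_COMMON_FMT " reason=0x%" PRIx32,
			SKETCH_COMMON_ARGS(ev), ev->exit_reason);
}
```

The point of the pattern is that adding a register to the common set touches three macros rather than every event definition, and every event prints the shared state in the same position.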
> These tracepoints capture extra register state to be able to know
> which MSR or which hypercall was done.
>
> - kvm_nested_page_fault
>
> This tracepoint allows to capture extra info about which host pagefault
> error code caused the nested page fault.
The host error code, a.k.a. qualification info, is readily available in the
kvm_exit (or nested variant) tracepoint. I don't think letting userspace skip a
tracepoint that's probably already enabled is worth the extra code to support
this tracepoint. The nested_svm_inject_npf_exit() code in particular is wonky,
and I think it's a good example of why userspace "needs" trace_kvm_exit, e.g. to
observe that a nested stage-2 page fault didn't originate from a hardware stage-2
fault.