RE: [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
From: Zhang, Xiong Y
Date: Sun Jun 25 2023 - 02:54:51 EST
> > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > vLBR event could be inactive in two case:
> > a. host per cpu pinned LBR event occupy LBR when vLBR event is created
> > b. vLBR event is preempted by host per cpu pinned LBR event during vm
> > exit handler.
> > When vLBR event is inactive, guest couldn't access LBR msr, and it is
> > forced into error state and is excluded from schedule by perf scheduler.
> > So vLBR event couldn't be active through perf scheduler even if host
> > per cpu pinned LBR event has released LBR, kvm could enable vLBR event
> > proactively, then vLBR event may be active and LBR msr could be
> > passthrough into guest.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@xxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c
> > b/arch/x86/kvm/vmx/pmu_intel.c index 741efe2c497b..5a3ab8c8711b 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct
> kvm_vcpu *vcpu,
> > if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
> > return false;
> >
> > - if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> > + /* vLBR event may be inactive, but physical LBR may be free now.
>
> /*
> * This is the preferred block comment style.
> */
>
> > + * but vLBR event is pinned event, once it is inactive state, perf
> > + * will force it to error state in merge_sched_in() and exclude it from
> > + * perf schedule, so even if LBR is free now, vLBR event couldn't be
> active
> > + * through perf scheduler and vLBR event could be active through
> > + * perf_event_enable().
> > + */
>
> Trimming that down, is this what you mean?
Yes, thanks a lot.
>
> /*
> * Attempt to re-enable the vLBR event if it was disabled due to
> * contention with host LBR usage, i.e. was put into an error state.
> * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> * to manually re-enable the event.
> */
>
> Which begs the question, why can't there be a notification of some form that
> the LBRs are once again available?
This is perf scheduler rule. If pinned event couldn't get resource as resource limitation, perf will put it into error state and exclude it from perf scheduler, even if resource available later, perf won't schedule it again as it is in error state, the only way to reschedule it is to enable it again.
If non-pinned event couldn't get resource as resource limitation, perf will put it into inactive state, perf will reschedule it automatically once resource is available.
vLBR event is per process pinned event.
>
> Assuming that's too difficult for whatever reason, why wait until the guest tries
> to read LBRs? E.g. why not be more aggressive and try to re-enable vLBRs on
> every VM-Exit.
Yes, it is a good suggestion. Actually vmx_passthrough_lbr_msrs() is called on every
VM-exit, it also check vLBR event state, so I could re-enable vLBR in this function.
>
> And if we do wait until the guest touches relevant MSRs, shouldn't writes to
> DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the event?
Only perf know whether vLBR event could be active or not at this moment, if vLBR is active, KVM could read/write DEBUG_CTL[0] with irq disable/enable pair in theory, but it is better that kvm don't touch perf hw resource directly, as vLBR event is one host LBR event, host may have other LBR event, perf will schedule them according to perf scheduler rule. If vLBR is inactive, KVM shouldn't touch DEBUG_CTL MSR totally.
>
> Lastly, what guarantees that the MSRs hold guest data? I assume perf purges
> the MSRs at some point, but it would be helpful to call that out in the changelog.
For DEBUG_CTL msr, VMCS has two fields for this:
1. "Save debug controls" in VM-Exit controls
2. "Load debug controls" in VM-Entry controls
For LBR records MSRs, perf will save them at process schedule out and load them at process schedule in.
Sure, I will add it into changelog.
> > + if (lbr_desc->event && (lbr_desc->event->state ==
> PERF_EVENT_STATE_ERROR))
> > + perf_event_enable(lbr_desc->event);
> > + else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu)
> > +< 0)
> > goto dummy;
> >
> > /*
> > --
> > 2.25.1
> >