Re: [PATCH v2 32/43] KVM: VMX: Move preemption timer <=> hrtimer dance to common x86

From: Maxim Levitsky
Date: Thu Oct 28 2021 - 11:46:14 EST


On Fri, 2021-10-08 at 19:12 -0700, Sean Christopherson wrote:
> Handle the switch to/from the hypervisor/software timer when a vCPU is
> blocking in common x86 instead of in VMX. Even though VMX is the only
> user of a hypervisor timer, the logic and all functions involved are
> generic x86 (unless future CPUs do something completely different and
> implement a hypervisor timer that runs regardless of mode).
>
> Handling the switch in common x86 will allow for the elimination of the
> pre/post_blocks hooks, and also lets KVM switch back to the hypervisor
> timer if and only if it was in use (without additional params). Add a
> comment explaining why the switch cannot be deferred to kvm_sched_out()
> or kvm_vcpu_block().
>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/kvm/vmx/vmx.c | 6 +-----
> arch/x86/kvm/x86.c | 21 +++++++++++++++++++++
> 2 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b3bb2031a7ac..a24f19874716 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7464,16 +7464,12 @@ void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
>
> static int vmx_pre_block(struct kvm_vcpu *vcpu)
> {
> - if (kvm_lapic_hv_timer_in_use(vcpu))
> - kvm_lapic_switch_to_sw_timer(vcpu);
> -
> return 0;
> }
>
> static void vmx_post_block(struct kvm_vcpu *vcpu)
> {
> - if (kvm_x86_ops.set_hv_timer)
> - kvm_lapic_switch_to_hv_timer(vcpu);
> +
> }
>
> static void vmx_setup_mce(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e0219acfd9cf..909e932a7ae7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9896,8 +9896,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>
> static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
> {
> + bool hv_timer;
> +
> if (!kvm_arch_vcpu_runnable(vcpu) &&
> (!kvm_x86_ops.pre_block || static_call(kvm_x86_pre_block)(vcpu) == 0)) {
> + /*
> + * Switch to the software timer before halt-polling/blocking as
> + * the guest's timer may be a break event for the vCPU, and the
> + * hypervisor timer runs only when the CPU is in guest mode.
> + * Switch before halt-polling so that KVM recognizes an expired
> + * timer before blocking.
> + */

I didn't know about this until now, but it all makes sense. The comment is very good.
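
For anyone else reading along, here is how I understand the save/switch/restore pattern, written as a toy userspace model. This is only a sketch and not KVM code: every toy_* name is made up for illustration, and toy_wait() merely stands in for halt-polling/blocking. The assert encodes the invariant from the comment above, namely that the hypervisor timer must not be the active timer while the vCPU is outside guest mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in KVM. */
enum toy_timer_mode { TOY_TIMER_NONE, TOY_TIMER_SW, TOY_TIMER_HV };

struct toy_vcpu {
	enum toy_timer_mode timer;
	int switches;	/* number of timer-mode transitions, for illustration */
};

static bool toy_hv_timer_in_use(const struct toy_vcpu *v)
{
	return v->timer == TOY_TIMER_HV;
}

static void toy_set_timer(struct toy_vcpu *v, enum toy_timer_mode mode)
{
	if (v->timer != mode) {
		v->timer = mode;
		v->switches++;
	}
}

/*
 * Stand-in for halt-polling/blocking. The vCPU is not in guest mode
 * here, so a hypervisor timer would never fire and could not wake it.
 */
static void toy_wait(struct toy_vcpu *v)
{
	assert(v->timer != TOY_TIMER_HV);
}

static void toy_block(struct toy_vcpu *v)
{
	/* Remember whether the hypervisor timer was in use on entry. */
	bool hv_timer = toy_hv_timer_in_use(v);

	if (hv_timer)
		toy_set_timer(v, TOY_TIMER_SW);

	toy_wait(v);

	/* Switch back if and only if the hypervisor timer was in use. */
	if (hv_timer)
		toy_set_timer(v, TOY_TIMER_HV);
}
```

Calling toy_block() on a vCPU that starts in TOY_TIMER_HV makes two transitions and ends up back on the hypervisor timer, while a vCPU that starts in TOY_TIMER_SW is left untouched. That is the "if and only if it was in use" behavior from the changelog, and it only needs the local bool, with no extra parameters.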

> + hv_timer = kvm_lapic_hv_timer_in_use(vcpu);
> + if (hv_timer)
> + kvm_lapic_switch_to_sw_timer(vcpu);
> +
> srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
> kvm_vcpu_halt(vcpu);
> @@ -9905,6 +9918,9 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
> kvm_vcpu_block(vcpu);
> vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>
> + if (hv_timer)
> + kvm_lapic_switch_to_hv_timer(vcpu);
> +
> if (kvm_x86_ops.post_block)
> static_call(kvm_x86_post_block)(vcpu);
>
> @@ -10136,6 +10152,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> r = -EINTR;
> goto out;
> }
> + /*
> + * It should be impossible for the hypervisor timer to be in
> + * use before KVM has ever run the vCPU.
> + */
> + WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));
> kvm_vcpu_block(vcpu);
> if (kvm_apic_accept_events(vcpu) < 0) {
> r = 0;

Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

Best regards,
Maxim Levitsky