Re: [RFC PATCH 06/22] KVM: x86: INIT may transition from HALTED to RUNNABLE

From: Sean Christopherson
Date: Tue Dec 03 2024 - 14:08:04 EST


The shortlog is an observation, not a proper summary of the change.

KVM: x86: Handle side effects of receiving INIT while vCPU is HALTED

On Thu, Nov 21, 2024, Mingwei Zhang wrote:
> From: Jim Mattson <jmattson@xxxxxxxxxx>
>
> When a halted vCPU is awakened by an INIT signal, it might have been
> the target of a previous KVM_HC_KICK_CPU hypercall, in which case
> pv_unhalted would be set. This flag should be cleared before the next
> HLT instruction, as kvm_vcpu_has_events() would otherwise return true
> and prevent the vCPU from entering the halt state.
>
> Use kvm_vcpu_make_runnable() to ensure consistent handling of the
> HALTED to RUNNABLE state transition.
>
> Fixes: 6aef266c6e17 ("kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks")
> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>

Mingwei's SoB is missing.

> ---
> arch/x86/kvm/lapic.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 95c6beb8ce279..97aa634505306 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -3372,9 +3372,8 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>
> if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> kvm_vcpu_reset(vcpu, true);
> - if (kvm_vcpu_is_bsp(apic->vcpu))
> - vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> - else
> + kvm_vcpu_make_runnable(vcpu);

This is arguably wrong. APs are never runnable after receiving. Nothing should
ever be able to observe the "bad" state, but that doesn't make it any less
confusing.

This series also fails to address the majority cases where KVM transitions to RUNNABLE:

__set_sregs_common()
__sev_snp_update_protected_guest_state()
kvm_arch_vcpu_ioctl_set_mpstate()
kvm_xen_schedop_poll()
kvm_arch_async_page_present()
kvm_arch_vcpu_ioctl_get_mpstate()
kvm_apic_accept_events() (SIPI path)

Yeah, some of those don't _need_ to be converted, and the existing behavior of
pv_unhalted is all kinds of sketchy, but fixing a few select paths just so that
APERF/MPERF virtualization does what y'all want it to do does not leave KVM in a
better place.

I also think we should add a generic setter, e.g. kvm_set_mp_state(), and take
this opportunity to sanitize pv_unhalted. Specifically, I think pv_unhalted
should be clear on _any_ state transition, and unconditionally cleared when KVM
enters the guest. The PV kick should only wake a vCPU that is currently halted.
Unfortunately, the cross-vCPU nature means KVM can't easily handle that when
delivering APIC_DM_REMRD.

Please also send these fixes as a separate series. My crystal ball says APERF/MPERF
virtualization isn't going to land in the near future, and I would like to get
the mp_state handling cleaned up soonish.

> + if (!kvm_vcpu_is_bsp(apic->vcpu))
> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> }
> if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events)) {
> --
> 2.47.0.371.ga323438b13-goog
>