Re: [PATCH v2 9/9] KVM: x86: never write to memory from kvm_vcpu_check_block

From: Sean Christopherson
Date: Tue Aug 16 2022 - 19:45:19 EST


On Thu, Aug 11, 2022, Paolo Bonzini wrote:
> kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore
> it cannot sleep. Writing to guest memory is therefore forbidden, but it
> can happen on AMD processors if kvm_check_nested_events() causes a vmexit.
>
> Fortunately, all events that are caught by kvm_check_nested_events() are
> also recognized by kvm_vcpu_has_events() through vendor callbacks such as
> kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so
> remove the call and postpone the actual processing to vcpu_block().
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> arch/x86/kvm/x86.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5e9358ea112b..9226fd536783 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10639,6 +10639,17 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
>  		return 1;
>  	}
>
> +	if (is_guest_mode(vcpu)) {
> +		/*
> +		 * Evaluate nested events before exiting the halted state.
> +		 * This allows the halt state to be recorded properly in
> +		 * the VMCS12's activity state field (AMD does not have
> +		 * a similar field and a vmexit always causes a spurious
> +		 * wakeup from HLT).
> +		 */
> +		kvm_check_nested_events(vcpu);

Formatting nit: I'd prefer the block comment go above the if-statement. That way
we avoid debating whether or not the technically unnecessary braces align with
kernel/KVM style, and the comment doesn't have to wrap as aggressively.

And s/vmexit/VM-Exit while I'm nitpicking.

	/*
	 * Evaluate nested events before exiting the halted state. This allows
	 * the halt state to be recorded properly in the VMCS12's activity
	 * state field (AMD does not have a similar field and a VM-Exit always
	 * causes a spurious wakeup from HLT).
	 */
	if (is_guest_mode(vcpu))
		kvm_check_nested_events(vcpu);

Side topic: the AMD behavior is a bug report waiting to happen. I know of at least
one customer failure that was root-caused to a KVM bug where KVM caused a spurious
wakeup. To be fair, the guest workload was being stupid (executing HLT on a vCPU and
then effectively unmapping its code by doing kexec), but it's still an unpleasant gap :-(