Re: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

From: Paolo Bonzini
Date: Fri Jan 25 2019 - 13:28:09 EST


On 18/01/19 07:34, Luwei Kang wrote:
> Some Posted-Interrupts from passthrough devices may be lost or
> overwritten when the vCPU is in runnable state.
>
> The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
> be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
> but not running on physical CPU). If a posted interrupt arrives at this
> time, the IRQ remapping facility will set the bit in PIR (Posted
> Interrupt Requests) but not ON (Outstanding Notification).
> So this interrupt can't be synced to the APIC virtualization registers
> and will not be handled by the guest because ON is zero.
>
> Signed-off-by: Luwei Kang <luwei.kang@xxxxxxxxx>
> ---
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f6915f1..820a03b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>  	bool max_irr_updated;
>
>  	WARN_ON(!vcpu->arch.apicv_active);
> -	if (pi_test_on(&vmx->pi_desc)) {
> +	if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
>  		pi_clear_on(&vmx->pi_desc);
>  		/*
>  		 * IOMMU can write to PIR.ON, so the barrier matters even on UP.
>

This is a very delicate path. The bitmap check here is ordered after
the vcpu->mode write in vcpu_enter_guest, matching the check of
vcpu->mode in vmx_deliver_posted_interrupt (which comes after a write of
PIR.ON):

    sender                      receiver
    write PIR
    write PIR.ON                vcpu->mode = IN_GUEST_MODE
    smp_mb()                    smp_mb()
    read vcpu->mode             sync_pir_to_irr
                                  read PIR.ON

What you did should work, since PIR is written before PIR.ON anyway.
However, you should at least change the comment in vcpu_enter_guest to
mention "before reading PIR" instead of "before reading PIR.ON".
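
For illustration only, the pairing in the diagram can be modeled in user
space with C11 atomics. This is a rough sketch, not kernel code: every
name below (model_vcpu, model_send, model_enter_guest, and so on) is
hypothetical, the PIR is shrunk to one 32-bit word, and
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(). The
receiver checks PIR rather than ON, as the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel symbols. */
enum model_mode { MODEL_OUTSIDE_GUEST = 0, MODEL_IN_GUEST = 1 };

struct model_vcpu {
	atomic_uint pir;   /* stands in for the PIR bitmap (one word only) */
	atomic_bool on;    /* stands in for PIR.ON */
	atomic_int mode;   /* stands in for vcpu->mode */
	unsigned int irr;  /* stands in for the virtual APIC IRR */
};

static void model_init(struct model_vcpu *v)
{
	atomic_init(&v->pir, 0u);
	atomic_init(&v->on, false);
	atomic_init(&v->mode, MODEL_OUTSIDE_GUEST);
	v->irr = 0;
}

/*
 * Sender column: write PIR, write ON, full barrier, read mode.
 * Returns true if the receiver was observed in guest mode, which is
 * the case where a notification would have to be sent.
 */
static bool model_send(struct model_vcpu *v, unsigned int vector_bit)
{
	assert(vector_bit < 32);
	atomic_fetch_or_explicit(&v->pir, 1u << vector_bit,
				 memory_order_relaxed);
	atomic_store_explicit(&v->on, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	return atomic_load_explicit(&v->mode, memory_order_relaxed) ==
	       MODEL_IN_GUEST;
}

/*
 * Receiver column: write mode, full barrier, then look at PIR itself
 * instead of ON. Returns true if any pending bits were moved to irr.
 */
static bool model_enter_guest(struct model_vcpu *v)
{
	atomic_store_explicit(&v->mode, MODEL_IN_GUEST, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */

	if (atomic_load_explicit(&v->pir, memory_order_relaxed) == 0)
		return false;

	atomic_store_explicit(&v->on, false, memory_order_relaxed);
	/* Barrier between clearing ON and harvesting PIR. */
	atomic_thread_fence(memory_order_seq_cst);
	v->irr |= atomic_exchange_explicit(&v->pir, 0u, memory_order_relaxed);
	return true;
}
```

In this model, a bit that lands in pir while on stays false (the SN
case from the commit message) is still picked up by
model_enter_guest(), because the check no longer depends on ON.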

Alternatively, would it be possible to instead set ON when SN is
cleared? The clearing of SN is in pi_clear_sn, and you would have
instead something like

	WRITE_ONCE(*(u16 *)&pi_desc->on_sn, POSTED_INTR_ON);

where on_sn is added to struct pi_desc like this:

@@ -61,4 +60,5 @@ struct pi_desc {
 			u32	ndst;
 		};
+		u16 on_sn;
 		u64 control;
 	};
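
As a rough user-space sketch of that idea (not the kernel's struct
pi_desc): the names model_pi_desc, MODEL_ON_MASK, MODEL_SN_MASK and both
helpers are hypothetical, the masks assume ON is bit 0 and SN is bit 1
of the 16-bit word, and a volatile store stands in for WRITE_ONCE(). A
single 16-bit store then clears SN and sets ON together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks within the 16-bit word; not the kernel's constants. */
#define MODEL_ON_MASK 0x0001u	/* assumed bit 0: Outstanding Notification */
#define MODEL_SN_MASK 0x0002u	/* assumed bit 1: Suppress Notification */

/* Simplified stand-in for the descriptor layout, 64 bytes in total. */
struct model_pi_desc {
	uint32_t pir[8];		/* posted interrupt requests */
	union {
		uint16_t on_sn;		/* ON, SN and reserved bits as one word */
		uint64_t control;
	};
	uint32_t rsvd[6];
};

static_assert(sizeof(struct model_pi_desc) == 64,
	      "model descriptor should be 64 bytes");

/* Models the vCPU being preempted: suppress notifications. */
static void model_set_sn(struct model_pi_desc *d)
{
	d->on_sn |= MODEL_SN_MASK;
}

/*
 * Models clearing SN while also setting ON: one 16-bit store replaces
 * the whole word, so SN (and the reserved bits) become 0 and ON becomes 1.
 */
static void model_clear_sn_set_on(struct model_pi_desc *d)
{
	*(volatile uint16_t *)&d->on_sn = MODEL_ON_MASK;
}
```

With ON set unconditionally when SN is cleared, the existing ON check in
the sync path would run at least once after the vCPU is scheduled back
in, at the cost of an occasional scan of an empty PIR.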

Paolo