RE: [PATCH] KVM: x86: Sync the pending Posted-Interrupts

From: Kang, Luwei
Date: Mon Jan 28 2019 - 03:08:57 EST


> > Some Posted-Interrupts from passthrough devices may be lost or
> > overwritten when the vCPU is in runnable state.
> >
> > The SN (Suppress Notification) of PID (Posted Interrupt Descriptor)
> > will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE
> > state but not running on physical CPU). If a posted interrupt coming
> > at this time, the irq remmaping facility will set the bit of PIR
> > (Posted Interrupt Requests) but ON (Outstanding Notification).
> > So this interrupt can't be sync to APIC virtualization register and
> > will not be handled by Guest because ON is zero.
> >
> > Signed-off-by: Luwei Kang <luwei.kang@xxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > f6915f1..820a03b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6048,7 +6048,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > bool max_irr_updated;
> >
> > WARN_ON(!vcpu->arch.apicv_active);
> > - if (pi_test_on(&vmx->pi_desc)) {
> > + if (!bitmap_empty((unsigned long *)vmx->pi_desc.pir, NR_VECTORS)) {
> > pi_clear_on(&vmx->pi_desc);
> > /*
> > * IOMMU can write to PIR.ON, so the barrier matters even on UP.
> >
>
> This is a very delicate path. The bitmap check here is ordered after the vcpu->mode write in vcpu_enter_guest, matching the check of
> vcpu->mode in vmx_deliver_posted_interrupt (which comes after a write of
> PIR.ON):
>
> sender receiver
> write PIR
> write PIR.ON vcpu->mode = IN_GUEST_MODE
> smp_mb() smp_mb()
> read vcpu->mode sync_pir_to_irr
> read PIR.ON
>
> What you did should work, since PIR is written after PIR.ON anyway.

Hi Paolo:
I think what you mentioned PIR.ON here is PID.ON;
PID: Posted Interrupt Descriptor (a structure for PI which include 512 bits)
ON: Outstanding Notification (one bit of PID)
PIR: Posted Interrupt Requests (256 bits for each interrupt vector in PID)

Before VT-d irq remapping, there just have PIR and ON in PID. Posted interrupt introduced by APICv can be used for send IPI. The working flow of sent a posted interrupt show in vmx_deliver_posted_interrupt().
1. Set PIR of PID
2. Set ON of PID
3. Send a IPI to target vCPU with notification vector (POSTED_INTR_VECTOR) if target vCPU is Running on a pCPU; have no vm-exit and handed by guest interrupt handler directly.
4. if the target vCPU is not Running on pCPU, invoke kvm_vcpu_kick() to wake up this vCPU or send RESCHEDULE_VECTOR IPI to target pCPU to make the vCPU running as soon as possible.
5. follow 4. The vCPU prepare to run (vcpu_enter_guest) and sync the posted interrupt of ON is set.

It looks like work well.
VT-d irq remapping facility introduce SN, NV, NDST in PID. These are used by irq remapping facility and CPU donât care these flags.
6. The bit of SN will be set when vCPU id preempted (runnable but not running).
commit 28b835d60fcc2498e717cf5e6f0c3691c24546f7
KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
7. if a interrupt coming at this moment, irq remapping facility just set PIR but *not* set ON (VT-d spec 9.12).
So, here, the interrupt can't be sync to IRR because ON is 0.

I add some log here and found some interrupt recorded in PIR but ON is zero. It will impact the performance of pass through device.

> However, you should at least change the comment in vcpu_enter_guest to mention "before reading PIR" instead of "before reading
> PIR.ON".

Will do that. I think the "checking PIR.ON" should be PID.ON. I will fix it.

>
> Alternatively, would it be possible to instead set ON when SN is cleared? The clearing of SN is in pi_clear_sn, and you would have instead
> something like

SN is cleared when the corresponding vCPU is running on pCPU. I think we can't set ON when SN is cleared. Because there have some words in VT-d spec 9.12:
If ON is set at the time of hardware posting an interrupt to PIR field, notification event is not generated.

>
> WRITE_ONCE(u16 *)&pi_desc->on_sn, POSTED_INTR_ON);

We already have a function (pi_test_on) to check the bit of POSTED_INTR_ON. So I think it is unnecessary.

Thanks,
Luwei Kang

>
> where on_sn is added to struct pi_desc like this:
>
> @@ -61,4 +60,5 @@ struct pi_desc {
> u32 ndst;
> };
> + u16 on_sn;
> u64 control;
> };
>
> Paolo