Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

From: Yang, Shunyong
Date: Sun Mar 11 2018 - 22:34:22 EST


Hi, Marc,

On Sun, 2018-03-11 at 12:17 +0000, Marc Zyngier wrote:
> On Sun, 11 Mar 2018 01:55:08 +0000
> Christoffer Dall <cdall@xxxxxxxxxx> wrote:
>
> >
> > On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier <marc.zyngier@xxxxxx
> > m> wrote:
> > >
> > > On Fri, 09 Mar 2018 21:36:12 +0000,
> > > Christoffer Dall wrote:  
> > > >
> > > >
> > > > On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote:  
> > > > >
> > > > > I'd be more confident if we did forbid P+A for such interrupts
> > > > > altogether, as they really feel like another kind of HW
> > > > > interrupt.
> > > > How about a slightly bigger hammer:  Can we avoid doing P+A for
> > > > level interrupts completely?  I don't think that really makes much
> > > > sense, and I think we simplify everything if we just come back out
> > > > and resample the line.  For an edge, something like a network
> > > > card, there's a potential performance win to appending a new
> > > > pending state, but I doubt that this is the case for level
> > > > interrupts.
> > > I started implementing the same thing yesterday. Somehow, it feels
> > > slightly better to have the same flow for all level interrupts,
> > > including the timer, and we only use the MI on EOI as a way to
> > > trigger the next state of injection. Still testing, but looking
> > > good so far.
> > >
> > > I'm still puzzled that we have this level-but-not-quite behaviour
> > > for VFIO interrupts. At some point, it is going to bite us badly.
> > >  
> > Where is the departure from level-triggered behavior with VFIO?  As
> > far as I can tell, the GIC flow of the interrupts will be just a
> > level interrupt,
> The GIC is fine, I believe. What is not exactly fine is the signalling
> from the device, which will never be dropped until the EOI has been
> detected.
>
> >
> > but we just need to make sure the resamplefd mechanism is
> > supported for both types of interrupts.  Whether or not that's a
> > decent mechanism seems orthogonal to me, but that's a discussion
> > for another day I think.
> Given that VFIO is built around this mechanism, I don't think we have
> a choice but to support it. Anyway, I came up with the following patch,
> which I tested on Seattle with mtty. It also survived my usual
> hammering of cyclictest, hackbench and bulk VM installs.
>
> Shunyong, could you please give it a go?
>
> Thanks,
>
> M.
>

I have tested the patch. It works on the QDF2400 platform,
and kvm_notify_acked_irq() is called when the state is idle.
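
For reference, here is a minimal user-space sketch of what I mean by
"state is idle": the list register has neither the pending nor the
active bit set, the EOI bit is set, and it is not a HW interrupt. The
bit positions are what I read from the GICv2 definitions, and the LR_*
names and the helper name are made up for illustration, so please
correct me if this is not the right condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 list register bits; illustrative names, please verify. */
#define LR_EOI          (1u << 19)
#define LR_PENDING_BIT  (1u << 28)
#define LR_ACTIVE_BIT   (1u << 29)
#define LR_STATE        (3u << 28)   /* pending | active */
#define LR_HW           (1u << 31)

/*
 * Hypothetical helper: true when the LR is idle (neither pending nor
 * active), an EOI notification was requested, and the interrupt is
 * not backed by a HW interrupt.
 */
static bool lr_idle_with_eoi(uint32_t lr_val)
{
	return !(lr_val & LR_STATE) && (lr_val & LR_EOI) &&
	       !(lr_val & LR_HW);
}
```

With this model, an LR that is still active (or pending+active) never
satisfies the check, which matches what I saw before applying the patch.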

BTW, I have the following questions from debugging this issue.
Could you please give me some help?
1) What does "mi" mean in the GIC code, such as in lr_signals_eoi_mi()?
2) In some __hyp_text code, such as __kvm_vcpu_run(), printk() will
cause "HYP panic:". How can I output debug information there?

Thanks.
Shunyong.


> From 9ca96b9fb535cc6ab578bda85c4ecbc4a8c63cd7 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <marc.zyngier@xxxxxxx>
> Date: Fri, 9 Mar 2018 14:59:40 +0000
> Subject: [PATCH] KVM: arm/arm64: vgic: Disallow Active+Pending for level
>  interrupts
>
> It was recently reported that VFIO mediated devices, and anything
> that VFIO exposes as level interrupts, do not strictly follow the
> expected logic of such interrupts as it only lowers the input
> line when the guest has EOId the interrupt at the GIC level, rather
> than when it Acked the interrupt at the device level.
>
> The GIC's Active+Pending state is fundamentally incompatible with
> this behaviour, as it prevents KVM from observing the EOI, and in
> turn results in VFIO never dropping the line. This results in an
> interrupt storm in the guest, which it really never expected.
>
> As we cannot really change VFIO to follow the strict rules of level
> signalling, let's forbid the A+P state altogether, as it is in the
> end only an optimization. It ensures that we will transition via
> an invalid state, which we can use to notify VFIO of the EOI.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 47 +++++++++++++++++++++++++++------------------
>  virt/kvm/arm/vgic/vgic-v3.c | 47 +++++++++++++++++++++++++++------------------
>  2 files changed, 56 insertions(+), 38 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index 29556f71b691..9356d749da1d 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -153,8 +153,35 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  {
>   u32 val = irq->intid;
> + bool allow_pending = true;
>  
> - if (irq_is_pending(irq)) {
> + if (irq->active)
> + val |= GICH_LR_ACTIVE_BIT;
> +
> + if (irq->hw) {
> + val |= GICH_LR_HW;
> + val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
> + /*
> +  * Never set pending+active on a HW interrupt, as the
> +  * pending state is kept at the physical distributor
> +  * level.
> +  */
> + if (irq->active)
> + allow_pending = false;
> + } else {
> + if (irq->config == VGIC_CONFIG_LEVEL) {
> + val |= GICH_LR_EOI;
> +
> + /*
> +  * Software resampling doesn't work very well
> +  * if we allow P+A, so let's not do that.
> +  */
> + if (irq->active)
> + allow_pending = false;
> + }
> + }
> +
> + if (allow_pending && irq_is_pending(irq)) {
>   val |= GICH_LR_PENDING_BIT;
>  
>   if (irq->config == VGIC_CONFIG_EDGE)
> @@ -171,24 +198,6 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>   }
>   }
>  
> - if (irq->active)
> - val |= GICH_LR_ACTIVE_BIT;
> -
> - if (irq->hw) {
> - val |= GICH_LR_HW;
> - val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
> - /*
> -  * Never set pending+active on a HW interrupt, as the
> -  * pending state is kept at the physical distributor
> -  * level.
> -  */
> - if (irq->active && irq_is_pending(irq))
> - val &= ~GICH_LR_PENDING_BIT;
> - } else {
> - if (irq->config == VGIC_CONFIG_LEVEL)
> - val |= GICH_LR_EOI;
> - }
> -
>   /*
>    * Level-triggered mapped IRQs are special because we only observe
>    * rising edges as input to the VGIC.  We therefore lower the line
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 0ff2006f3781..6b484575cafb 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -135,8 +135,35 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  {
>   u32 model = vcpu->kvm->arch.vgic.vgic_model;
>   u64 val = irq->intid;
> + bool allow_pending = true;
>  
> - if (irq_is_pending(irq)) {
> + if (irq->active)
> + val |= ICH_LR_ACTIVE_BIT;
> +
> + if (irq->hw) {
> + val |= ICH_LR_HW;
> + val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
> + /*
> +  * Never set pending+active on a HW interrupt, as the
> +  * pending state is kept at the physical distributor
> +  * level.
> +  */
> + if (irq->active)
> + allow_pending = false;
> + } else {
> + if (irq->config == VGIC_CONFIG_LEVEL) {
> + val |= ICH_LR_EOI;
> +
> + /*
> +  * Software resampling doesn't work very well
> +  * if we allow P+A, so let's not do that.
> +  */
> + if (irq->active)
> + allow_pending = false;
> + }
> + }
> +
> + if (allow_pending && irq_is_pending(irq)) {
>   val |= ICH_LR_PENDING_BIT;
>  
>   if (irq->config == VGIC_CONFIG_EDGE)
> @@ -154,24 +181,6 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>   }
>   }
>  
> - if (irq->active)
> - val |= ICH_LR_ACTIVE_BIT;
> -
> - if (irq->hw) {
> - val |= ICH_LR_HW;
> - val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
> - /*
> -  * Never set pending+active on a HW interrupt, as the
> -  * pending state is kept at the physical distributor
> -  * level.
> -  */
> - if (irq->active && irq_is_pending(irq))
> - val &= ~ICH_LR_PENDING_BIT;
> - } else {
> - if (irq->config == VGIC_CONFIG_LEVEL)
> - val |= ICH_LR_EOI;
> - }
> -
>   /*
>    * Level-triggered mapped IRQs are special because we only observe
>    * rising edges as input to the VGIC.  We therefore lower the line
> -- 
> 2.14.2
>
>
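
To check my understanding of the new flow, here is a small stand-alone
model of the state bits the patch computes for a list register. It is
only a sketch: the struct, the LR_* names and model_lr_state() are
hypothetical stand-ins for the vgic_irq fields and GICH_LR_* bits used
above, the bit values are assumed from the GICv2 layout, and everything
else the real function does (INTID, priority, SGI source, edge handling)
is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 list register bits; illustrative names only. */
#define LR_EOI          (1u << 19)
#define LR_PENDING_BIT  (1u << 28)
#define LR_ACTIVE_BIT   (1u << 29)
#define LR_HW           (1u << 31)

/* Hypothetical stand-in for the vgic_irq fields the patch looks at. */
struct irq_model {
	bool hw;       /* mapped to a physical interrupt (irq->hw) */
	bool active;   /* irq->active */
	bool pending;  /* what irq_is_pending(irq) would return */
	bool level;    /* irq->config == VGIC_CONFIG_LEVEL */
};

/* Mirrors the ordering of the patched populate_lr: active first, then
 * HW/level handling decides whether pending may be set as well. */
static uint32_t model_lr_state(const struct irq_model *irq)
{
	uint32_t val = 0;
	bool allow_pending = true;

	if (irq->active)
		val |= LR_ACTIVE_BIT;

	if (irq->hw) {
		val |= LR_HW;
		/* Pending state lives in the physical distributor. */
		if (irq->active)
			allow_pending = false;
	} else if (irq->level) {
		val |= LR_EOI;
		/* No P+A for software-resampled level interrupts. */
		if (irq->active)
			allow_pending = false;
	}

	if (allow_pending && irq->pending)
		val |= LR_PENDING_BIT;

	return val;
}
```

In this model a level interrupt that is both active and pending ends up
as active with the EOI bit set, so the LR has to pass through the idle
state before it can be made pending again, while an edge interrupt can
still be active+pending.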