Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux
From: Scott Wood
Date: Thu Apr 23 2015 - 17:27:17 EST
On Thu, 2015-04-23 at 15:31 +0300, Purcareata Bogdan wrote:
> On 23.04.2015 03:30, Scott Wood wrote:
> > On Wed, 2015-04-22 at 15:06 +0300, Purcareata Bogdan wrote:
> >> On 21.04.2015 03:52, Scott Wood wrote:
> >>> On Mon, 2015-04-20 at 13:53 +0300, Purcareata Bogdan wrote:
> >>>> There was a weird situation for .kvmppc_mpic_set_epr - its corresponding inner
> >>>> function is kvmppc_set_epr, which is a static inline. Removing the static inline
> >>>> yields a compiler crash (Segmentation fault (core dumped) -
> >>>> scripts/Makefile.build:441: recipe for target 'arch/powerpc/kvm/kvm.o' failed),
> >>>> but that's a different story, so I just let it be for now. Point is the time may
> >>>> include other work after the lock has been released, but before the function
> >>>> actually returned. I noticed this was the case for .kvm_set_msi, which could
> >>>> run for up to 90 ms, none of it actually under the lock. This made me change
> >>>> what I'm looking at.
> >>>
> >>> kvm_set_msi does pretty much nothing outside the lock -- I suspect
> >>> you're measuring an interrupt that happened as soon as the lock was
> >>> released.
> >>
> >> That's exactly right. I've seen things like a timer interrupt occurring right
> >> after the spin_unlock_irqrestore, but before kvm_set_msi actually returned.
> >>
> >> [...]
> >>
> >>>> Or perhaps a different stress scenario involving a lot of VCPUs
> >>>> and external interrupts?
> >>>
> >>> You could instrument the MPIC code to find out how many loop iterations
> >>> you maxed out on, and compare that to the theoretical maximum.
> >>
> >> Numbers are pretty low, and I'll try to explain based on my observations.
> >>
> >> The problematic section in openpic_update_irq is this [1], since it loops
> >> through all VCPUs, and IRQ_local_pipe further calls IRQ_check, which loops
> >> through all pending interrupts for a VCPU [2].
> >>
> >> The guest interfaces are virtio-vhostnet, which are based on MSI
> >> (/proc/interrupts in guest shows they are MSI). For external interrupts to the
> >> guest, the irq_source destmask is currently 0, and last_cpu is 0 (uninitialized),
> >> so [1] will go on and deliver the interrupt directly, as a unicast (no VCPU loop).
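The unicast-versus-loop behaviour described above can be sketched in plain, standalone C. This is an illustrative model only, not the actual arch/powerpc/kvm/mpic.c code; `delivery_iterations` and `MAX_CPU` are hypothetical names chosen for the sketch:

```c
#define MAX_CPU 32

/* Illustrative model of the delivery decision: a destination mask with
 * a single bit set is delivered unicast with no VCPU loop, while a
 * multi-bit mask forces a round-robin scan starting after last_cpu.
 * Returns the number of VCPUs examined for one delivery. */
static int delivery_iterations(unsigned int destmask, int last_cpu)
{
	/* exactly one bit set: direct unicast, no loop */
	if (destmask && !(destmask & (destmask - 1)))
		return 1;

	/* several candidate CPUs: scan round-robin from last_cpu + 1 */
	int iters = 0;
	for (int i = 0; i < MAX_CPU; i++) {
		int cpu = (last_cpu + 1 + i) % MAX_CPU;
		iters++;
		if (destmask & (1u << cpu))
			break;	/* first eligible CPU receives the IRQ */
	}
	return iters;
}
```

In the ping-flood measurement above, the destination mask had a single bit set, which is why the VCPU loop was never entered; the RT worst case is the multi-bit path.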
> >>
> >> I activated the pr_debugs in arch/powerpc/kvm/mpic.c, to see how many interrupts
> >> are actually pending for the destination VCPU. At most, there were 3 interrupts
> >> - n_IRQ = {224,225,226} - even for 24 flows of ping flood. I understand that
> >> guest virtio interrupts are cascaded over 1 or a couple of shared MSI interrupts.
> >>
> >> So worst case, in this scenario, was checking the priorities for 3 pending
> >> interrupts for 1 VCPU. Something like this (some of my prints included):
> >>
> >> [61010.582033] openpic_update_irq: destmask 1 last_cpu 0
> >> [61010.582034] openpic_update_irq: Only one CPU is allowed to receive this IRQ
> >> [61010.582036] IRQ_local_pipe: IRQ 224 active 0 was 1
> >> [61010.582037] IRQ_check: irq 226 set ivpr_pr=8 pr=-1
> >> [61010.582038] IRQ_check: irq 225 set ivpr_pr=8 pr=-1
> >> [61010.582039] IRQ_check: irq 224 set ivpr_pr=8 pr=-1
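The per-VCPU pending scan can likewise be modelled as a simple priority walk. Again a hypothetical sketch, not the real IRQ_check: the point it illustrates is that the cost is linear in the number of pending interrupts, so the three pending IRQs above cost three iterations per delivery:

```c
/* Sketch of a priority scan over a VCPU's pending interrupts: return
 * the index of the highest-priority pending interrupt that beats the
 * current threshold, or -1 if none does.  prio[] stands in for the
 * per-IRQ priority (ivpr_pr in the debug output), cur_prio for the
 * current threshold (pr in the debug output). */
static int irq_check(const int *prio, int n_pending, int cur_prio)
{
	int best = -1;
	int best_prio = cur_prio;

	for (int i = 0; i < n_pending; i++) {
		if (prio[i] > best_prio) {	/* strictly higher wins */
			best_prio = prio[i];
			best = i;
		}
	}
	return best;
}
```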
> >>
> >> It would be really helpful to get your comments on whether these are realistic
> >> numbers for everyday use, or whether they are relevant only to this particular
> >> scenario.
> >
> > RT isn't about "realistic numbers for everyday use". It's about worst
> > cases.
> >
> >> - Can these interrupts be used in directed delivery, so that the destination
> >> mask can include multiple VCPUs?
> >
> > The Freescale MPIC does not support multiple destinations for most
> > interrupts, but the (non-FSL-specific) emulation code appears to allow
> > it.
> >
> >> The MPIC manual states that timer and IPI
> >> interrupts are supported for directed delivery, although I'm not sure how much of
> >> this is used in the emulation. I know that kvmppc uses the decrementer outside
> >> of the MPIC.
> >>
> >> - How are virtio interrupts cascaded over the shared MSI interrupts?
> >> /proc/device-tree/soc@e0000000/msi@41600/interrupts in the guest shows 8 values
> >> - 224 - 231 - so at most there might be 8 pending interrupts in IRQ_check, is
> >> that correct?
> >
> > It looks like that's currently the case, but actual hardware supports
> > more than that, so it's possible (albeit unlikely any time soon) that
> > the emulation eventually does as well.
> >
> > But it's possible to have interrupts other than MSIs...
>
> Right.
>
> So given that the raw spinlock conversion is not suitable for all the scenarios
> supported by the OpenPIC emulation, is it OK if my next step is to send a patch
> containing both the raw spinlock conversion and a mandatory disable of the
> in-kernel MPIC? This is actually the last conclusion we came up with some time
> ago, but I guess it was good to get some more insight into how things actually
> work (at least for me).
Fine with me. Have you given any thought to ways to restructure the
code to eliminate the problem?
-Scott
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/