RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

From: Wu, Feng
Date: Mon Jun 29 2015 - 09:28:15 EST




> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: Friday, June 19, 2015 4:04 AM
> To: Wu, Feng
> Cc: Eric Auger; Avi Kivity; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> pbonzini@xxxxxxxxxx; mtosatti@xxxxxxxxxx; Joerg Roedel
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> [Adding Joerg since he was part of this original idea]
>
> On Thu, 2015-06-18 at 09:16 +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > Sent: Tuesday, June 16, 2015 12:45 AM
> > > To: Eric Auger
> > > Cc: Avi Kivity; Wu, Feng; kvm@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx;
> > > pbonzini@xxxxxxxxxx; mtosatti@xxxxxxxxxx
> > > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > >
> > > On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote:
> > > > Hi Alex, all,
> > > > On 06/12/2015 09:03 PM, Alex Williamson wrote:
> > > > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> > > > >> On 06/12/2015 06:41 PM, Alex Williamson wrote:
> > > > >>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> > > > >>>>> -----Original Message-----
> > > > >>>>> From: Avi Kivity [mailto:avi.kivity@xxxxxxxxx]
> > > > >>>>> Sent: Friday, June 12, 2015 3:59 AM
> > > > >>>>> To: Wu, Feng; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > > >>>>> Cc: pbonzini@xxxxxxxxxx; mtosatti@xxxxxxxxxx;
> > > > >>>>> alex.williamson@xxxxxxxxxx; eric.auger@xxxxxxxxxx
> > > > >>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > > > >>>>>
> > > > >>>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> > > > >>>>>> From: Eric Auger <eric.auger@xxxxxxxxxx>
> > > > >>>>>>
> > > > >>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE
> > > group
> > > > >>>>>> and 2 device attributes:
> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > > > >>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be
> > > able
> > > > >>>>>> to set a VFIO device IRQ as forwarded or not forwarded.
> > > > >>>>>> the command takes as argument a handle to a new struct named
> > > > >>>>>> kvm_vfio_dev_irq.
> > > > >>>>> Is there no way to do this automatically? After all, vfio knows that
> a
> > > > >>>>> device interrupt is forwarded to some eventfd, and kvm knows that
> > > some
> > > > >>>>> eventfd is forwarded to a guest interrupt. If they compare notes
> > > > >>>>> through a central registry, they can figure out that the interrupt
> needs
> > > > >>>>> to be forwarded.
> > > > >>>> Oh, just like Eric mentioned in his reply, this description is out of
> context
> > > of
> > > > >>>> this series, I will remove them in the next version.
> > > > >>>
> > > > >>> I suspect Avi's question was more general. While forward/unforward
> is
> > > > >>> out of context for this series, it's very similar in nature to
> > > > >>> enabling/disabling posted interrupts. So I think the question remains
> > > > >>> whether we really need userspace to participate in creating this
> > > > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out
> > > > >>> automatically.
> > > > >>>
> > > > >>> Personally I don't know how we could do it automatically. We've
> always
> > > > >>> relied on userspace to independently setup vfio and kvm such that
> > > > >>> neither have any idea that the other is there and update each side
> > > > >>> independently when anything changes. So it seems consistent to
> > > continue
> > > > >>> that here. It doesn't seem like there's much to gain
> performance-wise
> > > > >>> either, updates should be a relatively rare event I'd expect.
> > > > >>>
> > > > >>> There's really no metadata associated with an eventfd, so "comparing
> > > > >>> notes" automatically might imply some central registration entity.
> That
> > > > >>> immediately sounds like a much more complex solution, but maybe Avi
> > > has
> > > > >>> some ideas to manage it. Thanks,
> > > > >>>
> > > > >>
> > > > >> The idea is to have a central registry maintained by a posted interrupts
> > > > >> manager. Both vfio and kvm pass the filp (along with extra
> information)
> > > > >> to the posted interrupts manager, which, when it detects a filp match,
> > > > >> tells each of them what to do.
> > > > >>
> > > > >> The advantages are:
> > > > >> - old userspace gains the optimization without change
> > > > >> - a userspace API is more expensive to maintain than internal kernel
> > > > >> interfaces (CVEs, documentation, maintaining backwards compatibility)
> > > > >> - if you can do it without a new interface, this indicates that all the
> > > > >> information in the new interface is redundant. That means you have
> to
> > > > >> check it for consistency with the existing information, so it's extra
> > > > >> work (likely, it's exactly what the posted interrupt manager would be
> > > > >> doing anyway).
> > > > >
> > > > > Yep, those all sound like good things and I believe that's similar in
> > > > > design to the way we had originally discussed this interaction at
> > > > > LPC/KVM Forum several years ago. I'd be in favor of that approach.
> > > >
> > > > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> > > > IRQ forward control" series? Or is that "central registry maintained by
> > > > a posted interrupts manager" something more specific to x86?
> > >
> > > I'd think we'd want it for any sort of offload and supporting both
> > > posted-interrupts and irq-forwarding would be a good validation. I
> > > imagine there would be registration/de-registration callbacks separate
> > > for interrupt producers vs interrupt consumers. Each registration
> > > function would likely provide a struct of callbacks, probably similar to
> > > the get_symbol callbacks proposed for the kvm-vfio device on the IRQ
> > > producer side. The eventfd would be the token that the manager would
> > > use to match producers and consumers. The hard part is probably
> > > figuring out what information to retrieve from the producer and provide
> > > to the consumer in a generic way between pci and platform, but as an
> > > internal interface, it's not a big deal if we screw it up a few times to
> > > start. Thanks,
> >
> > On posted-interrupts side, the main purpose of the new APIs is to update
> > the IRTE when guest changes vMSI/vMSIx configuration. Alex, do you have
> > any detailed ideas for the new solution to achieve this purpose? It should
> > be helpful if you can share some!
>
>
> There are plenty of details to be filled in, but I think the basics
> looks something like the code below. The IRQ bypass manager just
> defines a pair of structures, one for interrupt producers and one for
> interrupt consumers. I'm certain that we'll need more callbacks than
> I've defined below, but figuring out what those should be for the best
> abstraction is the hardest part of this idea. The manager provides both
> registration and de-registration interfaces for both types of objects
> and keeps lists for each, protected by a lock. The manager doesn't even
> really need to know what the match token is, but I assume for our
> purposes it will be an eventfd_ctx.
>
> On the vfio side, the producer struct would be embedded in the
> vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct
> in _irqfd. As I've coded below, the IRQ bypass manager calls the
> consumer callbacks, so the producer struct would need fields or
> callbacks to provide the consumer the info it needs. AIUI the Posted
> Interrupt model, VFIO only needs to provide data to the consumer. For
> IRQ Forwarding, I think the producer needs to be informed when bypass is
> active to model the incoming interrupt as edge vs level.
>
> I've prototyped the base IRQ bypass manager here as static, but I don't
> see any reason it couldn't be a module that's loaded by dependency when
> either vfio-pci or kvm-intel is loaded (or other producer/consumer
> objects).
>
> Is this a reasonable starting point to craft the additional fields and
> callbacks and interaction of who calls who that we need to support
> Posted Interrupts and IRQ Forwarding? Is the AMD version of this still
> alive? Thanks,
>
> Alex
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 413a7bf..22f6fcb 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -61,6 +61,7 @@ config KVM_INTEL
> depends on KVM
> # for perf_guest_get_msrs():
> depends on CPU_SUP_INTEL
> + select IRQ_BYPASS_MANAGER
> ---help---
> Provides support for KVM on Intel processors equipped with the VT
> extensions.
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 579d83b..02912f1 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -2,6 +2,7 @@ config VFIO_PCI
> tristate "VFIO support for PCI devices"
> depends on VFIO && PCI && EVENTFD
> select VFIO_VIRQFD
> + select IRQ_BYPASS_MANAGER
> help
> Support for the PCI VFIO bus driver. This is required to make
> use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..4e053be 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device
> *vdev, int fd)
>
> if (vdev->ctx[0].trigger) {
> free_irq(pdev->irq, vdev);
> + /* irq_bypass_unregister_producer(); */
> kfree(vdev->ctx[0].name);
> eventfd_ctx_put(vdev->ctx[0].trigger);
> vdev->ctx[0].trigger = NULL;
> @@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device
> *vdev, int fd)
> return ret;
> }
>
> + /* irq_bypass_register_producer(); */
> +
> /*
> * INTx disable will stick across the new irq setup,
> * disable_irq won't.
> @@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
>
> if (vdev->ctx[vector].trigger) {
> free_irq(irq, vdev->ctx[vector].trigger);
> + /* irq_bypass_unregister_producer(); */
> kfree(vdev->ctx[vector].name);
> eventfd_ctx_put(vdev->ctx[vector].trigger);
> vdev->ctx[vector].trigger = NULL;
> @@ -360,6 +364,8 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> return ret;
> }
>
> + /* irq_bypass_register_producer(); */
> +
> vdev->ctx[vector].trigger = trigger;
>
> return 0;
> diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> new file mode 100644
> index 0000000..718508e
> --- /dev/null
> +++ b/include/linux/irqbypass.h
> @@ -0,0 +1,23 @@
> +#ifndef IRQBYPASS_H
> +#define IRQBYPASS_H
> +
> +#include <linux/list.h>
> +
> +struct irq_bypass_producer {
> + struct list_head node;
> + void *token;
> + /* TBD */
> +};
> +
> +struct irq_bypass_consumer {
> + struct list_head node;
> + void *token;
> + void (*add_producer)(struct irq_bypass_producer *);
> + void (*del_producer)(struct irq_bypass_producer *);

These two callbacks should be common function, for PI, I need to add
something specific to x86, such as, updating the associated IRTE, how
should I do for this?

> +};
> +
> +int irq_bypass_register_producer(struct irq_bypass_producer *);
> +void irq_bypass_unregister_producer(struct irq_bypass_producer *);
> +int irq_bypass_register_consumer(struct irq_bypass_consumer *);
> +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *);
> +#endif /* IRQBYPASS_H */
> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> index 9a76e3b..4502cdc 100644
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -100,4 +100,7 @@ config SPARSE_IRQ
>
> If you don't know what to do here, say N.
>
> +config IRQ_BYPASS_MANAGER
> + bool
> +
> endmenu
> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> index d121235..a30ed77 100644
> --- a/kernel/irq/Makefile
> +++ b/kernel/irq/Makefile
> @@ -7,3 +7,4 @@ obj-$(CONFIG_PROC_FS) += proc.o
> obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
> obj-$(CONFIG_PM_SLEEP) += pm.o
> obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
> +obj-$(CONFIG_IRQ_BYPASS_MANAGER) += bypass.o
> diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c
> new file mode 100644
> index 0000000..5d0f92b
> --- /dev/null
> +++ b/kernel/irq/bypass.c

Is it better to put this code here or in vfio folder?

Thanks,
Feng

> @@ -0,0 +1,116 @@
> +/*
> + * IRQ offload/bypass manager
> + *
> + * Various virtualization hardware acceleration techniques allow bypassing
> + * or offloading interrupts receieved from devices around the host kernel.
> + * Posted Interrupts on Intel VT-d systems can allow interrupts to be
> + * recieved directly by a virtual machine. ARM IRQ Forwarding can allow
> + * level triggered device interrupts to be de-asserted directly by the VM.
> + * This manager allows interrupt producers and consumers to find each other
> + * to enable this sort of bypass.
> + */
> +
> +#include <linux/irqbypass.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +
> +static LIST_HEAD(producers);
> +static LIST_HEAD(consumers);
> +static DEFINE_MUTEX(lock);
> +
> +int irq_bypass_register_producer(struct irq_bypass_producer *producer)
> +{
> + struct irq_bypass_producer *tmp;
> + struct irq_bypass_consumer *consumer;
> + int ret = 0;
> +
> + mutex_lock(&lock);
> +
> + list_for_each_entry(tmp, &producers, node) {
> + if (tmp->token == producer->token) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> + }
> +
> + list_add(&producer->node, &producers);
> +
> + list_for_each_entry(consumer, &consumers, node) {
> + if (consumer->token == producer->token) {
> + consumer->add_producer(producer);
> + break;
> + }
> + }
> +unlock:
> + mutex_unlock(&lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
> +
> +void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
> +{
> + struct irq_bypass_consumer *consumer;
> +
> + mutex_lock(&lock);
> +
> + list_for_each_entry(consumer, &consumers, node) {
> + if (consumer->token == producer->token) {
> + consumer->del_producer(producer);
> + break;
> + }
> + }
> +
> + list_del(&producer->node);
> +
> + mutex_unlock(&lock);
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
> +
> +int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
> +{
> + struct irq_bypass_consumer *tmp;
> + struct irq_bypass_producer *producer;
> + int ret = 0;
> +
> + mutex_lock(&lock);
> +
> + list_for_each_entry(tmp, &consumers, node) {
> + if (tmp->token == consumer->token) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> + }
> +
> + list_add(&consumer->node, &consumers);
> +
> + list_for_each_entry(producer, &producers, node) {
> + if (producer->token == consumer->token) {
> + consumer->add_producer(producer);
> + break;
> + }
> + }
> +unlock:
> + mutex_unlock(&lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
> +
> +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
> +{
> + struct irq_bypass_producer *producer;
> +
> + mutex_lock(&lock);
> +
> + list_for_each_entry(producer, &producers, node) {
> + if (producer->token == consumer->token) {
> + consumer->del_producer(producer);
> + break;
> + }
> + }
> +
> + list_del(&consumer->node);
> +
> + mutex_unlock(&lock);
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9ff4193..f3da161 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -429,6 +429,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd
> *args)
> */
> fdput(f);
>
> + /* irq_bypass_register_consumer(); */
> +
> return 0;
>
> fail:
> @@ -528,6 +530,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd
> *args)
> struct _irqfd *irqfd, *tmp;
> struct eventfd_ctx *eventfd;
>
> + /* irq_bypass_unregister_consumer() */
> +
> eventfd = eventfd_ctx_fdget(args->fd);
> if (IS_ERR(eventfd))
> return PTR_ERR(eventfd);
>
>

N‹§²æ¸›yú²X¬¶ÇvØ–)Þ{.nlj·¥Š{±‘êX§¶›¡Ü}©ž²ÆzÚj:+v‰¨¾«‘êZ+€Êzf£¢·hšˆ§~†­†Ûÿû®w¥¢¸?™¨è&¢)ßf”ùy§m…á«a¶Úÿ 0¶ìå