Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

From: Alex Williamson
Date: Thu Jun 18 2015 - 16:04:22 EST


[Adding Joerg since he was part of this original idea]

On Thu, 2015-06-18 at 09:16 +0000, Wu, Feng wrote:
>
>
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > Sent: Tuesday, June 16, 2015 12:45 AM
> > To: Eric Auger
> > Cc: Avi Kivity; Wu, Feng; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > pbonzini@xxxxxxxxxx; mtosatti@xxxxxxxxxx
> > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> >
> > On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote:
> > > Hi Alex, all,
> > > On 06/12/2015 09:03 PM, Alex Williamson wrote:
> > > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> > > >> On 06/12/2015 06:41 PM, Alex Williamson wrote:
> > > >>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> > > >>>>> -----Original Message-----
> > > >>>>> From: Avi Kivity [mailto:avi.kivity@xxxxxxxxx]
> > > >>>>> Sent: Friday, June 12, 2015 3:59 AM
> > > >>>>> To: Wu, Feng; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > >>>>> Cc: pbonzini@xxxxxxxxxx; mtosatti@xxxxxxxxxx;
> > > >>>>> alex.williamson@xxxxxxxxxx; eric.auger@xxxxxxxxxx
> > > >>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > > >>>>>
> > > >>>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> > > >>>>>> From: Eric Auger <eric.auger@xxxxxxxxxx>
> > > >>>>>>
> > > >>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE
> > group
> > > >>>>>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > > >>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be
> > able
> > > >>>>>> to set a VFIO device IRQ as forwarded or not forwarded.
> > > >>>>>> the command takes as argument a handle to a new struct named
> > > >>>>>> kvm_vfio_dev_irq.
> > > >>>>> Is there no way to do this automatically? After all, vfio knows that a
> > > >>>>> device interrupt is forwarded to some eventfd, and kvm knows that
> > some
> > > >>>>> eventfd is forwarded to a guest interrupt. If they compare notes
> > > >>>>> through a central registry, they can figure out that the interrupt needs
> > > >>>>> to be forwarded.
> > > >>>> Oh, just like Eric mentioned in his reply, this description is out of context
> > of
> > > >>>> this series, I will remove them in the next version.
> > > >>>
> > > >>> I suspect Avi's question was more general. While forward/unforward is
> > > >>> out of context for this series, it's very similar in nature to
> > > >>> enabling/disabling posted interrupts. So I think the question remains
> > > >>> whether we really need userspace to participate in creating this
> > > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out
> > > >>> automatically.
> > > >>>
> > > >>> Personally I don't know how we could do it automatically. We've always
> > > >>> relied on userspace to independently setup vfio and kvm such that
> > > >>> neither have any idea that the other is there and update each side
> > > >>> independently when anything changes. So it seems consistent to
> > continue
> > > >>> that here. It doesn't seem like there's much to gain performance-wise
> > > >>> either, updates should be a relatively rare event I'd expect.
> > > >>>
> > > >>> There's really no metadata associated with an eventfd, so "comparing
> > > >>> notes" automatically might imply some central registration entity. That
> > > >>> immediately sounds like a much more complex solution, but maybe Avi
> > has
> > > >>> some ideas to manage it. Thanks,
> > > >>>
> > > >>
> > > >> The idea is to have a central registry maintained by a posted interrupts
> > > >> manager. Both vfio and kvm pass the filp (along with extra information)
> > > >> to the posted interrupts manager, which, when it detects a filp match,
> > > >> tells each of them what to do.
> > > >>
> > > >> The advantages are:
> > > >> - old userspace gains the optimization without change
> > > >> - a userspace API is more expensive to maintain than internal kernel
> > > >> interfaces (CVEs, documentation, maintaining backwards compatibility)
> > > >> - if you can do it without a new interface, this indicates that all the
> > > >> information in the new interface is redundant. That means you have to
> > > >> check it for consistency with the existing information, so it's extra
> > > >> work (likely, it's exactly what the posted interrupt manager would be
> > > >> doing anyway).
> > > >
> > > > Yep, those all sound like good things and I believe that's similar in
> > > > design to the way we had originally discussed this interaction at
> > > > LPC/KVM Forum several years ago. I'd be in favor of that approach.
> > >
> > > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> > > IRQ forward control" series? Or is that "central registry maintained by
> > > a posted interrupts manager" something more specific to x86?
> >
> > I'd think we'd want it for any sort of offload and supporting both
> > posted-interrupts and irq-forwarding would be a good validation. I
> > imagine there would be registration/de-registration callbacks separate
> > for interrupt producers vs interrupt consumers. Each registration
> > function would likely provide a struct of callbacks, probably similar to
> > the get_symbol callbacks proposed for the kvm-vfio device on the IRQ
> > producer side. The eventfd would be the token that the manager would
> > use to match producers and consumers. The hard part is probably
> > figuring out what information to retrieve from the producer and provide
> > to the consumer in a generic way between pci and platform, but as an
> > internal interface, it's not a big deal if we screw it up a few times to
> > start. Thanks,
>
> On posted-interrupts side, the main purpose of the new APIs is to update
> the IRTE when guest changes vMSI/vMSIx configuration. Alex, do you have
> any detailed ideas for the new solution to achieve this purpose? It should
> be helpful if you can share some!


There are plenty of details to be filled in, but I think the basics
looks something like the code below. The IRQ bypass manager just
defines a pair of structures, one for interrupt producers and one for
interrupt consumers. I'm certain that we'll need more callbacks than
I've defined below, but figuring out what those should be for the best
abstraction is the hardest part of this idea. The manager provides both
registration and de-registration interfaces for both types of objects
and keeps lists for each, protected by a lock. The manager doesn't even
really need to know what the match token is, but I assume for our
purposes it will be an eventfd_ctx.

On the vfio side, the producer struct would be embedded in the
vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct
in _irqfd. As I've coded below, the IRQ bypass manager calls the
consumer callbacks, so the producer struct would need fields or
callbacks to provide the consumer the info it needs. AIUI the Posted
Interrupt model, VFIO only needs to provide data to the consumer. For
IRQ Forwarding, I think the producer needs to be informed when bypass is
active to model the incoming interrupt as edge vs level.

I've prototyped the base IRQ bypass manager here as static, but I don't
see any reason it couldn't be a module that's loaded by dependency when
either vfio-pci or kvm-intel is loaded (or other producer/consumer
objects).

Is this a reasonable starting point to craft the additional fields and
callbacks and interaction of who calls who that we need to support
Posted Interrupts and IRQ Forwarding? Is the AMD version of this still
alive? Thanks,

Alex

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 413a7bf..22f6fcb 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -61,6 +61,7 @@ config KVM_INTEL
depends on KVM
# for perf_guest_get_msrs():
depends on CPU_SUP_INTEL
+ select IRQ_BYPASS_MANAGER
---help---
Provides support for KVM on Intel processors equipped with the VT
extensions.
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 579d83b..02912f1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PCI
tristate "VFIO support for PCI devices"
depends on VFIO && PCI && EVENTFD
select VFIO_VIRQFD
+ select IRQ_BYPASS_MANAGER
help
Support for the PCI VFIO bus driver. This is required to make
use of PCI drivers using the VFIO framework.
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..4e053be 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)

if (vdev->ctx[0].trigger) {
free_irq(pdev->irq, vdev);
+ /* irq_bypass_unregister_producer(); */
kfree(vdev->ctx[0].name);
eventfd_ctx_put(vdev->ctx[0].trigger);
vdev->ctx[0].trigger = NULL;
@@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
return ret;
}

+ /* irq_bypass_register_producer(); */
+
/*
* INTx disable will stick across the new irq setup,
* disable_irq won't.
@@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,

if (vdev->ctx[vector].trigger) {
free_irq(irq, vdev->ctx[vector].trigger);
+ /* irq_bypass_unregister_producer(); */
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(vdev->ctx[vector].trigger);
vdev->ctx[vector].trigger = NULL;
@@ -360,6 +364,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
return ret;
}

+ /* irq_bypass_register_producer(); */
+
vdev->ctx[vector].trigger = trigger;

return 0;
diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
new file mode 100644
index 0000000..718508e
--- /dev/null
+++ b/include/linux/irqbypass.h
@@ -0,0 +1,23 @@
+#ifndef IRQBYPASS_H
+#define IRQBYPASS_H
+
+#include <linux/list.h>
+
+struct irq_bypass_producer {
+ struct list_head node;
+ void *token;
+ /* TBD */
+};
+
+struct irq_bypass_consumer {
+ struct list_head node;
+ void *token;
+ void (*add_producer)(struct irq_bypass_producer *);
+ void (*del_producer)(struct irq_bypass_producer *);
+};
+
+int irq_bypass_register_producer(struct irq_bypass_producer *);
+void irq_bypass_unregister_producer(struct irq_bypass_producer *);
+int irq_bypass_register_consumer(struct irq_bypass_consumer *);
+void irq_bypass_unregister_consumer(struct irq_bypass_consumer *);
+#endif /* IRQBYPASS_H */
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 9a76e3b..4502cdc 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -100,4 +100,7 @@ config SPARSE_IRQ

If you don't know what to do here, say N.

+config IRQ_BYPASS_MANAGER
+ bool
+
endmenu
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index d121235..a30ed77 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
obj-$(CONFIG_PM_SLEEP) += pm.o
obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
+obj-$(CONFIG_IRQ_BYPASS_MANAGER) += bypass.o
diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c
new file mode 100644
index 0000000..5d0f92b
--- /dev/null
+++ b/kernel/irq/bypass.c
@@ -0,0 +1,116 @@
+/*
+ * IRQ offload/bypass manager
+ *
+ * Various virtualization hardware acceleration techniques allow bypassing
+ * or offloading interrupts receieved from devices around the host kernel.
+ * Posted Interrupts on Intel VT-d systems can allow interrupts to be
+ * recieved directly by a virtual machine. ARM IRQ Forwarding can allow
+ * level triggered device interrupts to be de-asserted directly by the VM.
+ * This manager allows interrupt producers and consumers to find each other
+ * to enable this sort of bypass.
+ */
+
+#include <linux/irqbypass.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+
+static LIST_HEAD(producers);
+static LIST_HEAD(consumers);
+static DEFINE_MUTEX(lock);
+
+int irq_bypass_register_producer(struct irq_bypass_producer *producer)
+{
+ struct irq_bypass_producer *tmp;
+ struct irq_bypass_consumer *consumer;
+ int ret = 0;
+
+ mutex_lock(&lock);
+
+ list_for_each_entry(tmp, &producers, node) {
+ if (tmp->token == producer->token) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+ }
+
+ list_add(&producer->node, &producers);
+
+ list_for_each_entry(consumer, &consumers, node) {
+ if (consumer->token == producer->token) {
+ consumer->add_producer(producer);
+ break;
+ }
+ }
+unlock:
+ mutex_unlock(&lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
+
+void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
+{
+ struct irq_bypass_consumer *consumer;
+
+ mutex_lock(&lock);
+
+ list_for_each_entry(consumer, &consumers, node) {
+ if (consumer->token == producer->token) {
+ consumer->del_producer(producer);
+ break;
+ }
+ }
+
+ list_del(&producer->node);
+
+ mutex_unlock(&lock);
+}
+EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
+
+int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
+{
+ struct irq_bypass_consumer *tmp;
+ struct irq_bypass_producer *producer;
+ int ret = 0;
+
+ mutex_lock(&lock);
+
+ list_for_each_entry(tmp, &consumers, node) {
+ if (tmp->token == consumer->token) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+ }
+
+ list_add(&consumer->node, &consumers);
+
+ list_for_each_entry(producer, &producers, node) {
+ if (producer->token == consumer->token) {
+ consumer->add_producer(producer);
+ break;
+ }
+ }
+unlock:
+ mutex_unlock(&lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
+
+void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
+{
+ struct irq_bypass_producer *producer;
+
+ mutex_lock(&lock);
+
+ list_for_each_entry(producer, &producers, node) {
+ if (producer->token == consumer->token) {
+ consumer->del_producer(producer);
+ break;
+ }
+ }
+
+ list_del(&consumer->node);
+
+ mutex_unlock(&lock);
+}
+EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..f3da161 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -429,6 +429,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
*/
fdput(f);

+ /* irq_bypass_register_consumer(); */
+
return 0;

fail:
@@ -528,6 +530,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
struct _irqfd *irqfd, *tmp;
struct eventfd_ctx *eventfd;

+ /* irq_bypass_unregister_consumer() */
+
eventfd = eventfd_ctx_fdget(args->fd);
if (IS_ERR(eventfd))
return PTR_ERR(eventfd);



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/