Re: [KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd insteadof an explicit ioctl

From: Michael S. Tsirkin
Date: Sun Jun 14 2009 - 09:29:23 EST


On Sun, Jun 14, 2009 at 08:53:11AM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Thu, Jun 04, 2009 at 08:48:12AM -0400, Gregory Haskins wrote:
> >
> >> +static void
> >> +irqfd_disconnect(struct _irqfd *irqfd)
> >> +{
> >> + struct kvm *kvm;
> >> +
> >> + mutex_lock(&irqfd->lock);
> >> +
> >> + kvm = rcu_dereference(irqfd->kvm);
> >> + rcu_assign_pointer(irqfd->kvm, NULL);
> >> +
> >> + mutex_unlock(&irqfd->lock);
> >> +
> >> + if (!kvm)
> >> + return;
> >>
> >> mutex_lock(&kvm->lock);
> >> - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> >> - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> >> + list_del(&irqfd->list);
> >> mutex_unlock(&kvm->lock);
> >> +
> >> + /*
> >> + * It is important to not drop the kvm reference until the next grace
> >> + * period because there might be lockless references in flight up
> >> + * until then
> >> + */
> >> + synchronize_srcu(&irqfd->srcu);
> >> + kvm_put_kvm(kvm);
> >> }
> >>
> >
> > So irqfd object will persist after kvm goes away, until eventfd is closed?
> >
>
> Yep, by design. It becomes part of the eventfd and is thus associated
> with its lifetime. Consider it as if we made our own anon-fd
> implementation for irqfd and the lifetime looks similar. The difference
> is that we are reusing eventfd and its interface semantics.
> >
> >>
> >> static int
> >> irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> >> {
> >> struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> >> + unsigned long flags = (unsigned long)key;
> >>
> >> - /*
> >> - * The wake_up is called with interrupts disabled. Therefore we need
> >> - * to defer the IRQ injection until later since we need to acquire the
> >> - * kvm->lock to do so.
> >> - */
> >> - schedule_work(&irqfd->work);
> >> + if (flags & POLLIN)
> >> + /*
> >> + * The POLLIN wake_up is called with interrupts disabled.
> >> + * Therefore we need to defer the IRQ injection until later
> >> + * since we need to acquire the kvm->lock to do so.
> >> + */
> >> + schedule_work(&irqfd->inject);
> >> +
> >> + if (flags & POLLHUP) {
> >> + /*
> >> + * The POLLHUP is called unlocked, so it theoretically should
> >> + * be safe to remove ourselves from the wqh using the locked
> >> + * variant of remove_wait_queue()
> >> + */
> >> + remove_wait_queue(irqfd->wqh, &irqfd->wait);
> >> + flush_work(&irqfd->inject);
> >> + irqfd_disconnect(irqfd);
> >> +
> >> + cleanup_srcu_struct(&irqfd->srcu);
> >> + kfree(irqfd);
> >> + }
> >>
> >> return 0;
> >> }
> >>
> >
> > And it is removed by this function when eventfd is closed.
> > But what prevents the kvm module from going away, meanwhile?
> >
>
> Well, we hold a reference to struct kvm until we call
> irqfd_disconnect(). If kvm closes first, we disconnect and disassociate
> all references to kvm leaving irqfd->kvm = NULL. Likewise, if irqfd
> closes first, we disassociate with kvm with the above quoted logic. In
> either case, we are holding a kvm reference up until that "disconnect"
> point. Therefore kvm should not be able to disappear before that
> disconnect, and after that point we do not care.

Yes, we do care.

Here's the scenario in more detail:

- kvm is closed
- irq disconnect is called
- kvm is put
- kvm module is removed: all irqs are disconnected
- eventfd closes and triggers callback into removed kvm module
- crash

> If that is not sufficient to prevent kvm.ko from going away in the
> middle, then IMO kvm_get_kvm() has a bug, not irqfd. ;) However, I
> believe everything is actually ok here.
>
> -Greg
>


BTW, why can't we remove irqfds in kvm_release?

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/