Trimming some of the people in CC.
I found the reason: you are looking at xen-unstable, while I was working with 4.1.30-OVM, which has the patch for CVE-2012-4536 / XSA-21.
On Mon, 24 Jun 2013, Zhenzhong Duan wrote:
On 2013-06-20 22:21, Stefano Stabellini wrote:
On Thu, 20 Jun 2013, Zhenzhong Duan wrote:
From the Linux side, request_irq-> request_threaded_irq-> __setup_irq->
On 2013-06-05 20:50, Stefano Stabellini wrote:
That means that Linux didn't call irq_enable on the MSI-X in question:
On Wed, 5 Jun 2013, Zhenzhong Duan wrote:
I did some tests: domain_pirq_to_emuirq(d, unmap->pirq) = IRQ_UNBOUND in
Stefano Stabellini wrote:
Can you add some printks in Xen (the hypercall name is
On Tue, 21 May 2013, Stefano Stabellini wrote:
On Tue, 21 May 2013, Konrad Rzeszutek Wilk wrote:
Looking at the hypervisor code I couldn't see anything obviously
wrong.
I think the culprit is "physdev_unmap_pirq":
    if ( is_hvm_domain(d) )
    {
        spin_lock(&d->event_lock);
        gdprintk(XENLOG_WARNING,"d%d, pirq: %d is %x %s, irq: %d\n",
                 d->domain_id, pirq, domain_pirq_to_emuirq(d, pirq),
                 domain_pirq_to_emuirq(d, pirq) == IRQ_UNBOUND ?
                 "unbound" : "",
                 domain_pirq_to_irq(d, pirq));
        if ( domain_pirq_to_emuirq(d, pirq) != IRQ_UNBOUND )
            ret = unmap_domain_pirq_emuirq(d, pirq);
        spin_unlock(&d->event_lock);
        if ( domid == DOMID_SELF || ret )
            goto free_domain;
    }
It always tells me unbound:
(XEN) physdev.c:237:d14 14, pirq: 54 is ffffffff
(XEN) irq.c:1873:d14 14, nr_pirqs: 56
(XEN) physdev.c:237:d14 14, pirq: 53 is ffffffff
(XEN) irq.c:1873:d14 14, nr_pirqs: 56
(XEN) physdev.c:237:d14 14, pirq: 52 is ffffffff
(XEN) irq.c:1873:d14 14, nr_pirqs: 56
(XEN) physdev.c:237:d14 14, pirq: 51 is ffffffff
(XEN) irq.c:1873:d14 14, nr_pirqs: 56
(XEN) physdev.c:237:d14 14, pirq: 50 is ffffffff
(XEN) irq.c:1873:d14 14, nr_pirqs: 56
(a bit older debug code, so the 'unbound' does not show up here).
Which means that the call to unmap_domain_pirq_emuirq does not happen.
The checks in unmap_domain_pirq_emuirq also look to depend
on the code being IRQ_UNBOUND.
In other words, all of that code looks to only clear things when
they are !IRQ_UNBOUND.
But the other logic (IRQ_UNBOUND) looks to be missing a removal
in the radix tree:
if ( emuirq != IRQ_PT )
radix_tree_delete(&d->arch.hvm_domain.emuirq_pirq, emuirq);
And I think that is what is causing the leak - the radix tree
needs to be pruned? Or perhaps the allocate_pirq should check
the radix tree for IRQ_UNBOUND ones and re-use them?
I think that you are looking in the wrong place.
The issue is that QEMU doesn't call pt_msi_disable in
pt_msgctrl_reg_write if !(val & PCI_MSI_FLAGS_ENABLE).
The code above is correct as is because it is trying to handle
emulated IRQs and MSIs, not real passthrough MSIs. The latter are not
added to that radix tree, see physdev_hvm_map_pirq and physdev_map_pirq.
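A note on the condition named above, since the spelling matters in C: `!val & PCI_MSI_FLAGS_ENABLE` and `!(val & PCI_MSI_FLAGS_ENABLE)` are different expressions, because unary `!` binds tighter than `&`. The sketch below is only an illustration and is not QEMU code. The function names are hypothetical, and the flag value 0x0001 is assumed to match the MSI enable bit as defined in Linux's pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit 0 of the MSI Message Control register is the enable bit. */
#define PCI_MSI_FLAGS_ENABLE 0x0001

/* Parsed by C as (!val) & PCI_MSI_FLAGS_ENABLE: non-zero only when val == 0. */
int disabled_unparenthesized(uint16_t val)
{
    return !val & PCI_MSI_FLAGS_ENABLE;
}

/* Intended test: non-zero whenever the enable bit is clear. */
int disabled_parenthesized(uint16_t val)
{
    return !(val & PCI_MSI_FLAGS_ENABLE);
}

/* Hypothetical self-check documenting where the two forms differ. */
void check_msi_enable_forms(void)
{
    /* 0x0080: enable bit clear, some other flag bit set. */
    assert(disabled_unparenthesized(0x0080) == 0);
    assert(disabled_parenthesized(0x0080) == 1);

    /* The forms agree when val is 0 or when the enable bit is set. */
    assert(disabled_unparenthesized(0x0000) == 1);
    assert(disabled_parenthesized(0x0000) == 1);
    assert(disabled_unparenthesized(0x0001) == 0);
    assert(disabled_parenthesized(0x0001) == 0);
}
```

With any other flag bit set in the register value, only the parenthesized form reports the enable bit as clear, so that is the form a disable path would want.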
This patch fixes the issue, I have only tested MSI (MSI-X completely
untested).
diff --git a/hw/pass-through.c b/hw/pass-through.c
index 304c438..079e465 100644
--- a/hw/pass-through.c
+++ b/hw/pass-through.c
@@ -3866,7 +3866,11 @@ static int pt_msgctrl_reg_write(struct pt_dev *ptdev,
ptdev->msi->flags |= PCI_MSI_FLAGS_ENABLE;
}
else
- ptdev->msi->flags &= ~PCI_MSI_FLAGS_ENABLE;
+ {
+ if (ptdev->msi->flags & PT_MSI_MAPPED) {
+ pt_msi_disable(ptdev);
+ }
+ }
/* pass through MSI_ENABLE bit when no MSI-INTx translation */
if (!ptdev->msi_trans_en) {
@@ -4013,6 +4017,8 @@ static int pt_msixctrl_reg_write(struct pt_dev *ptdev,
pt_disable_msi_translate(ptdev);
}
pt_msix_update(ptdev);
+ } else if (!(*value & PCI_MSIX_ENABLE) && ptdev->msix->enabled) {
+ pt_msix_delete(ptdev);
Hi Stefano,
I tested this patch: the OS reboots when the driver is reloaded. If I
use pt_msix_disable instead of pt_msix_delete, the driver can be
reloaded. But I still see some errors in qemu.log and on the Xen
console. It seems four IRQs are not freed on unmap.
--------------first load---------------------------
pt_msix_update_one: pt_msix_update_one requested pirq = 103
pt_msix_update_one: Update msix entry 0 with pirq 67 gvec 0
pt_msix_update_one: pt_msix_update_one requested pirq = 102
pt_msix_update_one: Update msix entry 1 with pirq 66 gvec 0
pt_msix_update_one: pt_msix_update_one requested pirq = 101
pt_msix_update_one: Update msix entry 2 with pirq 65 gvec 0
pt_msix_update_one: pt_msix_update_one requested pirq = 100
pt_msix_update_one: Update msix entry 3 with pirq 64 gvec 0
------------- first unload---------------------------
pt_msix_disable: Unbind msix with pirq 67, gvec 0
pt_msix_disable: Unmap msix with pirq 67
pt_msix_disable: Error: Unmapping of MSI-X failed. [00:04.0]
pt_msix_disable: Unbind msix with pirq 66, gvec 0
pt_msix_disable: Unmap msix with pirq 66
pt_msix_disable: Error: Unmapping of MSI-X failed. [00:04.0]
pt_msix_disable: Unbind msix with pirq 65, gvec 0
pt_msix_disable: Unmap msix with pirq 65
pt_msix_disable: Error: Unmapping of MSI-X failed. [00:04.0]
pt_msix_disable: Unbind msix with pirq 64, gvec 0
pt_msix_disable: Unmap msix with pirq 64
pt_msix_disable: Error: Unmapping of MSI-X failed. [00:04.0]
PHYSDEVOP_unmap_pirq) to figure out exactly why they are failing?
physdev_unmap_pirq.
irq_enable -> __startup_pirq -> EVTCHNOP_bind_pirq
EVTCHNOP_bind_pirq is implemented by evtchn_bind_pirq in Xen and calls
map_domain_emuirq_pirq, so domain_pirq_to_emuirq(d, unmap->pirq) should
be IRQ_PT.
I don't know if that's a normal condition, but in any case it should
not create any problems for physdev_unmap_pirq; in fact the following
check:
if ( domid == DOMID_SELF || ret )
goto free_domain;
should fail so Xen should continue and execute unmap_domain_pirq. That's
what we want.
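The argument above can be sketched as a small standalone model of the HVM branch quoted earlier in the thread. This is not the Xen source. The function name, the outcome enum, and the idea of passing the would-be return values as parameters are all hypothetical. The constant values and the assumption that ret is 0 on entry to the branch are taken to match Xen but should be treated as illustrative.

```c
#include <assert.h>

/* Assumed to mirror Xen's definitions; illustrative only. */
#define IRQ_UNBOUND (-1)
#define IRQ_PT      (-2)
#define DOMID_SELF  0x7FF0

enum outcome { STOPPED_EARLY, REACHED_UNMAP_DOMAIN_PIRQ };

/*
 * Hypothetical model of the is_hvm_domain(d) branch:
 *   emuirq        - what domain_pirq_to_emuirq(d, pirq) would return
 *   emu_unmap_ret - what unmap_domain_pirq_emuirq(d, pirq) would return
 *                   if it were called
 */
enum outcome model_unmap_pirq(int domid, int emuirq, int emu_unmap_ret)
{
    int ret = 0; /* assumption: no earlier error at this point */

    if (emuirq != IRQ_UNBOUND)
        ret = emu_unmap_ret;

    if (domid == DOMID_SELF || ret)
        return STOPPED_EARLY;             /* goto free_domain */

    return REACHED_UNMAP_DOMAIN_PIRQ;     /* continue to unmap_domain_pirq */
}

/* Hypothetical self-check of the cases discussed in the thread. */
void check_unmap_model(void)
{
    /* Unbound emuirq, caller is another domain: the emu unmap is skipped
     * and the real unmap is still reached. */
    assert(model_unmap_pirq(14, IRQ_UNBOUND, -1) == REACHED_UNMAP_DOMAIN_PIRQ);
    /* Passthrough emuirq with a successful emu unmap also continues. */
    assert(model_unmap_pirq(14, IRQ_PT, 0) == REACHED_UNMAP_DOMAIN_PIRQ);
    /* A failing emu unmap, or a DOMID_SELF caller, stops early. */
    assert(model_unmap_pirq(14, IRQ_PT, -1) == STOPPED_EARLY);
    assert(model_unmap_pirq(DOMID_SELF, IRQ_UNBOUND, 0) == STOPPED_EARLY);
}
```

Under these assumptions an IRQ_UNBOUND result by itself never produces the early exit, which is why the failure has to come from somewhere else.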
irq_startup-> startup_pirq-> EVTCHNOP_bind_pirq
If irq_enable isn't called, how does the driver receive interrupts? I did see
four interrupts in /proc/interrupts and the driver works OK.
Good to know
Could you have a look at whether something is wrong on the Xen side when
clearing the mapping?
What I am saying is that the error you are getting:
pt_msix_disable: Unbind msix with pirq 67, gvec 0
pt_msix_disable: Unmap msix with pirq 67
pt_msix_disable: Error: Unmapping of MSI-X failed. [00:04.0]
cannot be caused by domain_pirq_to_emuirq(d, pirq) returning
IRQ_UNBOUND.
So, why are you getting this error? What is failing?
I am ready to believe the problem is in Xen, but without understanding
why you are getting the error it's hard to find a solution.