Re: [tip:irq/core] genirq: MSI: Fix freeing of unallocated MSI

From: Marc Zyngier
Date: Thu Apr 09 2015 - 08:42:04 EST


On Thu, 9 Apr 2015 13:00:23 +0100
Ingo Molnar <mingo@xxxxxxxxxx> wrote:

Hi Ingo,

>
> * tip-bot for Marc Zyngier <tipbot@xxxxxxxxx> wrote:
>
> > Commit-ID: fe0c52fc003bc046380e52fe6799c96d770770cc
> > Gitweb: http://git.kernel.org/tip/fe0c52fc003bc046380e52fe6799c96d770770cc
> > Author: Marc Zyngier <marc.zyngier@xxxxxxx>
> > AuthorDate: Mon, 26 Jan 2015 19:10:19 +0000
> > Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > CommitDate: Wed, 8 Apr 2015 23:28:28 +0200
> >
> > genirq: MSI: Fix freeing of unallocated MSI
> >
> > While debugging an unrelated issue with the GICv3 ITS driver, the
> > following trace triggered:
> >
> > WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1121 irq_domain_free_irqs+0x160/0x17c()
> > NULL pointer, cannot free irq
> > Modules linked in:
> > CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 3.19.0-rc6+ #3690
> > Hardware name: FVP Base (DT)
> > Call trace:
> > [<ffffffc000089398>] dump_backtrace+0x0/0x13c
> > [<ffffffc0000894e4>] show_stack+0x10/0x1c
> > [<ffffffc00066d134>] dump_stack+0x74/0x94
> > [<ffffffc0000a92f8>] warn_slowpath_common+0x9c/0xd4
> > [<ffffffc0000a938c>] warn_slowpath_fmt+0x5c/0x80
> > [<ffffffc0000ee04c>] irq_domain_free_irqs+0x15c/0x17c
> > [<ffffffc0000ef918>] msi_domain_free_irqs+0x58/0x74
> > [<ffffffc000386f58>] free_msi_irqs+0xb4/0x1c0
> >
> > // The msi_prepare callback fails here
> >
> > [<ffffffc0003872c0>] pci_enable_msix+0x25c/0x3d4
> > [<ffffffc00038746c>] pci_enable_msix_range+0x34/0x80
> > [<ffffffc0003924ac>] vp_try_to_find_vqs+0xec/0x528
> > [<ffffffc000392954>] vp_find_vqs+0x6c/0xa8
> > [<ffffffc0003ee2a8>] init_vq+0x120/0x248
> > [<ffffffc0003eefb0>] virtblk_probe+0xb0/0x6bc
> > [<ffffffc00038fc34>] virtio_dev_probe+0x17c/0x214
> > [<ffffffc0003d4a04>] driver_probe_device+0x7c/0x23c
> > [<ffffffc0003d4cb0>] __driver_attach+0x98/0xa0
> > [<ffffffc0003d2c60>] bus_for_each_dev+0x60/0xb4
> > [<ffffffc0003d455c>] driver_attach+0x1c/0x28
> > [<ffffffc0003d41b0>] bus_add_driver+0x150/0x208
> > [<ffffffc0003d54c0>] driver_register+0x64/0x130
> > [<ffffffc00038f9e8>] register_virtio_driver+0x24/0x68
> > [<ffffffc00091320c>] init+0x70/0xac
> > [<ffffffc0000828f0>] do_one_initcall+0x94/0x1d0
> > [<ffffffc0008e9b00>] kernel_init_freeable+0x144/0x1e4
> > [<ffffffc00066a434>] kernel_init+0xc/0xd8
> > ---[ end trace f9ee562a77cc7bae ]---
> >
> > The ITS msi_prepare callback having failed, we end-up trying to
> > free MSIs that have never been allocated. Oddly enough, the kernel
> > is pretty upset about it.
> >
> > It turns out that this behaviour was expected before the MSI domain
> > was introduced (and dealt with in arch_teardown_msi_irqs).
> >
> > The obvious fix is to detect this early enough and bail out.
> >
> > Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> > Reviewed-by: Jiang Liu <jiang.liu@xxxxxxxxxxxxxxx>
> > Link: http://lkml.kernel.org/r/1422299419-6051-1-git-send-email-marc.zyngier@xxxxxxx
> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > ---
> > kernel/irq/msi.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > index 3e18163..474de5c 100644
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -310,8 +310,15 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
> > struct msi_desc *desc;
> >
> > for_each_msi_entry(desc, dev) {
> > - irq_domain_free_irqs(desc->irq, desc->nvec_used);
> > - desc->irq = 0;
> > + /*
> > + * We might have failed to allocate an MSI early
>
> nit:
>
> s/an MSI/an MSI interrupt
>
> > + * enough that there is no IRQ associated to this
>
> nit:
>
> s/to/with
>
> > + * entry. If that's the case, don't do anything.
> > + */
> > + if (desc->irq) {
> > + irq_domain_free_irqs(desc->irq, desc->nvec_used);
> > + desc->irq = 0;
> > + }
>
> Hm, so this appears to be the first time that 'irq == 0' assumptions
> are getting into the genirq core. Is NO_IRQ dead? I realize that the
> MSI code uses '!irq' as a flag, but still, quite a few architectures
> define NO_IRQ so it appears to matter to them.

NO_IRQ strikes back, everybody takes cover! ;-)

More seriously, this seems to be two schools of thoughts on that one.
The irqdomain subsystem seems to treat 'irq == 0' as the indication that
'this is not a valid IRQ', and so does MSI (as you noticed). Given that
this code deals with MSI in conjunction with irqdomains, it felt
natural to adopt the same convention.

Also, not all the architecture are defining NO_IRQ, and it only seems
to be used in code that is doesn't look portable across architectures.
Either these architecture don't care about MSI, or they are happy
enough to consider that virtual interrupt 0 is invalid in the MSI case.

So I'm a bit lost on that one. I sincerely thought NO_IRQ was being
retired (https://lwn.net/Articles/470820/). Should we introduce a
NO_MSI_IRQ (set to zero) to take care of this case?

Thanks,

M.
--
Without deviation from the norm, progress is not possible.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/