Re: [BUG 4.15-rc7] IRQ matrix management errors

From: Thomas Gleixner
Date: Wed Jan 17 2018 - 02:34:33 EST


On Tue, 16 Jan 2018, Keith Busch wrote:
> On Tue, Jan 16, 2018 at 12:20:18PM +0100, Thomas Gleixner wrote:
> > 8<----------------------
> > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
> > index f8b03bb8e725..3cc471beb50b 100644
> > --- a/arch/x86/kernel/apic/vector.c
> > +++ b/arch/x86/kernel/apic/vector.c
> > @@ -542,14 +542,17 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
> >
> >  		err = assign_irq_vector_policy(irqd, info);
> >  		trace_vector_setup(virq + i, false, err);
> > -		if (err)
> > +		if (err) {
> > +			irqd->chip_data = NULL;
> > +			free_apic_chip_data(apicd);
> >  			goto error;
> > +		}
> >  	}
> >
> >  	return 0;
> >
> >  error:
> > -	x86_vector_free_irqs(domain, virq, i + 1);
> > +	x86_vector_free_irqs(domain, virq, i);
> >  	return err;
> >  }
> >
>
> The patch does indeed fix all the warnings and allows device binding to
> succeed, albeit in a degraded performance mode. Despite that, this is
> a good fix, and looks applicable to 4.4-stable, so:
>
> Tested-by: Keith Busch <keith.busch@xxxxxxxxx>
>
> I'm still concerned that assign_irq_vector_policy is failing. That makes
> interrupt allocation abandon MSI-X and fall back to legacy IRQ.

Can you trace the matrix allocations from the very beginning, or tell me how
to reproduce it? I'd like to figure out why this is happening.

> Your patch does address my main concern, though. Are you comfortable
> enough to queue this up for 4.15?

Yes, it's a pretty obvious bug and the fix is straightforward.

Thanks,

tglx