Re: do_IRQ: 1.55 No irq handler for vector (irq -1)

From: Suresh Siddha
Date: Wed Aug 08 2012 - 15:19:49 EST


On Wed, 2012-08-08 at 13:04 +0200, Borislav Petkov wrote:
> On Wed, Aug 08, 2012 at 10:58:37AM +0200, Robert Richter wrote:
> > On 07.08.12 15:39:07, Suresh Siddha wrote:
> > > Boris, Robert, can you check if the below patch makes both of your
> > > systems happy again (essentially not allowing the vector to change for
> > > legacy irq's, which also allows the RTE to be set correctly in the smp
> > > case etc)? Based on your results and some more thinking, I will send a
> > > detailed patch with changelog tomorrow.
> > >
> > > arch/x86/kernel/apic/io_apic.c | 9 +++++++++
> > > 1 files changed, 9 insertions(+), 0 deletions(-)
> >
> > Suresh,
> >
> > with your patch applied the sata device works fine and the system
> > boots, no issues seen.
>
> Ditto,
>
> the do_IRQ issue of missing an irq handler for vector 55 is gone too on
> my box.
>
> I'm pretty sure you can add our Tested-by:'s to the official patch.
>

Ok. Thanks Robert, Boris.

I have appended the patch with the updated changelog. Ingo/Peter, can
you please queue this for v3.6? I plan to clean this all up for v3.7:
I will work with Robert and Boris offline and post a cleaner fix for
v3.7 shortly. Thanks.
---

From: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
Subject: x86, apic: fix broken legacy interrupts in the logical apic mode

Recent commit 332afa656e76458ee9cf0f0d123016a0658539e4 removed
a workaround that updates the irq_cfg domain for legacy irqs that
are handled by the IO-APIC. It assumed that the recent changes in
assign_irq_vector() were sufficient to make the workaround unnecessary.

But this broke a couple of AMD platforms. One of them seems to send
interrupts to offline CPUs, resulting in spurious
"No irq handler for vector xx (irq -1)" messages when those CPUs come online.
The other platform seems to always send the interrupt to the last logical
CPU (cpu-7). The recent changes had the unintended side effect of specifying
only logical cpu-0 in the IO-APIC RTE (during boot, for the legacy interrupts),
so legacy interrupts were no longer routed to cpu-7 on that AMD platform,
resulting in a boot hang.

For now, reintroduce the removed workaround (essentially not allowing the
vector to change for legacy irqs when the IO-APIC starts to handle the irq).
This also addresses the unintended side effect of specifying only cpu-0 in
the IO-APIC RTE for those irqs during boot.

Reported-and-tested-by: Robert Richter <robert.richter@xxxxxxx>
Reported-and-tested-by: Borislav Petkov <bp@xxxxxxxxx>
Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
---
arch/x86/kernel/apic/io_apic.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a6c64aa..c265593 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1356,6 +1356,16 @@ static void setup_ioapic_irq(unsigned int irq, struct irq_cfg *cfg,
 	if (!IO_APIC_IRQ(irq))
 		return;
 
+	/*
+	 * For legacy irqs, cfg->domain starts with cpu 0. Now that IO-APIC
+	 * can handle this irq and the apic driver is finalized at this point,
+	 * update the cfg->domain.
+	 */
+	if (irq < legacy_pic->nr_legacy_irqs &&
+	    cpumask_equal(cfg->domain, cpumask_of(0)))
+		apic->vector_allocation_domain(0, cfg->domain,
+					       apic->target_cpus());
+
 	if (assign_irq_vector(irq, cfg, apic->target_cpus()))
 		return;
