Re: Device hang when offlining a CPU due to IRQ misrouting

From: Siddha, Suresh B
Date: Thu Jun 07 2007 - 21:01:36 EST


On Wed, Jun 06, 2007 at 04:16:42PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 06, 2007 at 12:35:14PM -0700, Siddha, Suresh B wrote:
>
> > Weird. Then the bug can only happen if for some reason, "mask = map"
> > didn't happen in fixup_irqs(). Can you send us the disassembly of the
> > fixup_irqs()?
>
> Attached.

hmm.. Darrick, can't find anything wrong in there.

I am very much puzzled and the main thing I am confused about is, that
how come "/proc/irq/<irq#-hung>/smp_affinity" is still pointing at the old
offlined cpu, while calls to set_affinity() with cpu_online_map mask
in fixup_irqs() don't show any failure..

As you have the failing system, you need to do more detective work and
help me out. Can you try this debug patch and send across the dmesg after the
bug happens and also can you try different compiler to see if something
changes..

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 3eaceac..fc2a576 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -152,9 +152,11 @@ void fixup_irqs(cpumask_t map)
printk("Breaking affinity for irq %i\n", irq);
mask = map;
}
- if (irq_desc[irq].chip->set_affinity)
+ if (irq_desc[irq].chip->set_affinity) {
+ printk("calling set affinity for %i, with mask %lx\n",
+ irq, cpus_addr(mask)[0]);
irq_desc[irq].chip->set_affinity(irq, mask);
- else if (irq_desc[irq].action && !(warned++))
+ } else if (irq_desc[irq].action && !(warned++))
printk("Cannot set affinity for irq %i\n", irq);
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/