Re: Device hang when offlining a CPU due to IRQ misrouting

From: Siddha, Suresh B
Date: Tue Jun 05 2007 - 14:44:06 EST


On Tue, Jun 05, 2007 at 11:33:01AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 05, 2007 at 11:13:42AM -0700, Siddha, Suresh B wrote:
> > I see. Your system should have 4 or 8 logical cpu's right. So you must be
> > using logical flat mode, right?
>
> I believe so. The system has two Xeon 5150s with an Intel 5000 chipset
> of some sort.
>
> > When this bug happens, what does /proc/irq/<irq-no>/smp_affinity show?
>
> root@elm3a188:~# cat /proc/irq/114/smp_affinity
> 02

Ok. What this shows is that fixup_irqs() failed to move the irq properly.
Ideally we should see cpu_online_map here (i.e., 0xfd).

So most likely __assign_irq_vector() failed for some reason and I am
puzzled for the reason...

Does this problem happen only under certain stress or something simple, like

boot the kernel
echo 2 > /proc/irq/114/smp_affinity
wait for irq to hit the cpu1.
echo 0 > /sys/devices/system/cpu/cpu1/online

will immmd trigger this?

thanks,
suresh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/