RT interrupt handling

From: Darren Hart
Date: Fri Apr 28 2006 - 17:09:27 EST


I ran into a situation where binding a realtime testsuite to CPU 0 (on a 4-way
Opteron machine) locked the machine hard, while binding it to CPU 2 worked
fine. Some investigation suggests that the interrupt handlers for eth0 and
ioc0 (IRQ 24 and 26) had their smp_affinity mask set to CPU 0 only. With the
test case running threads at RT priorities in the 90s and the IRQ handlers
running in the ~40s (I don't recall exactly, somewhere around there I think),
it isn't surprising that the machine locked up: the higher-priority test
threads on CPU 0 could starve the handlers bound to that CPU.

I'd like to hear people's thoughts on the following:

o Why would those IRQs be bound to just CPU 0? Why not all CPUs?

o Is it reasonable to extend the smp_affinity for all interrupts to all CPUs
to minimize this type of problem?

o Should a userspace RT task be able to take down the system? Do we go with
the Spider-Man adage "With great power comes great responsibility" when
discussing RT systems, or should we consider some kind of priority boosting
mechanism for kernel services that must run every so often to keep the
system running?
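
For reference, the smp_affinity value discussed above is a hex bitmask with
one bit per CPU, as exposed in /proc/irq/<N>/smp_affinity. The following is
only a rough sketch of how such a mask maps to CPU numbers and back; the
helper names mask_to_cpus and cpus_to_mask are made up for illustration and
are not part of the testsuite or of any kernel tool.

```python
def mask_to_cpus(mask):
    """Decode an smp_affinity string into a sorted list of CPU numbers.

    Assumes the usual format: hex digits, optionally split into
    comma-separated 32-bit groups with the most significant group first.
    """
    value = 0
    for group in mask.strip().split(","):
        # Each comma-separated group carries 32 bits of the mask.
        value = (value << 32) | int(group, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def cpus_to_mask(cpus):
    """Encode an iterable of CPU numbers as a hex mask string (hypothetical helper)."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return format(value, "x")
```

With these helpers, a mask of "1" decodes to CPU 0 only, which matches what I
saw for IRQ 24 and 26, while a mask covering all four CPUs on this machine
would be "f". Actually changing an IRQ's affinity means writing the new mask
to its smp_affinity file as root, which the sketch deliberately does not do.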

Thanks!

--
Darren Hart
IBM Linux Technology Center
Realtime Linux Team