Re: SMP broken on Dell PowerEdge 4100/200 under 2.6.0-testxx?

From: William Lee Irwin III
Date: Sat Dec 06 2003 - 00:41:42 EST


On Sat, 06.12.2003 at 06:09, William Lee Irwin III wrote:
>> If you actually manage to get interrupt rates exceeding its thresholds,
>> you should see interrupts migrated, but only dynamically and on-demand,
>> not under light usage.

On Sat, Dec 06, 2003 at 06:14:15AM +0100, Stian Jordet wrote:
> I really don't know the definition of "light usage", but I'm beating the
> aic7xxx and eth0 quite hard at times, without any interrupts being
> migrated. Anyway, thanks :) This hasn't been a problem for me so far,
> and I doubt it ever will :)

Okay, this should be fixed. The entire subarch organization is wrong
for this anyway. It needs several axes to vary upon for the APIC-based
subarches:

(a) xAPIC (P-IV) vs. serial APIC (before P-IV)
(b) logical vs. physical IPI's
(c) logical vs. physical IO interrupts
(d) flat logical vs. clustered hierarchical DFR
(e) NMI wakeup vs. INIT wakeup
(f) software vs. hardware interrupt load balancing
(g) locality-dependent vs. locality-independent APIC destinations
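
To make the axes above concrete, here is a rough sketch of what a
descriptor with one field per axis could look like. Every name in it is
hypothetical and nothing like it exists in the tree; the two example
descriptors are illustrative only and are set up to differ in exactly
(a) and (f).

```c
#include <stdbool.h>

/* Hypothetical types: one enum per axis, lettered as in the list above. */
enum apic_kind   { APIC_SERIAL, APIC_XAPIC };            /* (a) */
enum dest_mode   { DEST_LOGICAL, DEST_PHYSICAL };        /* (b), (c) */
enum dfr_model   { DFR_FLAT, DFR_CLUSTER };              /* (d) */
enum wakeup_kind { WAKEUP_NMI, WAKEUP_INIT };            /* (e) */
enum irq_balance { BALANCE_SOFTWARE, BALANCE_HARDWARE }; /* (f) */

struct apic_subarch_desc {
	const char *name;
	enum apic_kind kind;            /* (a) */
	enum dest_mode ipi_dest;        /* (b) */
	enum dest_mode io_dest;         /* (c) */
	enum dfr_model dfr;             /* (d) */
	enum wakeup_kind wakeup;        /* (e) */
	enum irq_balance balance;       /* (f) */
	bool dest_depends_on_locality;  /* (g) */
};

/* Illustrative descriptors only, not taken from any real subarch. */
static const struct apic_subarch_desc example_serial = {
	"example-serial-apic", APIC_SERIAL, DEST_LOGICAL, DEST_LOGICAL,
	DFR_FLAT, WAKEUP_INIT, BALANCE_HARDWARE, false
};
static const struct apic_subarch_desc example_xapic = {
	"example-xapic", APIC_XAPIC, DEST_LOGICAL, DEST_LOGICAL,
	DFR_FLAT, WAKEUP_INIT, BALANCE_SOFTWARE, false
};

/* Count the axes on which two descriptors disagree. */
static int axes_differing(const struct apic_subarch_desc *x,
			  const struct apic_subarch_desc *y)
{
	return (x->kind != y->kind) + (x->ipi_dest != y->ipi_dest) +
	       (x->io_dest != y->io_dest) + (x->dfr != y->dfr) +
	       (x->wakeup != y->wakeup) + (x->balance != y->balance) +
	       (x->dest_depends_on_locality != y->dest_depends_on_locality);
}
```

With something of this shape, each subarch would state its position on
every axis explicitly instead of inheriting whatever a neighbouring
subarch happened to do.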

The real problem with all this is that it was arranged around minimal
impact code changes instead of adequately describing hardware, and so
it gives rise to numerous corner cases and is generally brittle. Of
course, 2.6 is too frozen to do anything with it now, and ia32 will
likely be largely legacy during the course of 2.7, so the damage will
probably be permanent.

What you've run into is essentially mach-default, what normal Pee Cees
use, making no distinction for (a) or (f). There are several disturbing
differences between the xAPIC and serial APIC cases which are for the
moment carefully avoided but at the very least raise my eyebrows. For
instance, both the physical broadcast destination and the size of the
physical APIC ID space differ between the two cases. The difference
you've been burned by is that current revisions of xAPIC's have broken
hardware interrupt load balancing, so singleton fixed destinations with
software interrupt balancing are used in both cases, instead of the
lowest priority destinations with many cpus in them that would be
perfectly suitable for P-III's. Under your light usage, that pinned all
interrupts on cpu 0.
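
As a rough illustration of the distinction that is missing, the sketch
below picks a delivery mode per APIC type and models a threshold-driven
software balancer. All names, the 8-bit flat logical mask, and the
round-robin migration policy are assumptions made for the example; this
is not the kernel's code.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum apic_kind { APIC_SERIAL, APIC_XAPIC };
enum delivery  { DELIVERY_FIXED, DELIVERY_LOWEST_PRIO };

struct irq_route {
	enum delivery mode;
	uint8_t dest_mask;	/* flat logical mask, one bit per cpu (assumed) */
};

/*
 * xAPIC: hardware balancing is taken to be broken, so use a fixed
 * destination containing a single cpu and leave balancing to software.
 * Serial APIC: lowest priority delivery to every online cpu.
 * home_cpu must be below 8 to fit the assumed 8-bit mask.
 */
static struct irq_route route_irq(enum apic_kind kind, uint8_t online_mask,
				  unsigned home_cpu)
{
	struct irq_route r;

	if (kind == APIC_XAPIC) {
		r.mode = DELIVERY_FIXED;
		r.dest_mask = (uint8_t)(1u << home_cpu);
	} else {
		r.mode = DELIVERY_LOWEST_PRIO;
		r.dest_mask = online_mask;
	}
	return r;
}

/*
 * Toy software balancer: the interrupt stays where it is unless its
 * rate exceeds the threshold, in which case it moves to the next cpu.
 */
static unsigned rebalance(unsigned cur_cpu, unsigned ncpus,
			  unsigned long irqs_per_sec, unsigned long threshold)
{
	if (irqs_per_sec <= threshold)
		return cur_cpu;
	return (cur_cpu + 1) % ncpus;
}
```

Under this model an interrupt that starts on cpu 0 and never crosses the
threshold stays on cpu 0, which is the behaviour reported in this thread.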


-- wli