Re: [patch] SMP alternatives

From: Andi Kleen
Date: Thu Nov 24 2005 - 09:25:17 EST


On Thu, Nov 24, 2005 at 02:34:07PM +0000, Alan Cox wrote:
> On Iau, 2005-11-24 at 14:39 +0100, Andi Kleen wrote:
> > That's supposed to be done by hardware, no?
>
> Varies immensely by system. Where there is a hardware scrubber and it is
> enabled it will be used. Once nice thing about K8 is the mem controller
> is in the CPU so they all use the same driver (not yet merged)

What do you need a special driver for if the northbridge just
can do the scrubbing by itself?

> > If you try to do it this way then the code will become such
> > a mess if not impossible to write that your changes to merge them
> > and get it right are very slim. The only sane way to do all the locking etc.
> > is to hand over the handling to a thread. While that make the window
> > of misusing the data wider it's the only sane alternative vs not
> > doing it at all.
>
> Its utterly hideous because the usual 'ECC error' reporting technique
> for an uncorrectable error is an NMI. Locks could be in any state at

On the modern systems I'm familiar with it's an machine check (although
not necessarily a recoverable one and there might be other bad
side effects)

> this point and even the registers needing to be accessed are across PCI
> and we could be half way through a PCI configuration cycle.
>
> The -mm EDAC code works on the basic assumption that unrecovered ECC is
> a system halter although that is configurable.

I don't know what you could do over the default code for K8 at least.
And on modern Intel server chipsets I would expect it also to not
be needed.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/