I've been meaning to ask questions about this... We have a number of
machines here with lots of ECC memory, some of which have scrubbing
logic in the chipset (BX) and some of which don't (LX, FX).
My first question is: are there any tools (esp. for Linux :) to ask the
chipset how *many* single-bit errors have occurred?
If the average time between errors is 1000 days then maybe it's not a
worry ;-) Of course, if it's 24 hours then one might worry a little...
Anyway, I've just reread what Daniel wrote, and a little clarification is
required: *reading* a location only fixes one-bit errors on some
chipsets, not all. Writing the same data back again would fix the
error, but that would be a little more dangerous and would require a
slightly more sophisticated daemon. Still, it's nothing we (I laughingly
include myself) kernel hackers can't handle.
P-Pro chipsets like the 440FX don't scrub errors, and neither does the
PII LX. The BX does, though...
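For what it's worth, here is a rough sketch of the write-back idea above,
in C11. It is only an illustration of the access pattern, not a working
scrubber: the function name scrub_region is made up, it walks an ordinary
buffer rather than physical memory, and it ignores the CPU caches, which
a real kernel-side version would have to deal with. I am assuming the
"dangerous" part is someone else changing the word between our read and
our write, so the sketch uses a compare-and-swap that stores the value
only if the word still holds what we read.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper: rewrite every word in [base, base + nwords) with
 * the value it already holds. Returns how many words we rewrote.
 */
size_t scrub_region(_Atomic unsigned long *base, size_t nwords)
{
    size_t rewritten = 0;

    for (size_t i = 0; i < nwords; i++) {
        /* Read the word; on ECC hardware the read returns corrected data. */
        unsigned long seen = atomic_load(&base[i]);

        /*
         * Store 'seen' back only if the word still equals 'seen'. If
         * another writer got in first, the exchange fails and we skip the
         * word: it has just been written anyway, so we must not clobber it.
         */
        if (atomic_compare_exchange_strong(&base[i], &seen, seen))
            rewritten++;
    }
    return rewritten;
}
```

Calling scrub_region(buf, n) on a buffer nobody else is touching should
leave the contents unchanged and return n. Whether the store actually
reaches DRAM, and so regenerates the check bits, depends on the cache and
the chipset, which is exactly the part this sketch leaves out.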
Neil
E&OE :-)