Re: memtest86, built into kernel

Pat Crean (pat@parrett.net)
Thu, 25 Apr 1996 10:24:34 +0000


> From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de>
> Organization: Universitaet Regensburg, Klinikum
> To: kkeyte@esoc.esa.de (Karl Keyte), linux-kernel@vger.rutgers.edu
> Date: Wed, 24 Apr 1996 09:51:20 +0200
> Subject: Re: memtest86, built into kernel

> On 23 Apr 96 at 17:09, Karl Keyte wrote:
>
> > > >
> > > > Given that it happens so rarely, that parity is only 50% likely to
> > > > catch the error anyway, and that parity requires an extra 12.5% DRAM,
> > > > it doesn't seem worth it to me. ECC is more useful, since it will
> > > > correct single-bit errors rather than just hanging.
> > > >
> > > > -Matt
> >
> > No, surely the parity is virtually 100% certain to catch the error...??
> > The only way it wouldn't is for more than the one bit to be in error
> > in such a way that the parity becomes valid again. If bit errors are
> > so rare, it's an unlikely situation, so the parity bits should be a
> > good test. However, it's so rare, and parity bits themselves can be
> > subject to error, I wouldn't bother with it. They don't either!
>
> The probability that a reported parity error is due to an error in the
> parity bit is 1/9. Parity errors are rather rare; thus that type of
> error is even more unlikely.
>
Actually it's considerably worse than that. If the only addition to
the system was a single bit of memory, with all else being equal,
then the probability that a reported parity error was actually caused
by a failing parity bit would be 1/9. Unfortunately, all else is not
equal; the circuitry to generate and check parity also has some
finite, non-zero failure rate which increases this probability. As
well, because parity must be calculated for every memory write, write
timing is 5-10 nsec tighter for the parity chip than it is for the
data bits, which also contributes to a higher failure rate for this
bit.
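
A minimal sketch in Python (not part of the original post) of even parity over one byte, assuming the one-parity-bit-per-byte layout discussed in this thread. It shows that every single-bit flip, including a flip of the parity bit itself, is detected, and that any double-bit flip goes unnoticed. It also shows that if all nine bits fail at the same rate, 1 in 9 single-bit errors lies in the parity bit. The `parity_rate` multiplier in `share_in_parity` is a hypothetical knob for the point above that the parity bit fails more often than a data bit; all function names are illustrative.

```python
from fractions import Fraction

def parity_bit(byte: int) -> int:
    """Even-parity bit: 1 if the 8 data bits contain an odd number of 1s."""
    return bin(byte & 0xFF).count("1") & 1

def store(byte: int) -> int:
    """Pack 8 data bits plus the parity bit into a 9-bit word (parity in bit 8)."""
    return (byte & 0xFF) | (parity_bit(byte) << 8)

def check(word: int) -> bool:
    """True if the 9-bit word still has even parity, i.e. no error is reported."""
    return bin(word & 0x1FF).count("1") % 2 == 0

def flip(word: int, *bits: int) -> int:
    """Return the word with the given bit positions (0-8) inverted."""
    for b in bits:
        word ^= 1 << b
    return word

def share_in_parity(parity_rate=1) -> Fraction:
    """Fraction of single-bit errors that hit the parity bit, assuming each
    data bit fails at rate 1 and the parity bit at a hypothetical
    parity_rate times that."""
    return Fraction(parity_rate) / (8 + parity_rate)

if __name__ == "__main__":
    word = store(0b1011_0010)
    singles = [check(flip(word, b)) for b in range(9)]
    doubles = [check(flip(word, a, b)) for a in range(9) for b in range(a + 1, 9)]
    print("single-bit flips detected:", singles.count(False), "of", len(singles))
    print("double-bit flips detected:", doubles.count(False), "of", len(doubles))
    print("equal failure rates:", share_in_parity(1))
    print("parity bit failing twice as often:", share_in_parity(2))
```

Raising `parity_rate` above 1 pushes the share past 1/9, which is the direction of the argument above; the actual multiplier for real hardware is not given in this thread.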

In the real world, it doesn't really matter. The cost of adding
parity checking to PC memory systems is too high for the marginal
return. There is no increase in reliability of the systems so
equipped (in fact, there is a significant decrease). For those few
applications that can't tolerate an occasional failure, there is ECC
(whose actual hardware failure rate is, due to the extra components,
double or triple that of unprotected memory subsystems). Of course,
you will most likely notice a significant degradation in performance
as well, since generating, storing, retrieving, and checking the
error codes adds 10-15% to the access time of your memory.
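
For contrast with parity, here is a toy sketch of how single-error correction works, using a Hamming(7,4) code (3 check bits per 4 data bits). This is an illustrative stand-in, not what a memory controller actually implements: real ECC memory uses a wider code over 64-bit words, typically with double-error detection added. `encode` and `decode` are illustrative names. The syndrome computation in `decode` gives a feel for the extra check logic that sits on every read.

```python
from typing import List

# Codeword positions 1..7; check bits live at positions 1, 2 and 4.
# Check bit p covers every position whose index has bit p set.

def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # data bits, LSB first
    code = [0] * 8                             # index 0 unused
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]

def decode(word: List[int]) -> int:
    """Correct up to one flipped bit and return the 4 data bits."""
    code = [0] + list(word)
    s1 = code[1] ^ code[3] ^ code[5] ^ code[7]
    s2 = code[2] ^ code[3] ^ code[6] ^ code[7]
    s4 = code[4] ^ code[5] ^ code[6] ^ code[7]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        code[syndrome] ^= 1  # the syndrome is the position of the bad bit
    return code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)

if __name__ == "__main__":
    word = encode(0b1011)
    word[4] ^= 1  # simulate a single-bit memory error
    print(bin(decode(word)))
```

Where parity can only report that a byte is bad, this decoder repairs the single flipped bit and carries on.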

In short, if you're concerned about memory failures, by all means,
run background memory tests, but don't think that adding a parity bit
is going to buy you anything to speak of.

Just my $.02 worth..... (no, I didn't get up on the wrong side of the
bed this morning, but I did step on the coon hound sleeping on the
right side.......)

Pat