Re: memtest86, built into kernel

Robert Riggs (rriggs@tesser.com)
Wed, 24 Apr 1996 02:32:42 -0600 (MDT)


On 24-Apr-96 "Ulrich Windl" wrote:
>>On 23 Apr 96 at 1:59, Robert Riggs wrote:
>
>> From Intel:
>> Soft errors are temporary; a bit of data is lost, but
>
>...from a hardware point of view. For a running Linux kernel these
>errors will stay. Who should rewrite the correct value?

The only solution is to use ECC RAM. Parity RAM will only tell
you that an error has occurred. It won't fix the error. And the
parity bit itself is subject to error. Parity RAM has the
advantage of at least letting you know something is wrong
as opposed to letting the error go unnoticed.
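
To make the parity/ECC difference concrete, here is a minimal sketch.
It is an illustration only, not how any particular memory controller
works: a Hamming(7,4) code stands in for the wider codes that real ECC
modules use, and all function names are made up for this example.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the count of set bits is odd."""
    return bin(byte & 0xFF).count("1") & 1


def parity_ok(byte, stored_parity):
    """Parity can only say 'something changed'; it cannot say which bit."""
    return parity_bit(byte) == stored_parity


def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]   # d[0] is the least significant bit
    p1 = d[0] ^ d[1] ^ d[3]                     # checks positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]                     # checks positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]                     # checks positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # codeword positions 1..7
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(word):
    """Return (nibble, syndrome); a nonzero syndrome is the position that was fixed."""
    bits = [(word >> i) & 1 for i in range(7)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        bits[syndrome - 1] ^= 1                 # flip the faulty bit back
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome
```

With parity, a single flipped data bit (or a flipped parity bit) makes
parity_ok() fail, but there is nothing to repair it with, and two
flipped bits cancel out and go unnoticed. With the Hamming code, the
syndrome names the flipped position, so a single-bit error can be
corrected in place.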

>
>> the memory cell still functions correctly and rewriting
>> the data in the cell corrects the error. Soft errors
>> are intermittent errors that occur as a result of the

Since it is an intermittent error, a memory tester does no
good for soft errors. A memory tester is only good for "hard
errors" - bad RAM. I still haven't found a better memory
tester than 'make zImage' :)
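
A toy simulation shows why. This is a sketch under loud assumptions:
SimulatedRam, its stuck_at table and pattern_test() are all invented
for this example, and it models nothing about how memtest86 itself
works. A stuck bit fails on every pass, while a one-shot flip is wiped
out by the tester's own first write.

```python
class SimulatedRam:
    """Toy byte-wide RAM. stuck_at maps address -> (bit mask, forced value)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr in self.stuck_at:
            # A hard fault: the masked bits always read back as 'forced'.
            mask, forced = self.stuck_at[addr]
            value = (value & ~mask) | (forced & mask)
        return value & 0xFF

    def soft_error(self, addr, bit):
        """One-shot bit flip; the next write to the cell clears it."""
        self.cells[addr] ^= 1 << bit


def pattern_test(ram, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern everywhere, read it back, return the failing addresses."""
    bad = set()
    for pattern in patterns:
        for addr in range(len(ram.cells)):
            ram.write(addr, pattern)
        for addr in range(len(ram.cells)):
            if ram.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)
```

Run pattern_test() on a SimulatedRam with an entry in stuck_at and the
bad address shows up every time. Call soft_error() on a healthy one
first and the test still comes back clean, because the flipped bit is
overwritten before it is ever read.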

>> passage of ionizing radiation through the memory cells
>> of semiconductor devices. The most common source of this
>> radiation is alpha particles generated as a result of
>> the decay of thorium and uranium, which are found in
>> trace amounts in the packaging materials of all plastic
>> and ceramic encapsulated devices [3,4].
>>
>Ulrich

Judging from Intel's blurb on soft errors, it would seem that
the CPU and all of the support chips on the motherboard are
subject to the same soft errors as RAM. ECC RAM does no good
when it's a CPU register or a bit in the L1 or L2 cache that
has a soft error.

Soft errors are probably the least likely form of error to
occur on a microcomputer. Bad power supplies, brownouts,
bad electrical connections, slow RAM (many people try to use
the same SIMMs they used in their 486 in their new 500GHz P9
system), *bad* RAM, flaky cards on the bus, etc. are all more
common causes of problems. How are we gonna deal with that?
Linux is an OS. It does not need to concern itself unduly with
hardware problems.

My own personal observation: sunspots have more to do with
intermittent computer problems than any other phenomenon. :)

Rob
(rriggs@tesser.com)