I see lots of people writing memory tests who go to a lot of
trouble to test ALL of the memory, even the memory that the memtest
program itself sits in. In my eyes this is not really necessary.
The memory test program, together with the kernel, will occupy at most
1/8 of main memory (1 MB kernel on an 8 MB machine). What I've seen so
far is that BIOSes will detect the "simple" errors like "bit xxyyzz
stuck at 0".
The more complicated errors, like "when DMA is going on, a write of
0xfffffff to a memory location with 0x00000 as the lower 20 bits of
its address will fail by writing a few bits as zeroes", are not so
tied to one specific location that testing ALL memory is going to make
a difference.
If your "fault model" assumes "stuck at x" errors at "random"
locations throughout memory, you're going to miss 12.5% of the
errors when you leave a large kernel in place. Those are the errors
that don't cause crashes, and the BIOS will "flunk" that memory anyway.
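
To put a number on that estimate, here is a minimal sketch in C. It
assumes single faults spread uniformly over memory, so the share that
lands in the untested region is simply that region's share of the
total. The name missed_percent is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expected percentage of uniformly distributed
 * single-cell faults that fall inside a region the tester skips. */
double missed_percent(size_t untested_bytes, size_t total_bytes)
{
    assert(total_bytes > 0 && untested_bytes <= total_bytes);
    return 100.0 * (double)untested_bytes / (double)total_bytes;
}
```

With a 1 MB untested region on an 8 MB machine,
missed_percent(1 << 20, 8 << 20) gives 12.5.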
My experience is that my current memory tester catches only around 10%
of the bad memory, even though it catches ALL
- stuck-at faults
- single coupling errors
- address decoder errors.
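
As an illustration of how one test can cover the three fault classes
in the list above, here is a minimal sketch of a March C- style test in
C. It is not necessarily how the tester described above works. March C-
is a textbook march test commonly credited with detecting stuck-at
faults, address decoder faults and coupling faults between two cells.
The sim_mem structure and its fault-injection fields are hypothetical
and exist only so that faults can be simulated; a real tester would go
through a volatile pointer to physical memory with the caches out of
the way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated memory with two injectable faults. */
typedef struct {
    uint32_t *cells;      /* backing storage, n words */
    size_t    n;
    size_t    stuck_idx;  /* word with stuck-at-0 bits; n means none */
    uint32_t  stuck_mask; /* bits forced to 0 in that word */
    size_t    alias_from; /* accesses to this word ... */
    size_t    alias_to;   /* ... land on this word; equal means none */
} sim_mem;

/* Simulated address decoder, possibly faulty. */
static size_t decode(const sim_mem *m, size_t i)
{
    return (i == m->alias_from) ? m->alias_to : i;
}

static void mem_write(sim_mem *m, size_t i, uint32_t v)
{
    i = decode(m, i);
    if (i == m->stuck_idx)
        v &= ~m->stuck_mask;          /* stuck-at-0 bits never get set */
    m->cells[i] = v;
}

static uint32_t mem_read(const sim_mem *m, size_t i)
{
    return m->cells[decode(m, i)];
}

/* One march element: walk all words up or down, check that each holds
 * `expect`, then write `next`. Returns the failing index or -1. */
static long march(sim_mem *m, int up, uint32_t expect, uint32_t next)
{
    for (size_t k = 0; k < m->n; k++) {
        size_t i = up ? k : m->n - 1 - k;
        if (mem_read(m, i) != expect)
            return (long)i;
        mem_write(m, i, next);
    }
    return -1;
}

/* March C-: w0; up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); r0.
 * Returns the index of the first mismatching word, or -1 if none. */
long march_c_minus(sim_mem *m)
{
    const uint32_t zero = 0, ones = 0xFFFFFFFFu;
    long bad;

    assert(m->n > 0);
    for (size_t i = 0; i < m->n; i++)
        mem_write(m, i, zero);
    if ((bad = march(m, 1, zero, ones)) >= 0) return bad;
    if ((bad = march(m, 1, ones, zero)) >= 0) return bad;
    if ((bad = march(m, 0, zero, ones)) >= 0) return bad;
    if ((bad = march(m, 0, ones, zero)) >= 0) return bad;
    for (size_t i = 0; i < m->n; i++)
        if (mem_read(m, i) != zero)
            return (long)i;
    return -1;
}
```

On an 8-word array, a stuck-at-0 bit injected in word 3 makes
march_c_minus return 3, and aliasing word 5 onto word 2 makes it
return 5; a healthy array returns -1. Because the data patterns are
all-zeroes and all-ones, coupling between bits inside the same word
would go unnoticed in this sketch.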
This discussion has turned up a few new error causes. Once you know a
possible cause, it is not too hard to design a test for it.
Roger.