Re: [Suggestion] Memory check

Rogier Wolff (R.E.Wolff@BitWizard.nl)
Tue, 18 Aug 1998 09:19:33 +0200 (MEST)


Roeland Th. Jansen wrote:
> > Frank Gockel (gockel@sent13.uni-duisburg.de) mailed a memtest patch to
> > linux-kernel in July 1998 to test memory at boot up. One big advantage
> > of doing the test in kernel is that he marks faulty pages as reserved
> > to stop the kernel using them. Waiting until init is running can be
> > too late, the page might already be in use.
>
> marking pages fault so that it's not used IMHO is a bad idea. if memory is
> at fault, report it and stop.

On a running system, you can consider marking the page bad, killing
the process (if the page was writable), and continuing.

-=HOWEVER=- The PC hardware doesn't reliably tell the OS where the
memory error occurred. That makes it hard for the kernel to determine
what exactly happened and to try to continue. So whenever a memory
problem is signalled, you get the "dazed and confused" message, and
the kernel continues without taking any memory offline.

However MANY MANY systems simply don't notice bad things
happening. For example the parity hardware may find a word OK, but the
CPU might latch a different value. That causes an apparent memory
error, which the parity hardware doesn't find.
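
A toy sketch of that case, in Python rather than anything resembling
real hardware. The names (parity, parity_ok, latched) and the choice of
bit 7 are made up for illustration; the only point is that the check
runs on the stored word, not on what the CPU ends up with.

```python
def parity(word):
    """Return 1 if the 32-bit word has an odd number of set bits, else 0."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


def parity_ok(word, parity_bit):
    """Model of the parity hardware: recompute parity and compare."""
    return parity(word) == parity_bit


# A word as it sits in memory, with the parity bit stored alongside it.
stored = 0x12345678
stored_parity = parity(stored)

# Case 1: a bit flips inside the memory array. The parity check sees it.
cell_flip = stored ^ (1 << 7)
cell_flip_detected = not parity_ok(cell_flip, stored_parity)

# Case 2 (assumed scenario): the stored word is fine and passes the
# check, but the CPU latches a different value off the bus.
latched = stored ^ (1 << 7)
check_passed = parity_ok(stored, stored_parity)
cpu_saw_wrong_value = latched != stored

print("cell flip detected:", cell_flip_detected)
print("parity check passed:", check_passed,
      "/ CPU saw wrong value:", cpu_saw_wrong_value)
```

In the second case the parity hardware has nothing to complain about,
yet the program still computes with a wrong value.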

Many PCs don't have parity. And if you run memory tests, you expect
them to find a large share of the actual errors. They don't. The
error models that the memory testers are based on are not valid. Yes,
some errors conform to the models, but most of those found in current
PCs don't.
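
To make the model mismatch concrete, here is a minimal sketch in
Python. Everything in it is hypothetical: the 8-bit toy memory, the
class names, and in particular the "back-to-back read" fault, which is
an invented stand-in for errors that depend on access pattern rather
than on a broken cell. It is not a description of any real failure
mode or of any existing memory tester.

```python
class Memory:
    """Toy memory of 8-bit words that behaves perfectly."""

    def __init__(self, size):
        self.cells = [0] * size
        self.last_op = None  # ("read" or "write", address)

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF
        self.last_op = ("write", addr)

    def read(self, addr):
        self.last_op = ("read", addr)
        return self.cells[addr]


class StuckAtMemory(Memory):
    """Classical fault model: one bit of one cell is stuck at 0."""

    def __init__(self, size, bad_addr, bad_bit):
        super().__init__(size)
        self.bad_addr = bad_addr
        self.keep_mask = ~(1 << bad_bit) & 0xFF

    def write(self, addr, value):
        if addr == self.bad_addr:
            value &= self.keep_mask  # the stuck bit never becomes 1
        super().write(addr, value)


class BackToBackReadMemory(Memory):
    """Invented non-classical fault: reading bad_addr returns a flipped
    low bit, but only if the previous operation was a read of a
    different address. The stored cell itself is never damaged."""

    def __init__(self, size, bad_addr):
        super().__init__(size)
        self.bad_addr = bad_addr

    def read(self, addr):
        prev = self.last_op
        value = super().read(addr)
        if (addr == self.bad_addr and prev is not None
                and prev[0] == "read" and prev[1] != addr):
            value ^= 0x01
        return value


def walking_ones_test(mem, size):
    """Classical test: write each single-bit pattern, read it back.
    Returns the list of addresses that failed."""
    failed = []
    for addr in range(size):
        for bit in range(8):
            pattern = 1 << bit
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                failed.append(addr)
                break
    return failed


SIZE = 16

# The fault the test was designed for is found.
stuck_failures = walking_ones_test(StuckAtMemory(SIZE, 5, 3), SIZE)

# The access-pattern fault passes the same test...
odd = BackToBackReadMemory(SIZE, 5)
odd_failures = walking_ones_test(odd, SIZE)

# ...and then corrupts data under a plain sequential read.
for a in range(SIZE):
    odd.write(a, 0xAA)
readback = [odd.read(a) for a in range(SIZE)]
corrupted = [a for a in range(SIZE) if readback[a] != 0xAA]

print("stuck-at failures:", stuck_failures)
print("pattern fault failures in test:", odd_failures)
print("addresses corrupted in normal use:", corrupted)
```

The test always reads a cell right after writing it, so it never
produces the access sequence that triggers the second fault. The
memory "passes", and still returns wrong data in use.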

Proposal for an EE masters thesis:

Find out what actually happens when memory problems are not
found by classical memory tests.

- Contact hardware suppliers (Digital, Sun, HP, Dell) and ask
them to "sponsor" the project by sending you the memory boards
that have proven to be unreliable (keep coming back from
customers), but pass the memory tests anyway....

Roger.

-- 
| The secret of success is sincerity.  Once you can |  R.E.Wolff@BitWizard.nl 
| fake that, you've got it made.  -- Jean Giraudoux |       T: +31-15-2137555 
- Custom Linux device drivers for sale! Call for a quote. - F: +31-15-2138217
