Re: Testing RAM from userspace / question about memmap= arguments

From: Pavel Machek
Date: Sat Dec 22 2007 - 13:49:24 EST


On Sun 2007-12-23 02:36:14, David Newall wrote:
> Pavel Machek wrote:
>> On Sat 2007-12-22 13:42:47, Richard D wrote:
>>
>>> Cant you, modify bootmem allocator to test with memtest patterns and then
>>> use kexec (as Pavel suggested) to test the one where kernel was sitting
>>> earlier?
>>
>> I do not think you need to modify anything in kernel. Just use
>> /dev/mem to test areas that kernel doesn't see, then kexec into place
>> you already tested, and test the rest.
>
> That's still an insufficient test. One failure mode is writes at one
> location corrupting cells at another.
>
> The idea of wanting to do comprehensive and robust memory testing from
> within the operating system seems dubious at best, to me. If there is
> something wrong with memtest86, doing the tests from within Linux is not
> the answer. The answer is to fix memtest86. If the problem is that you
> automation, e.g. switching a server from production to memory test mode at
> midnight and back again at 6am, the answer is still to "fix" memtest86.
> Writing something that grabs some physical RAM from Linux's control, tests
> it, and then moves the kernel itself so that it can test the rest, is
> adding a whole extra layer of complexity to an already challenging (I
> assume, based on errors that dedicated software-based testers miss)
> problem.

Well, we have kexec. We already have way for kernel to relocate itself.

> Give up on this misguided idea and build on the best tools that are already
> available.

Yes, the idea is "interesting". I do not think it quite cuts
"misguided" part.

memtest has following problems:

0) it is kind of hard to run memtest over ssh

1) if linux fixes some problem with PCI quirk or microcode
upload, memtest will not see the fix

2) if memory only fails while something else happens (DMA to
other piece of memory? Hard disk load glitching powre
supply?), memtest will not see the problem.

(Of course, memtest-under-linux has some problems too. Like "if it
freezes, was it bad memory or kernel problem").
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/