Re: Testing RAM from userspace / question about memmap= arguments

From: David Newall
Date: Sun Dec 23 2007 - 02:36:17 EST


Pavel Machek wrote:
On Sun 2007-12-23 07:06:58, David Newall wrote:
It's kind of hard to run anything over SSH if it has to be run before userspace is up. But the kernel can collect results from a modified memtest, after it chains back.

memtest can be ran from userspace, that's the point.

I'm not sure I believe that. You need to tinker with hardware tables before you know what physical RAM is being used. Sequential virtual pages might be mapped to sequential physical RAM, but it might also be mapped psuedo-randomly, or even page-reverse-sequential! How can you do a basic walking bit test when you could be accessing pages in random order?

1) if linux fixes some problem with PCI quirk or microcod
upload, memtest will not see the fix
What are you saying? Linux is going to fix faulty RAM?

Yes, that's what CPU microcode update is for. And I want to test my
RAM with up-to-date microcode.

Don't microcode updates fix CPU bugs? That's not fixing faulty RAM. If base microcode is so faulty as to make RAM access unreliable, the CPU probably won't even POST, let alone boot the kernel and start a whole bunch of userspace stuff, before it can get around to checking to see if there is new microcode for that CPU and download it.

I suppose a CPU retains microcode updates, once loaded, until power-down or some hard reboot that you surely can avoid. If it does happen that you have an update that works around something unrelated to the CPU, for example maybe interaction with a bridge, then you can update the CPU before running memtest. Once loaded it's there until power down.

These are not RAM faults. The very last thing you want is evidence that
you've got a faulty piece of RAM when the fault is actually a hard disk glitch!

No, it may be power supply leading to RAM problems. Yes, I want to
detect that.

I'm sure you don't mean that. I'm sure you don't want a faulty power supply to look like faulty RAM. No amount of replacing pieces of memory is going to solve a faulty power supply. At worst you'll hit on a combination of pieces that pass the test ... and then the system will fail, mysteriously, in production. I'm certain you don't want that.

Anyhow, good luck with your idea. I think it's crazy, and that you're doomed to failure. Doomed! I tell you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/