Re: [PATCH 1/5] HWPOISON: define VM_FAULT_HWPOISON to 0 when feature is disabled

From: Rik van Riel
Date: Fri Jun 12 2009 - 12:06:03 EST


Ingo Molnar wrote:

So i think hwpoison simply does not affect our ability to get log messages out - but it sure allows crappier hardware to be used.
Am i wrong about that for some reason?

You are :)

A 2-bit memory error can be a temporary failure, e.g.
due to a cosmic ray. If bit errors could be prevented
in hardware, there would be no reason to have ECC at all.

The only reason to stop using that page is because we
do not know for sure whether the error was temporary
or permanent (or dependent on a particular bit pattern).

Userspace needs to be notified that some data disappeared,
if it did - for clean pagecache and swap cache pages, the
kernel can simply take the page away and wait for a page
fault...
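
To make the distinction above concrete, here is a toy userspace
model of that policy. It is only a sketch under stated assumptions:
the enum, the struct and respond_to_poison() are hypothetical names
invented for illustration and are not the hwpoison patch's code or
any kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page categories for this sketch; not kernel definitions. */
enum page_kind {
    PAGE_CLEAN_PAGECACHE,  /* identical copy exists on disk */
    PAGE_CLEAN_SWAPCACHE,  /* identical copy exists in swap */
    PAGE_DIRTY_OR_ANON     /* memory held the only copy of the data */
};

/* What the toy model decides to do about one poisoned page. */
struct poison_response {
    bool stop_using_page;   /* cannot tell temporary from permanent errors */
    bool notify_userspace;  /* only if data actually disappeared */
    bool log_for_admin;     /* the hardware *might* have a problem */
};

static struct poison_response respond_to_poison(enum page_kind kind)
{
    /* Guard against values outside the toy enum. */
    assert(kind <= PAGE_DIRTY_OR_ANON);

    struct poison_response r = {
        .stop_using_page = true,
        .notify_userspace = false,
        .log_for_admin = true,
    };

    switch (kind) {
    case PAGE_CLEAN_PAGECACHE:
    case PAGE_CLEAN_SWAPCACHE:
        /* Drop the page; the next page fault reads the data back in. */
        break;
    case PAGE_DIRTY_OR_ANON:
        /* No backing copy, so the data is gone and userspace must know. */
        r.notify_userspace = true;
        break;
    }
    return r;
}
```

In this model every poisoned page is retired and logged, but only the
case with no backing copy results in a notification to userspace.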

The sysadmin needs to know that something happened too,
because the hardware *might* have a problem.

However, a 2-bit error does not imply that the hardware
actually needs to be replaced.

--
All rights reversed.