Re: [PATCH v2 0/3] support for broken memory modules (BadRAM)

From: H. Peter Anvin
Date: Wed Jun 22 2011 - 16:34:08 EST


On 06/22/2011 01:30 PM, Stefan Assmann wrote:
> On 22.06.2011 20:15, H. Peter Anvin wrote:
>> On 06/22/2011 04:18 AM, Stefan Assmann wrote:
>>>
>>> The idea is to allow the user to specify RAM addresses that shouldn't be
>>> touched by the OS because they are broken in some way. Not all machines have
>>> hardware support for hwpoison, ECC RAM, etc., so here's a solution that lets
>>> the user mask out address patterns with bitmasks via the new "badram" kernel
>>> command line parameter.
>>> Memtest86 has had an option to generate these patterns since v2.3, so all
>>> the user should need to do is:
>>> - run Memtest86
>>> - note down the pattern
>>> - add badram=<pattern> to the kernel command line
>>>
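For readers unfamiliar with the pattern format, here is a minimal C sketch of what a single BadRAM pattern is assumed to mean. It follows the address/mask convention of the original BadRAM patch, not anything confirmed by this series: a physical address is bad when the bits selected by the mask equal the corresponding bits of the pattern address. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation of one BadRAM pattern (assumed semantics):
 * an address is bad if (phys & mask) == (addr & mask). */
struct badram_pattern {
	uint64_t addr;
	uint64_t mask;
};

/* Does a single physical address match the pattern? */
static bool badram_match(const struct badram_pattern *p, uint64_t phys)
{
	return (phys & p->mask) == (p->addr & p->mask);
}

/* Is any byte of page 'pfn' hit by the pattern?  The address bits below
 * the page boundary vary freely inside the page, so they can always be
 * chosen to match; only the mask bits at or above page_shift decide. */
static bool badram_page_is_bad(const struct badram_pattern *p,
			       uint64_t pfn, unsigned int page_shift)
{
	uint64_t page_mask = ~((UINT64_C(1) << page_shift) - 1);
	uint64_t m = p->mask & page_mask;

	return ((pfn << page_shift) & m) == (p->addr & m);
}
```

With addr = mask = 0x1000 and 4 KiB pages, every page with bit 12 set (that is, every other page) is reported bad, which is exactly the kind of non-contiguous layout discussed in this thread.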
>>
>> We already support the equivalent functionality with
>> memmap=<length>$<address> for those with only a few ranges... this has
>> been supported for ages, literally. For those with a lot of ranges,
>> like Google, the command line is insufficient.
>
> Right, I think this was discussed a while ago. The advantages I see in
> this approach are as follows. First, it allows memory exclusion to be
> broken down to the page level with a pattern of non-consecutive pages:
> if every other page were considered bad, that would be tough to deal
> with using memmap.
> Second, patterns can easily be generated by running Memtest86 and then
> fed to the kernel on the command line, which makes it much more feasible
> for the average user to take advantage of this.
>
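To make the "every other page" point concrete, the sketch below walks a page range and emits one reserve entry per maximal run of bad pages, using the kernel's documented memmap=nn$ss form (size first, then start address). It assumes the same address/mask semantics as the original BadRAM patch; page_bad() and badram_to_memmap() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: is page 'pfn' hit by pattern (addr, mask)?
 * Assumes BadRAM semantics: bad if (phys & mask) == (addr & mask). */
static bool page_bad(uint64_t addr, uint64_t mask, uint64_t pfn,
		     unsigned int page_shift)
{
	uint64_t m = mask & ~((UINT64_C(1) << page_shift) - 1);

	return ((pfn << page_shift) & m) == (addr & m);
}

/* Walk pages [first_pfn, last_pfn) and print one memmap=<size>$<start>
 * entry per maximal run of bad pages (pass out == NULL to only count).
 * Returns the number of entries that would be needed. */
static unsigned long badram_to_memmap(uint64_t addr, uint64_t mask,
				      uint64_t first_pfn, uint64_t last_pfn,
				      unsigned int page_shift, FILE *out)
{
	unsigned long entries = 0;
	uint64_t pfn = first_pfn;

	while (pfn < last_pfn) {
		if (!page_bad(addr, mask, pfn, page_shift)) {
			pfn++;
			continue;
		}
		uint64_t run_start = pfn;

		/* Extend the run over consecutive bad pages. */
		while (pfn < last_pfn && page_bad(addr, mask, pfn, page_shift))
			pfn++;
		if (out)
			fprintf(out, "memmap=0x%llx$0x%llx\n",
				(unsigned long long)((pfn - run_start) << page_shift),
				(unsigned long long)(run_start << page_shift));
		entries++;
	}
	return entries;
}
```

For addr = mask = 0x1000 over just the first 1 GiB of 4 KiB pages, this comes to 131072 separate entries, far more than a kernel command line can hold, whereas a contiguous pattern such as addr = 0x10000000, mask = 0xf0000000 collapses to a single entry below 4 GiB.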

How common are nontrivial patterns on real hardware? It would be
interesting to hear from Google or another large user on this.

If they are common, we should probably introduce this as another
linked-list data structure; we can allow it to be preprocessed from the
command line if need be.
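One possible shape for such a list, as a userspace C sketch: the badram=addr,mask[,addr,mask...] syntax is assumed from the original BadRAM patch, and struct badram_entry, badram_parse() and badram_free() are hypothetical names. A real kernel version would use an early-boot allocator rather than malloc().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node: one address/mask pattern. */
struct badram_entry {
	uint64_t addr;
	uint64_t mask;
	struct badram_entry *next;
};

static void badram_free(struct badram_entry *e)
{
	while (e) {
		struct badram_entry *next = e->next;

		free(e);
		e = next;
	}
}

/* Parse "addr,mask[,addr,mask...]" (hex with 0x, or decimal) into a list
 * kept in command-line order.  Returns 0, -EINVAL or -ENOMEM; on any
 * error the partial list is freed, so nothing is silently truncated. */
static int badram_parse(const char *s, struct badram_entry **head)
{
	struct badram_entry *list = NULL, **tail = &list;
	int err = -EINVAL;

	for (;;) {
		char *end;
		uint64_t addr, mask;
		struct badram_entry *e;

		addr = strtoull(s, &end, 0);
		if (end == s || *end != ',')
			goto fail;
		s = end + 1;
		mask = strtoull(s, &end, 0);
		if (end == s || (*end != ',' && *end != '\0'))
			goto fail;

		e = malloc(sizeof(*e));
		if (!e) {
			err = -ENOMEM;
			goto fail;
		}
		e->addr = addr;
		e->mask = mask;
		e->next = NULL;
		*tail = e;
		tail = &e->next;

		if (*end == '\0')
			break;
		s = end + 1;	/* skip ',' and parse the next pair */
	}
	*head = list;
	return 0;
fail:
	badram_free(list);
	return err;
}
```

A trailing comma or an odd number of values is rejected outright rather than partially accepted, in line with the concern about truncation that follows.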

I have to say I agree with Google's point that truncating the list is
unacceptable: that would mean running in a known-bad configuration,
and even a hard crash would be better.
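The "fail rather than truncate" policy can be sketched as follows, assuming a fixed-capacity table of bad ranges. BAD_TABLE_MAX, bad_table_add() and bad_table_add_or_die() are hypothetical, and abort() stands in for the kernel's panic().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BAD_TABLE_MAX 4	/* hypothetical, deliberately small capacity */

struct bad_range {
	uint64_t start;
	uint64_t len;
};

struct bad_table {
	struct bad_range r[BAD_TABLE_MAX];
	size_t n;
};

/* Never drop an entry silently: report -ENOSPC when the table is full. */
static int bad_table_add(struct bad_table *t, uint64_t start, uint64_t len)
{
	if (t->n == BAD_TABLE_MAX)
		return -ENOSPC;
	t->r[t->n].start = start;
	t->r[t->n].len = len;
	t->n++;
	return 0;
}

/* Caller policy: a truncated bad-memory list means running on known-bad
 * RAM, so stop instead (a kernel would panic(); here we abort()). */
static void bad_table_add_or_die(struct bad_table *t, uint64_t start,
				 uint64_t len)
{
	if (bad_table_add(t, start, len) != 0) {
		fprintf(stderr, "bad memory list full, refusing to continue\n");
		abort();
	}
}
```

The point of the split is that the low-level helper only reports the overflow, while the caller makes the policy decision to stop rather than continue with an incomplete list.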

-hpa