Re: [PATCH RFC] x86: check for and defend against BIOS memory corruption

From: Hugh Dickins
Date: Thu Sep 04 2008 - 16:23:54 EST


On Thu, 4 Sep 2008, Rafał Miłecki wrote:
> > 2008/8/29 Rafał Miłecki <zajec5@xxxxxxxxx>:
> > 2008/8/29 Hugh Dickins <hugh@xxxxxxxxxxx>:
> >> Here's my version of Jeremy's patch, that I've now tested on my machines,
> >> as x86_32 and as x86_64. It addresses none of the points Alan Cox made,
> >> and it stays silent for me, even after suspend+resume, unless I actually
> >> introduce corruption myself. Omits Jeremy's check in fault.c, but does
> >> a check every minute, so should soon detect Rafał's HDMI corruption
> >> without any need to suspend+resume.
> >
> > Your periodic test works fine:
> >
> > Corrupted low memory at ffff88000000be9c (be9c phys) = b02a0004
> > <IRQ> [<ffffffff8020fc9b>] check_for_bios_corruption+0x93/0x9f
> > [<ffffffff8020fca7>] ? periodic_check_for_corruption+0x0/0x25
> > [<ffffffff8020fcb0>] periodic_check_for_corruption+0x9/0x25
> >
> > By the way I confirmed this bug on Sony Vaio FW11M (my one is FW11S).
> > Probably more machines from FW11* are affected.
>
> If this patch is known to work fine for Sony Vaio FW* and Alan's
> machine, could it go mainline somehow?

Well.

Thanks for the prod, and I'm certainly remiss for not following
up sooner. But I'm really not at all keen on such a patch going
into mainline myself.

It's an interesting experiment, and I'd be happy to see such a patch
(adjusted to make sure output goes to kerneloops.org) spending a little
while in Fedora Rawhide (who'd be the right contact for that?).

But so far as mainline goes, I share Alan Cox's opinion that we should
not be chopping pages out of every x86 user's memory, just because a
couple of machines with faulty BIOSes have been observed.

Particularly now it's evident that the 64kB "limit" is no more than a
reflection of where the directmap pagetable changes have caught such
corruption.

If lots more such corruptions are reported, of course I would change
my position; but those bad directmap PMD crashes are themselves quite
recognizable now we know to look out for them.

I would prefer you both to use the minimal memmap= solutions for now;
but others may disagree.

Hugh