Re: Intel BIOS - Corrupted low memory at ffff880000004200

From: H. Peter Anvin
Date: Fri Jul 10 2009 - 12:13:01 EST


Ingo Molnar wrote:
>
> So i'd really like to know what is happening there, instead of just
> zapping support for 64K of RAM on the majority of Linux systems.
>
> We might end up doing the same thing in the end (i.e. disable that
> 64k of RAM) - but it should be an informed decision, not a wild stab
> in the dark.
>

Speaking as a boot loader author, I can let you know that these kinds of
problems are in no wise limited to suspend/resume.

Pretty much any time you're executing BIOS code you're going to have
*some* platform which has severe memory corruption somewhere. This is
particularly painful for boot loaders, obviously, because the BIOS
corrupts the boot loader as it is running. In most cases, there simply
isn't any way to prevent the corruption, and it's simply dumb luck that
you will boot most of the time.
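
Detecting the damage after the fact is at least cheap, even when preventing
it is not. A minimal sketch of the idea, assuming a region that was zeroed
before the BIOS ran; the function name and the word-sized scan are
illustrative assumptions, not the kernel's actual corruption check:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan a region that is expected to be all zeroes
 * and return the index of the first non-zero word, or -1 if the region
 * is still clean. A non-zero word means something wrote into memory
 * that nobody was supposed to touch.
 */
static ptrdiff_t first_corrupt_word(const unsigned long *region, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (region[i] != 0)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

This only tells you that a stray write happened and where it landed; it
says nothing about which piece of firmware did it.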

And no, I don't think EFI is going to magically solve anything. EFI
will just spread the same class of corruption problems over the entire
memory map. It will reduce the density of such bugs -- in particular it
will eliminate the "right offset, wrong segment" as well as "idiot
coding assembly" class of problems -- but it will not confine the ones
that can and will happen; it's still fundamentally a super-privileged
flat memory space.
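
To illustrate the "right offset, wrong segment" class: in real mode the
physical address is segment * 16 + offset, so a correct offset paired with
a stale segment register lands somewhere else entirely. A small sketch; the
helper name and the example addresses are hypothetical:

```c
#include <stdint.h>

/*
 * Real-mode address translation: the 16-bit segment is shifted left by
 * four bits and added to the 16-bit offset.
 */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With an intended buffer at 9000:0200 the write should go to physical
0x90200. If the segment register still holds 0000, the same offset yields
physical 0x00200, which is inside the real-mode interrupt vector table.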

The root cause seems to be a lack of verification practices in the BIOS
industry in the post-DOS era. Back when DOS was still a commercially
significant system, the BIOS didn't just support the running OS, it also
directly supported running applications. That put a relatively high bar
on how broken your BIOS could be and still have a viable platform.
These days, it doesn't look like either the BIOS vendors or the OEMs
necessarily even know how to QA, and since the BIOS industry is
relatively small and highly consolidated, if there isn't sufficient OEM
pressure it simply won't happen since there is no money in it.

The HDMI case is a good example -- that probably involved SMI being
triggered and the SMI code then writing through a wild pointer.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
