Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

From: H. Peter Anvin
Date: Wed Dec 19 2012 - 18:50:31 EST


On 12/19/2012 03:40 PM, Jacob Shin wrote:

Just make the hole a bit bigger, so it starts at 0xfc00000000, then you
only need one MTRR. This is the correct BIOS-level fix, and it really
needs to happen.
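
To illustrate the MTRR arithmetic: a variable-range MTRR describes a power-of-two-sized, naturally aligned block, so the number of registers needed depends on where the range starts. The sketch below is a minimal illustration, not kernel code. It assumes the hole ends at the 1 TiB boundary (0x10000000000) and that the current start is 0xfd00000000; neither value is stated in this message, and mtrr_blocks_needed() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the power-of-two-sized, naturally aligned blocks needed to
 * cover [start, end). Greedy: at each step take the largest block that
 * is aligned at the current address and still fits in what is left.
 * Hypothetical helper for illustration only.
 */
static unsigned int mtrr_blocks_needed(uint64_t start, uint64_t end)
{
	unsigned int count = 0;

	assert(start < end);

	while (start < end) {
		/* Lowest set bit of start = largest alignment it satisfies. */
		uint64_t size = start ? (start & (~start + 1))
				      : (UINT64_C(1) << 63);

		/* Shrink the block until it fits in the remaining range. */
		while (size > end - start)
			size >>= 1;

		start += size;
		count++;
	}
	return count;
}
```

Under those assumptions, a hole starting at 0xfc00000000 is a single 16 GiB block, while one starting at 0xfd00000000 splits into a 4 GiB block plus an 8 GiB block, hence two registers.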

Do these systems actually exist in the field or are they engineering
prototypes? In the latter case, we might be done at that point.

Yes, HP is shipping (or will ship soon) such systems.


Can you get them to fix the BIOS first, or at least ship a BIOS update? Otherwise there will be a probabilistic failure, and it sounds like it is your (AMD's) fault.

The other bit is that building the real kernel page tables iteratively
(ignoring the early page tables here) is safer, since the real page
table builder is fully aware of the memory map. This means any
"spillover" from the early page tables gets minimized to regions where
there are data objects that have to be accessed early. Since Yinghai
already had iterative page table building working, I don't see any
reason to not use that capability.
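
A rough sketch of what being aware of the memory map buys: when the builder knows the exact bounds of a RAM range, it can use large pages only where they fit entirely inside the range and fall back to small pages at the edges, so nothing outside RAM gets mapped. The page sizes (4 KiB and 2 MiB) and the names plan_ram_range() and struct map_plan are assumptions for illustration; this is not the kernel's or Yinghai's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE (UINT64_C(1) << 12)	/* 4 KiB, assumed */
#define LARGE_PAGE (UINT64_C(1) << 21)	/* 2 MiB, assumed */

/* How one RAM range [start, end) would be split into page mappings. */
struct map_plan {
	uint64_t head_small;	/* 4 KiB pages before the first 2 MiB boundary */
	uint64_t large;		/* 2 MiB pages lying fully inside the range */
	uint64_t tail_small;	/* 4 KiB pages after the last 2 MiB boundary */
};

/* start and end are expected to be 4 KiB aligned. */
static struct map_plan plan_ram_range(uint64_t start, uint64_t end)
{
	struct map_plan p = { 0, 0, 0 };
	uint64_t lo, hi;

	assert(start <= end);

	lo = (start + LARGE_PAGE - 1) & ~(LARGE_PAGE - 1);	/* round up */
	hi = end & ~(LARGE_PAGE - 1);				/* round down */

	if (lo >= hi) {
		/* No whole 2 MiB page fits: small pages throughout. */
		p.head_small = (end - start) / SMALL_PAGE;
		return p;
	}

	p.head_small = (lo - start) / SMALL_PAGE;
	p.large = (hi - lo) / LARGE_PAGE;
	p.tail_small = (end - hi) / SMALL_PAGE;
	return p;
}
```

Every mapping in the plan stays within [start, end), which is the property the early, on-demand page tables cannot guarantee.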

Yes, I'll test again with latest, but Yinghai's patchset mapping only
RAM from top down solved our problem.

Please don't make me go Steve Ballmer on you.

We're talking about two different things... the early page tables versus the permanent page tables. The permanent page tables we can handle because the page table creation at that point is aware of the memory map.

The early page tables are what is used before we get to that point. Creating them on demand means that if there are no early-needed data structures near the hole, there will be no access and everything will be okay, but the early page table creation *is not and cannot be* aware of the memory map. Right now such an access simply cannot happen, because all such data structures are confined to 32-bit addresses; however, *THAT WILL CHANGE AND WILL CHANGE SOON*, exactly because these kinds of large-memory systems need that to happen. You may start seeing failures at that time, and there isn't a whole lot we can do about it.
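
A toy model of the on-demand behaviour described above, offered only as a sketch: each first access maps the large page containing the address, with no knowledge of the memory map, and a separate check reports whether any such page ended up close to the hole. The 2 MiB granularity, the table size, the margin parameter, and all names here are assumptions for illustration; in particular, how close is too close is exactly the information that is still missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EARLY_PAGE_SIZE	(UINT64_C(1) << 21)	/* assumed 2 MiB granularity */
#define EARLY_MAX_PAGES	16			/* toy limit */

struct early_map {
	uint64_t base[EARLY_MAX_PAGES];	/* base address of each mapped page */
	size_t count;
};

/*
 * Toy "fault handler": map the large page containing addr unless it is
 * already mapped. Returns false only if the table is full.
 */
static bool early_touch(struct early_map *m, uint64_t addr)
{
	uint64_t base = addr & ~(EARLY_PAGE_SIZE - 1);

	assert(m->count <= EARLY_MAX_PAGES);

	for (size_t i = 0; i < m->count; i++)
		if (m->base[i] == base)
			return true;

	if (m->count == EARLY_MAX_PAGES)
		return false;

	m->base[m->count++] = base;
	return true;
}

/*
 * True if any mapped page lies within margin bytes of the hole
 * [hole_start, hole_end). The margin is a hypothetical parameter.
 */
static bool early_map_near_hole(const struct early_map *m,
				uint64_t hole_start, uint64_t hole_end,
				uint64_t margin)
{
	for (size_t i = 0; i < m->count; i++) {
		uint64_t page_end = m->base[i] + EARLY_PAGE_SIZE;

		if (m->base[i] < hole_end + margin &&
		    page_end + margin > hole_start)
			return true;
	}
	return false;
}
```

As long as every touched address is far from the hole, nothing near it is ever mapped; a single early-needed object placed just below the hole changes that, and the mapping code has no way to know.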

We are trying to discuss mitigation strategies with you, but you haven't really given us any useful information, e.g. what happens near the various boundaries of the hole, what could trigger prefetching into the range, and what it would take to fix the BIOSes.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
