Re: [PATCH] x86/boot: Fix boot failure when SMP MP-table is based at 0

From: Thomas Gleixner
Date: Fri Nov 17 2017 - 08:04:23 EST


On Mon, 6 Nov 2017, Tom Lendacky wrote:
> On 11/6/2017 3:41 PM, H. Peter Anvin wrote:
> > On 11/06/17 12:17, Tom Lendacky wrote:
> > > When crosvm is used to boot a kernel as a VM, the SMP MP-table is found
> > > at physical address 0x0. This causes mpf_base to be set to 0 and a
> > > subsequent "if (!mpf_base)" check in default_get_smp_config() results in
> > > the MP-table not being parsed. Further into the boot this results in an
> > > oops when attempting a read_apic_id().
> > >
> > > Add a boolean variable that is set to true when the MP-table is found.
> > > Use this variable for testing if the MP-table was found so that even a
> > > value of 0 for mpf_base will result in continued parsing of the MP-table.
> > >
> > > Reported-by: Tomeu Vizoso <tomeu@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
> >
> > Ahem... did anyone ever tell you that this is an epicly bad idea on your
> > part? The low megabyte of physical memory has very special meaning on
> > x86, and deviating from the standard use of this memory is a *very*
> > dangerous thing to do, and imposing on the kernel a "fake null pointer"
> > requirement that exists only for the convenience of your particular
> > brokenness is not okay.
> >
> > -hpa
>
> That was my initial thought... what was something doing down at the start
> of memory. But when I looked at default_find_smp_config() it specifically
> scans the bottom 1K for an MP-table signature. I was hoping to get some
> feedback as to whether this would really be an acceptable thing to do. So
> I'm good with this patch being rejected, but the change I made in
>
> 5997efb96756 ("x86/boot: Use memremap() to map the MPF and MPC data")
>
> does break something that was working before.

This goes back to Linux 1.1.x:

/*
 * Physical page 0 is special; it's not touched by Linux since BIOS
 * and SMM (for laptops with [34]86/SL chips) may need it. It is read
 * and write protected to detect null pointer references in the
 * kernel.
 * It may also hold the MP configuration table when we are booting SMP.
 */

and then got changed in 2.3.40 to:

/*
 * FIXME: Linux assumes you have 640K of base ram..
 * this continues the error...
 *
 * 1) Scan the bottom 1K for a signature
 * 2) Scan the top 1K of base RAM
 * 3) Scan the 64K of bios
 */
if (smp_scan_config(0x0, 0x400) ||
    smp_scan_config(639 * 0x400, 0x400) ||
    smp_scan_config(0xF0000, 0x10000))
        return;

which is what we still have completely unmodified.
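
To make the failure mode concrete, here is a rough user-space sketch of that
scan order. It is not the kernel code: the buffer "mem" stands in for the
first 1 MiB of physical memory, the names scan_range(), find_mp_table() and
struct scan_result are made up for illustration, and it matches only the
"_MP_" signature on 16-byte boundaries (the real smp_scan_config() does more
validation). The point is that the result carries a separate "found" flag,
so a table at offset 0 is not mistaken for "no table".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: size of the low-memory image being scanned (1 MiB). */
#define LOW_MEM_SIZE 0x100000u

/* Illustrative result type: "found" is tracked separately from "base". */
struct scan_result {
    bool found;     /* true once a signature has been seen */
    uint32_t base;  /* offset of the signature; 0 is a legitimate value */
};

/*
 * Scan [base, base + length) in 16-byte steps for the "_MP_" signature.
 * Simplification: only the signature is compared, nothing else is checked.
 */
static bool scan_range(const uint8_t *mem, uint32_t base, uint32_t length,
                       struct scan_result *res)
{
    for (uint32_t off = base; off + 16 <= base + length; off += 16) {
        if (memcmp(mem + off, "_MP_", 4) == 0) {
            res->found = true;
            res->base = off;
            return true;
        }
    }
    return false;
}

/* Same three ranges, in the same order, as the excerpt above. */
static struct scan_result find_mp_table(const uint8_t *mem)
{
    struct scan_result res = { false, 0 };

    if (scan_range(mem, 0x0, 0x400, &res) ||
        scan_range(mem, 639 * 0x400, 0x400, &res) ||
        scan_range(mem, 0xF0000, 0x10000, &res))
        return res;

    return res;
}
```

A caller that tests "res.base == 0" to mean "nothing found" would skip a
table placed at the very start of memory; testing "res.found" does not have
that problem.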

What hpa considers to be epic fail was added 20+ years ago and probably
with a reason. We carried that along forever and now someone looked for a
place to stick MP config in for his VM thingy and picked the first place
which is checked. Not a big surprise.

So we are stuck with it, whether we like it or not.

Thanks,

tglx