Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout

From: Chen Yu
Date: Fri Jun 23 2017 - 00:13:46 EST


Hi Ingo,
On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote:
>
> * Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
>
> > Currently we try to have e820_table_firmware to represent the
> > original firmware memory layout passed to us by the bootloader,
> > however it is not the case, the e820_table_firmware might still
> > be modified by linux:
> > 1. During bootup, the efi boot stub might allocate memory via
> > efi service for the PCI device information structure, then
> > later e820_reserve_setup_data() reserved these dynamically
> > allocated structures(AKA, setup_data) in e820_table_firmware
> > accordingly.
> > 2. The kexec might also modify the e820_table_firmware.
>
> Hm, so why does the EFI code modify e820_table_firmware - why doesn't
> it modify e820_table?
>
Both e820_table and e820_table_firmware are updated in
e820__reserve_setup_data(), which changes the type of the regions
holding the PCI device information structures from E820_TYPE_RAM
to E820_TYPE_RESERVED_KERN.
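A simplified model of that retyping, in case it helps. This is not the kernel code: the structures are cut down, and append(), range_update() and reserve_setup_data() are hypothetical helpers. The sketch splits any overlapping E820_TYPE_RAM entry around the setup_data range and applies the same change to both tables, which is the behaviour described above.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's e820 types (not the real definitions). */
enum e820_type {
	E820_TYPE_RAM           = 1,
	E820_TYPE_RESERVED      = 2,
	E820_TYPE_RESERVED_KERN = 128,
};

#define E820_MAX_ENTRIES 32

struct e820_entry {
	uint64_t addr;
	uint64_t size;
	enum e820_type type;
};

struct e820_table {
	int nr_entries;
	struct e820_entry entries[E820_MAX_ENTRIES];
};

/* Hypothetical helper: append one entry, refusing to overflow the table. */
static void append(struct e820_table *t, uint64_t addr, uint64_t size,
		   enum e820_type type)
{
	assert(t->nr_entries < E820_MAX_ENTRIES);
	t->entries[t->nr_entries++] = (struct e820_entry){ addr, size, type };
}

/*
 * Hypothetical helper: retype [start, start + size) from old_type to new_type.
 * An overlapping entry is split into head / middle / tail pieces. The result
 * is built in a scratch table and copied back, so address order is preserved.
 */
static void range_update(struct e820_table *t, uint64_t start, uint64_t size,
			 enum e820_type old_type, enum e820_type new_type)
{
	struct e820_table out = { 0 };
	uint64_t end = start + size;

	for (int i = 0; i < t->nr_entries; i++) {
		const struct e820_entry *e = &t->entries[i];
		uint64_t e_end = e->addr + e->size;

		/* Other types and non-overlapping entries are copied unchanged. */
		if (e->type != old_type || e_end <= start || e->addr >= end) {
			append(&out, e->addr, e->size, e->type);
			continue;
		}

		uint64_t lo = e->addr > start ? e->addr : start;
		uint64_t hi = e_end < end ? e_end : end;

		if (e->addr < lo)
			append(&out, e->addr, lo - e->addr, e->type);
		append(&out, lo, hi - lo, new_type);
		if (hi < e_end)
			append(&out, hi, e_end - hi, e->type);
	}
	*t = out;
}

/* Model of the described behaviour: the setup_data range is retyped in both tables. */
static void reserve_setup_data(struct e820_table *table,
			       struct e820_table *firmware,
			       uint64_t pa_data, uint64_t len)
{
	range_update(table, pa_data, len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
	range_update(firmware, pa_data, len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
}
```

For example, reserving 0x200 bytes at 0x1000 inside a single RAM entry [0, 0x100000) leaves three entries in each table: RAM, E820_TYPE_RESERVED_KERN, RAM.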
> I.e. what is the point of having 3 different versions of the
> memory layout table?
My original thought was that we could simply stop recording the
efi boot stub's modification in e820_table_firmware and be done.
But after checking the code, I realized that doing so might cause
a problem for kexec.

e820_table_firmware was introduced mainly for kexec, to pass the
original memory layout to the second kernel:

commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a
Author: Bernhard Walle <bwalle@xxxxxxx>
Date: Fri Jun 27 13:12:55 2008 +0200

x86: use FIRMWARE_MEMMAP on x86/E820

Besides, the second kernel does not re-enter the efi boot stub
code; it reuses the PCI device information structures created by
the first kernel, which are stored in E820_TYPE_RESERVED_KERN
regions. These structures are not modified by the second kernel,
because kexec only passes the E820_TYPE_RAM regions to the second
kernel, so the second kernel can use ioremap to access the PCI
information.

So the problem is: if we do not record the PCI information in
e820_table_firmware, its region stays E820_TYPE_RAM there. All
E820_TYPE_RAM regions are passed to the second kernel and may be
allocated for ordinary use, so the PCI information might be
overwritten and the second kernel might not get valid data. That
is why we are introducing a new e820_table_ori to represent the
original layout provided by the BIOS (mainly for the hibernation
memory layout md5 check).
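A rough sketch of the idea behind e820_table_ori: keep an untouched copy of the BIOS layout and compare a digest of it across hibernation, since a digest over a table the kernel edits (for example with setup_data reservations) would reflect those edits rather than just the BIOS layout. FNV-1a is used here only as a short stand-in for md5, and save_original(), layout_digest() and layout_matches() are hypothetical names; only e820_table_ori comes from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel's e820 types (not the real definitions). */
enum e820_type {
	E820_TYPE_RAM           = 1,
	E820_TYPE_RESERVED      = 2,
	E820_TYPE_RESERVED_KERN = 128,
};

#define E820_MAX_ENTRIES 8

struct e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

struct e820_table {
	uint32_t nr_entries;
	struct e820_entry entries[E820_MAX_ENTRIES];
};

/* Untouched copy of the BIOS-provided layout (the table the patch calls e820_table_ori). */
static struct e820_table e820_table_ori;

/* Hypothetical: snapshot the layout before anything else can modify it. */
static void save_original(const struct e820_table *bios)
{
	memcpy(&e820_table_ori, bios, sizeof(e820_table_ori));
}

/* FNV-1a (64-bit) over the 8 bytes of v; only a short stand-in for md5 here. */
static uint64_t fnv1a_u64(uint64_t h, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Hypothetical: digest the fields one by one so struct padding is never hashed. */
static uint64_t layout_digest(const struct e820_table *t)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	assert(t->nr_entries <= E820_MAX_ENTRIES);
	h = fnv1a_u64(h, t->nr_entries);
	for (uint32_t i = 0; i < t->nr_entries; i++) {
		h = fnv1a_u64(h, t->entries[i].addr);
		h = fnv1a_u64(h, t->entries[i].size);
		h = fnv1a_u64(h, t->entries[i].type);
	}
	return h;
}

/* Hypothetical resume-time check against the digest recorded at hibernation. */
static int layout_matches(uint64_t saved_digest, const struct e820_table *bios_now)
{
	return layout_digest(bios_now) == saved_digest;
}
```

Hashing the fields individually, rather than the raw struct bytes, keeps uninitialised padding out of the digest.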

Thanks,
Yu