Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout

From: Ingo Molnar
Date: Fri Jun 23 2017 - 04:42:25 EST



* Chen Yu <yu.c.chen@xxxxxxxxx> wrote:

> Hi Ingo,
> On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote:
> >
> > * Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
> >
> > > Currently we try to have e820_table_firmware represent the
> > > original firmware memory layout passed to us by the bootloader.
> > > However, this is not the case, since e820_table_firmware might
> > > still be modified by Linux:
> > > 1. During bootup, the efi boot stub might allocate memory via
> > > an efi service for the PCI device information structure, and
> > > later e820__reserve_setup_data() reserves these dynamically
> > > allocated structures (AKA setup_data) in e820_table_firmware
> > > accordingly.
> > > 2. kexec might also modify e820_table_firmware.
> >
> > Hm, so why does the EFI code modify e820_table_firmware - why doesn't
> > it modify e820_table?
> >
> Both e820_table and e820_table_firmware are updated in
> e820__reserve_setup_data(), which changes the PCI device information
> structures from E820_TYPE_RAM to E820_TYPE_RESERVED_KERN.
>
> > I.e. what is the point of having 3 different versions of the
> > memory layout table?
>
> My original thought was that we should simply not record the
> modifications from the efi boot stub in e820_table_firmware, and we
> would be done. But after checking the code, I realized that if we do
> so, kexec might have a potential problem.
>
> The e820_table_firmware was introduced mainly for kexec and
> was used to pass the original memory layout to the second
> kernel:
>
> commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a
> Author: Bernhard Walle <bwalle@xxxxxxx>
> Date: Fri Jun 27 13:12:55 2008 +0200
>
> x86: use FIRMWARE_MEMMAP on x86/E820
>
> Besides, the second kernel will not re-enter the efi boot stub
> code; it will reuse the PCI device information structure created
> by the first kernel, which is stored in the E820_TYPE_RESERVED_KERN
> region. These PCI device information structures will not be
> modified by the second kernel, because kexec only passes the
> E820_TYPE_RAM regions to the second kernel as usable memory, so the
> latter can leverage ioremap to access the PCI information.
>
> So the problem is, if we do not record the PCI information in
> e820_table_firmware, the PCI information will be kept as
> type E820_TYPE_RAM, and all the E820_TYPE_RAM regions will
> be passed to the second kernel and might be allocated for ordinary
> use there. As a result the second kernel might not get valid PCI
> information (it might be overwritten by others). So
> currently we try to introduce a new e820_table_ori to represent
> the original table provided by the BIOS (mainly for the hibernation
> memory layout md5 check).

So there are 3 versions we need:

- the original 'firmware' table as-is - for MD5 check and other potential
purposes

- some intermediate version of the table for kexec: what is the exact definition
of that table, what changes from the real table does it _not_ want?

- the 'real' table

All the naming should reflect that. I.e. instead of some nonsensical "_ori"
postfix, that is really the _firmware table. If kexec needs a separate one then
name it _kexec and copy it at the right stage.
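
To make the three-table split concrete, here is a minimal user-space sketch in C.
It is not kernel code: the struct layouts, MAX_ENTRIES, table_add(),
table_reserve() and the sketch_*() functions are hypothetical stand-ins. Applying
the setup_data reservation to the _kexec copy is an assumption drawn from the
kexec concern quoted above; the exact definition of that table is still the open
question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel types; illustrative only. */
enum e820_type {
	E820_TYPE_RAM           = 1,
	E820_TYPE_RESERVED      = 2,
	E820_TYPE_RESERVED_KERN = 128,
};

#define MAX_ENTRIES 16	/* hypothetical limit for this sketch */

struct e820_entry { uint64_t addr; uint64_t size; enum e820_type type; };
struct e820_table { uint32_t nr_entries; struct e820_entry entries[MAX_ENTRIES]; };

static struct e820_table e820_table;		/* the 'real' table the kernel edits */
static struct e820_table e820_table_kexec;	/* assumed: layout handed to a kexec'ed kernel */
static struct e820_table e820_table_firmware;	/* firmware layout, never modified */

static void table_add(struct e820_table *t, uint64_t addr, uint64_t size,
		      enum e820_type type)
{
	assert(t->nr_entries < MAX_ENTRIES);
	t->entries[t->nr_entries++] = (struct e820_entry){ addr, size, type };
}

/*
 * Retype [addr, addr + size) inside the single RAM entry that fully contains
 * it, splitting off the RAM before and after. Entries are left unsorted.
 * Returns 1 on success, 0 if no RAM entry contains the range.
 */
static int table_reserve(struct e820_table *t, uint64_t addr, uint64_t size,
			 enum e820_type new_type)
{
	uint64_t r_end = addr + size;

	for (uint32_t i = 0; i < t->nr_entries; i++) {
		struct e820_entry e = t->entries[i];
		uint64_t end = e.addr + e.size;

		if (e.type != E820_TYPE_RAM || addr < e.addr || r_end > end)
			continue;

		t->entries[i] = (struct e820_entry){ addr, size, new_type };
		if (addr > e.addr)
			table_add(t, e.addr, addr - e.addr, E820_TYPE_RAM);
		if (r_end < end)
			table_add(t, r_end, end - r_end, E820_TYPE_RAM);
		return 1;
	}
	return 0;
}

/* Stage 1: take the bootloader map and snapshot it before anything edits it. */
static void sketch_memory_setup(const struct e820_entry *boot_map, uint32_t n)
{
	memset(&e820_table, 0, sizeof(e820_table));
	for (uint32_t i = 0; i < n; i++)
		table_add(&e820_table, boot_map[i].addr, boot_map[i].size,
			  boot_map[i].type);

	e820_table_firmware = e820_table;	/* as-is copy, e.g. for the MD5 check */
	e820_table_kexec    = e820_table;	/* starting point for the kexec view */
}

/* Stage 2: setup_data reservations go into the real and kexec tables only. */
static void sketch_reserve_setup_data(uint64_t addr, uint64_t size)
{
	table_reserve(&e820_table, addr, size, E820_TYPE_RESERVED_KERN);
	table_reserve(&e820_table_kexec, addr, size, E820_TYPE_RESERVED_KERN);
}
```

Calling sketch_memory_setup() with a boot map and then
sketch_reserve_setup_data() leaves e820_table_firmware exactly as it was passed
in, while the other two tables carry the E820_TYPE_RESERVED_KERN range.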

Ok?

Thanks,

Ingo