Re: [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
From: Dan Williams
Date: Wed Jan 08 2020 - 16:53:54 EST
On Tue, Jan 7, 2020 at 9:52 AM Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
>
> On Tue, 7 Jan 2020 at 06:19, Dave Young <dyoung@xxxxxxxxxx> wrote:
> >
> > On 01/06/20 at 08:16pm, Dan Williams wrote:
> > > On Mon, Jan 6, 2020 at 8:04 PM Dave Young <dyoung@xxxxxxxxxx> wrote:
> > > >
> > > > On 01/06/20 at 04:40pm, Dan Williams wrote:
> > > > > Dave noticed that when specifying multiple efi_fake_mem= entries only
> > > > > the last entry was successfully being reflected in the efi memory map.
> > > > > This is due to the fact that the efi_memmap_insert() is being called
> > > > > multiple times, but on successive invocations the insertion should be
> > > > > applied to the last new memmap rather than the original map at
> > > > > efi_fake_memmap() entry.
> > > > >
> > > > > Rework efi_fake_memmap() to install the new memory map after each
> > > > > efi_fake_mem= entry is parsed.
> > > > >
> > > > > This also fixes an issue in efi_fake_memmap() that caused it to litter
> > > > > emtpy entries into the end of the efi memory map. An empty entry causes
> > > > > efi_memmap_insert() to attempt more memmap splits / copies than
> > > > > efi_memmap_split_count() accounted for when sizing the new map. When
> > > > > that happens efi_memmap_insert() may overrun its allocation, and if you
> > > > > are lucky will spill over to an unmapped page leading to crash
> > > > > signature like the following rather than silent corruption:
> > > > >
> > > > > BUG: unable to handle page fault for address: ffffffffff281000
> > > > > [..]
> > > > > RIP: 0010:efi_memmap_insert+0x11d/0x191
> > > > > [..]
> > > > > Call Trace:
> > > > > ? bgrt_init+0xbe/0xbe
> > > > > ? efi_arch_mem_reserve+0x1cb/0x228
> > > > > ? acpi_parse_bgrt+0xa/0xd
> > > > > ? acpi_table_parse+0x86/0xb8
> > > > > ? acpi_boot_init+0x494/0x4e3
> > > > > ? acpi_parse_x2apic+0x87/0x87
> > > > > ? setup_acpi_sci+0xa2/0xa2
> > > > > ? setup_arch+0x8db/0x9e1
> > > > > ? start_kernel+0x6a/0x547
> > > > > ? secondary_startup_64+0xb6/0xc0
> > > > >
> > > > > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> > > > > services data to fix kexec breakage" is listed in Fixes: since it
> > > > > introduces more occurrences where efi_memmap_insert() is invoked after
> > > > > an efi_fake_mem= configuration has been parsed. Previously the side
> > > > > effects of vestigial empty entries were benign, but with commit
> > > > > af1648984828 that follow-on efi_memmap_insert() invocation triggers
> > > > > efi_memmap_insert() overruns.
> > > > >
> > > > > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > > > > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> > > >
> > > > A nitpick for the Fixes flags, as I replied in the thread below:
> > > > https://lore.kernel.org/linux-efi/CAPcyv4jLxqPaB22Ao9oV31Gm=b0+Phty+Uz33Snex4QchOUb0Q@xxxxxxxxxxxxxx/T/#m2bb2dd00f7715c9c19ccc48efef0fcd5fdb626e7
> > > >
> > > > I reproduced two other panics without the patches applied, so this issue
> > > > is not caused by either of the commits, maybe just drop the Fixes.
> > >
> > > Just the "Fixes: af1648984828", right? No objection from me. I'll let
> > > Ingo say if he needs a resend for that.
> > >
> > > The "Fixes: 0f96a99dab36" is valid because the original implementation
> > > failed to handle the multiple argument case from the beginning.
> >
> > Agreed, thanks!
> >
>
> I'll queue this but without the fixes tags. The -stable maintainers
> are far too trigger happy IMHO, and this really needs careful review
> before being backported. efi_fake_mem is a debug feature anyway, so I
> don't see an urgent need to get this fixed retroactively in older
> kernels.
I'm fine to drop the fixes tags.
However, I do want to point out my driving motive for digging in on
efi_fake_mem=nn@ss:0x40000, is that it is a better interface for
diverting memory ranges to device-dax than the current standard bearer
memmap=nn!ss. The main benefit is that the kernel only considers the
attribute when it is applied to EFI_CONVENTIONAL_MEMORY. This fixes a
long standing unsolvable issue of people picking busted memmap=nn!ss
settings that clobber platform memory ranges, or vector off into
nothing.
So yes, efi_fake_mem is a debug feature, but if the popularity of
memmap=nn!ss is any clue I expect efi_fake_mem=nn@ss:0x40000 will be a
useful tool going forward for late enabling, or repairing platform
"soft reservation" declarations.
I'll respin the series with those tags dropped and add the comment you
recommended about the cases when efi_memmap_free() is expected to be a
nop. Holler if there's anything else, but that's all I had on my list
to fix up.