Re: Random crashes with i386 and efi boots

From: Andy Lutomirski
Date: Tue Sep 11 2018 - 14:05:38 EST




> On Sep 11, 2018, at 10:41 AM, Joerg Roedel <jroedel@xxxxxxx> wrote:
>
> On Tue, Sep 11, 2018 at 09:36:51AM -0700, Andy Lutomirski wrote:
>>> save_pgd = efi_call_phys_prolog();
>>> local_irq_save(flags);
>>> status = efi_call_phys(...);
>>> local_irq_restore(flags);
>>>
>>> efi_call_phys_epilog(save_pgd);
>>>
>>> So, yes, interrupts are very much enabled.
>>
>> Does fixing that solve the problem? It seems more robust.
>
> The problem is still that in efi_call_phys_prolog() we load the gdt with
> its physical address, and when we reload the %cr3 in _epilog from
> initial_page_table to swapper_pg_dir again the gdt is no longer mapped.
> Blocking interrupts is more robust, but we can't block NMIs that way
> that would also trigger the issue, no?
>
> So I am in favor of changing the order in efi_call_phys_epilog() too.
>

Iâm rather confused here. Weâre loading CR3 with page tables that donât have the fixmap mapped? With interrupts on? And we expect it to work? This is *nuts*.

There are IMO only three sane fixes here:

1. Load the fixmap, cpu_entry_area, etc into the EFI page table. Drop the GDT reload entirely.

2. Do this whole virtual map dance earlier so we donât have IRQs and NMIs and such. Maybe while weâre still using the initial page table?

3. Just identity map all the EFI regions. Make EFI page tables that literally map them at their physical addresses *and* map the entire kernel, just like we do for normal user mms.

Sure, as a stopgap, turning off IRQs and applying Guenterâs patch seems okay, but this code is not okay.