Re: Random crashes with i386 and efi boots

From: Guenter Roeck
Date: Tue Sep 11 2018 - 14:22:21 EST


On Tue, Sep 11, 2018 at 11:05:25AM -0700, Andy Lutomirski wrote:
>
>
> > On Sep 11, 2018, at 10:41 AM, Joerg Roedel <jroedel@xxxxxxx> wrote:
> >
> > On Tue, Sep 11, 2018 at 09:36:51AM -0700, Andy Lutomirski wrote:
> >>> save_pgd = efi_call_phys_prolog();
> >>> local_irq_save(flags);
> >>> status = efi_call_phys(...);
> >>> local_irq_restore(flags);
> >>>
> >>> efi_call_phys_epilog(save_pgd);
> >>>
> >>> So, yes, interrupts are very much enabled.
> >>
> >> Does fixing that solve the problem? It seems more robust.
> >
> > The problem is still that in efi_call_phys_prolog() we load the gdt with
> > its physical address, and when we reload the %cr3 in _epilog from
> > initial_page_table to swapper_pg_dir again the gdt is no longer mapped.
> > Blocking interrupts is more robust, but we can't block NMIs that way
> > that would also trigger the issue, no?
> >
> > So I am in favor of changing the order in efi_call_phys_epilog() too.
> >
>
> Iâm rather confused here. Weâre loading CR3 with page tables that donât have the fixmap mapped? With interrupts on? And we expect it to work? This is *nuts*.
>
> There are IMO only three sane fixes here:
>
> 1. Load the fixmap, cpu_entry_area, etc into the EFI page table. Drop the GDT reload entirely.
>
> 2. Do this whole virtual map dance earlier so we donât have IRQs and NMIs and such. Maybe while weâre still using the initial page table?
>
> 3. Just identity map all the EFI regions. Make EFI page tables that literally map them at their physical addresses *and* map the entire kernel, just like we do for normal user mms.
>
> Sure, as a stopgap, turning off IRQs and applying Guenterâs patch seems okay, but this code is not okay.

I submitted a patch with the diff I suggested above; it seems to be the
least invasive solution and addresses the immediate problem.

I am way out of league regarding the other suggested changes. I'll be happy
to test the code if someone is willing to rearrange the code accordingly,
but I don't think it would make sense to even try doing it myself.

Thanks,
Guenter