Re: [tip:x86/mm] x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up
From: Andy Lutomirski
Date: Mon May 08 2017 - 07:22:00 EST
On Mon, May 8, 2017 at 2:32 AM, Andy Shevchenko
<andy.shevchenko@xxxxxxxxx> wrote:
> On Mon, May 8, 2017 at 9:31 AM, Jan Kiszka <jan.kiszka@xxxxxxxxxxx> wrote:
>> On 2017-03-23 10:14, tip-bot for Andy Lutomirski wrote:
>>> The x86 smpboot trampoline expects initial_page_table to have the
>>> GDT mapped. If the GDT ends up in a virtually mapped per-cpu page,
>>> then it won't be in the page tables at all until per-cpu areas are
>>> set up. The result will be a triple fault the first time that the
>>> CPU attempts to access the GDT after LGDT loads the per-cpu GDT.
>>>
>>> This appears to be an old bug, but somehow the GDT fixmap rework
>>> is triggering it. This seems to have something to do with the
>>> memory layout.
>
>> This breaks the boot on our Intel Quark platform (IOT2000, similar to
>> Galileo Gen2). Reverting it over master makes it work again. Any idea
>> what goes wrong? Let me know how I can help debugging this.
>
> JFYI: As of today, linux-next works fine for me when _kexec:ed_.
>
> Perhaps I can test this later with direct boot from SD card.
>
The most likely explanation is that there's some code that needs the
page table synced and runs before setup_per_cpu_areas(). The relevant
init code is:

	setup_arch(&command_line);
	mm_init_cpumask(&init_mm);
	setup_command_line(command_line);
	setup_nr_cpu_ids();
	setup_per_cpu_areas();

so I didn't move it very far. It would be awesome if we could get a
backtrace when the failure happens, but it's likely to be a triple
fault. Is this an EFI boot? I bet the failure is in efi_init().

Could you try reverting just the deletions in the patch? I.e. try a
kernel with both the old and the new copies of the code I moved.
--Andy