Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range

From: H. Peter Anvin
Date: Thu Oct 15 2015 - 08:19:19 EST


On October 14, 2015 2:39:58 PM PDT, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>On Wed, Oct 14, 2015 at 2:00 PM, Matt Fleming
><matt@xxxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, 14 Oct, at 09:22:03AM, Andy Lutomirski wrote:
>>> On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx> wrote:
>>> > (Pulling in luto for low-level x86 fu)
>>> >
>>> > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote:
>>> >> On 32-bit systems, the initial_page_table is reused by
>>> >> efi_call_phys_prolog as an identity map to call
>>> >> SetVirtualAddressMap. efi_call_phys_prolog takes care of
>>> >> converting the current CPU's GDT to a physical address too.
>>> >>
>>> >> For PAE kernels the identity mapping is achieved by aliasing the
>>> >> first PDPE for the kernel memory mapping into the first PDPE
>>> >> of initial_page_table. This makes the EFI stub's trick "just work".
>>> >>
>>> >> However, for non-PAE kernels there is no guarantee that the identity
>>> >> mapping in the initial_page_table extends as far as the GDT; in this
>>> >> case, accesses to the GDT will cause a page fault (which quickly becomes
>>> >> a triple fault). Fix this by copying the kernel mappings from
>>> >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
>>> >> identity mapping.
>>> >
>>> > Oops, good catch guys. This is clearly a bug, but...
>>> >
>>> >> For some reason, this is only reproducible with QEMU's dynamic translation
>>> >> mode, and not for example with KVM. However, even under KVM one can clearly
>>> >> see that the page table is bogus:
>>>
>>> I haven't looked at the code, but it wouldn't surprise me if this is
>>> some kind of TLB issue. With the hardware TLB (which is in use on
>>> KVM), it seems quite likely that the GDT is pretty much always in the
>>> TLB and, if nothing flushes global mappings, then it'll probably stick
>>> around.
>>
>> From some quick experiments it appears that you can skate past this
>> issue if you don't receive any interrupts while the bogus GDT pointer
>> is loaded, or if you avoid reloading the segment registers in general.
>> Which is interesting because I assumed that writing to GDTR took
>> immediate effect.
>
>Trivia for your amusement:
>
>AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to
>point to unmapped memory. Any attempt to use them (segment loads,
>interrupts, IRET, etc) will try to access that memory as if the access
>came from CPL 0 and, if the access fails, will generate a valid page
>fault with CR2 pointing into the GDT or LDT.
>
>Xen is nuts^Wclever and actually uses this.
>
>Of course, if your #PF vector references a GDT or LDT descriptor and
>trying to load that descriptor results in a page fault, you get a
>double fault.
>
>I learned this while trying to puzzle out why v1 of my LDT
>synchronization patch caused random faults on Xen.
>
>--Andy

There is no "if"... you can't get to an interrupt vector without going through the GDT or LDT. That being said, the GDT or LDT can be partially mapped.
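
To make the "partially mapped" case concrete, here is a rough sketch in C of how a selector picks a descriptor and why only the page holding that descriptor has to be present. This is not kernel code: struct desc_table, descriptor_reachable() and the only_page_0x100() oracle are hypothetical names made up for illustration, and it assumes 8-byte legacy descriptors and 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define DESC_SIZE  8u   /* assumption: 8-byte legacy segment descriptors */

/* Hypothetical stand-in for what GDTR (or the LDT descriptor) describes. */
struct desc_table {
    uint32_t base;   /* linear base address of the table */
    uint16_t limit;  /* offset of the last valid byte */
};

/* Caller-supplied oracle: is this linear page number mapped? */
typedef bool (*page_present_fn)(uint32_t page_number);

/*
 * Selector layout: bits 15..3 = index, bit 2 = TI (0 = GDT, 1 = LDT),
 * bits 1..0 = RPL. The descriptor sits at byte offset index * 8.
 */
static uint32_t selector_offset(uint16_t sel)
{
    return (uint32_t)(sel >> 3) * DESC_SIZE;
}

/* The whole 8-byte descriptor must lie within the table limit. */
static bool selector_in_limit(const struct desc_table *t, uint16_t sel)
{
    return selector_offset(sel) + DESC_SIZE - 1 <= t->limit;
}

/*
 * A descriptor load touches only the bytes of that one descriptor, so
 * only the page(s) holding those bytes need to be mapped. The rest of
 * the table can be unmapped without affecting this selector.
 */
static bool descriptor_reachable(const struct desc_table *t, uint16_t sel,
                                 page_present_fn present)
{
    assert(t && present);
    if (!selector_in_limit(t, sel))
        return false;

    uint32_t first = t->base + selector_offset(sel);
    uint32_t last = first + DESC_SIZE - 1;

    return present(first >> PAGE_SHIFT) && present(last >> PAGE_SHIFT);
}

/* Example oracle for illustration: only linear page 0x100 is mapped. */
static bool only_page_0x100(uint32_t page_number)
{
    return page_number == 0x100;
}
```

With a table at base 0x100F00 and limit 0x1FF, selector 0x10 lands in the mapped page and is usable, while selector 0x100 is within the limit but lands in the next, unmapped page, so loading it would fault.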