Re: [GIT PULL] x86/mm changes for v4.4

From: Kees Cook
Date: Mon Nov 09 2015 - 16:08:09 EST


On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel
<ard.biesheuvel@xxxxxxxxxx> wrote:
> On 8 November 2015 at 07:58, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
>> <ard.biesheuvel@xxxxxxxxxx> wrote:
>>> On 7 November 2015 at 08:09, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>>>
>>>> * Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>>>>> >
>>>>> > 3) We should fix the EFI permission problem without relying on the firmware: it
>>>>> > appears we could just mark everything R-X optimistically, and if a write fault
>>>>> > happens (it's pretty rare in fact, only triggers when we write to an EFI
>>>>> > variable and so), we can mark the faulting page RW- on the fly, because it
>>>>> > appears that writable EFI sections, while not enumerated very well in 'old'
>>>>> > firmware, are still supposed to be page granular. (Even 'new' firmware I
>>>>> > wouldn't automatically trust to get the enumeration right...)
>>>>>
>>>>> Sorry, this isn't true. I misled you with one of my earlier posts on
>>>>> this topic. Let me try and clear things up...
>>>>>
>>>>> Writing to EFI regions has to do with every invocation of the EFI
>>>>> runtime services - it's not limited to when you read/write/delete EFI
>>>>> variables. In fact, EFI variables really have nothing to do with this
>>>>> discussion, they're a completely opaque concept to the OS, we have no
>>>>> idea how the firmware implements them. Everything is done via the EFI
>>>>> boot/runtime services.
>>>>>
>>>>> The firmware itself will attempt to write to EFI regions when we
>>>>> invoke the EFI services because that's where the PE/COFF ".data" and
>>>>> ".bss" sections live along with the heap. There's even some relocation
>>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to
>>>>> ".text" too.
>>>>>
>>>>> Now, the above PE/COFF sections are usually (always?) contained within
>>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true
>>>>> because the firmware folks have told us so, and because stopping that
>>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
>>>>> V2.5.
>>>>>
>>>>> The data sections within the region are also *not* guaranteed to be
>>>>> page granular because work was required in Tianocore for emitting
>>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
>>>>> support.
>>>>>
>>>>> Ultimately, what this means is that if you were to attempt to
>>>>> dynamically fixup those regions that required write permission, you'd
>>>>> have to modify the mappings for the majority of the EFI regions
>>>>> anyway. And if you're blindly allowing write permission as a fixup,
>>>>> there's not much security to be had.
>>>>
>>>> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
>>>> to RW-, i.e. it would add 'write' permission but remove 'execute' permission.
>>>>
>>>> Note that there would be no 'RWX' permission at any given moment - which is the
>>>> dangerous combination.
>>>>
>>>
>>> The problem with that is that /any/ page in the UEFI runtime region
>>> may intersect with both .text and .data of any of the PE/COFF images
>>> that make up the runtime firmware (since the PE/COFF sections are not
>>> necessarily page aligned). Such pages require RWX permissions. The
>>> UEFI memory map does not provide the information to identify those
>>> pages a priori (the entire region containing several PE/COFF images
>>> could be covered by a single entry) so it is hard to guess which pages
>>> should be allowed these RWX permissions.
>>
>> I'm sad that UEFI was designed without even the most basic of memory
>> protections in mind. UEFI _itself_ should be setting up protective
>> page mappings. :(
>>
>
> Well, the 4 KB alignment of sections was considered prohibitive at the
> time from code size pov. But this was a long time ago, obviously.

Heh, yeah, I'd expect max 4K padding to get code/data correctly
aligned on a 2MB binary to not be an issue. :)

>
>> For a boot firmware, it seems to me that safe page table layout would
>> be a top priority bug. The "reporting issues" page for TianoCore
>> doesn't actually seem to link to the "Project Tracker":
>> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues
>>
>> Does anyone know how to get this correctly reported so future UEFI
>> releases don't suffer from this?
>>
>
> Ugh. Don't get me started on that topic. I have been working with the
> UEFI forum since July to get a fundamentally broken implementation of
> memory protections fixed. UEFI v2.5 defines a memory protection scheme
> that is based on splitting PE/COFF images into separate memory regions
> so that R-X and RW- permissions can be applied. Unfortunately, that
> broke every OS in existence (including Windows 8), since the OS is
> allowed to reorder memory regions when it lays out the virtual
> remapping of the UEFI regions, resulting in PE/COFF .data and .text
> potentially appearing out of order.
>
> The good news is that we fixed it for the upcoming release (v2.6). I
> can't disclose any specifics, though :-(

As long as there's motion to getting it fixed, that makes me happy! :)
Does 2.6 get rid of the (AIUI) 2MB limit too?

-Kees

>
> --
> Ard.
>
>
>>>>> > If that 'supposed to be' turns out to be 'not true' (not unheard of in
>>>>> > firmware land), then plan B would be to mark pages that generate write faults
>>>>> > RWX as well, to not break functionality. (This 'mark it RWX' is not something
>>>>> > that exploits would have easy access to, and we could also generate a warning
>>>>> > [after the EFI call has finished] if it ever triggers.)
>>>>> >
>>>>> > Admittedly this approach might not be without its own complications, but it
>>>>> > looks reasonably simple (I don't think we need per EFI call page tables,
>>>>> > etc.), and does not assume much about the firmware being able to enumerate its
>>>>> > permissions properly. Were we to merge EFI support today I'd have insisted on
>>>>> > trying such an approach from day 1 on.
>>>>>
>>>>> We already have separate EFI page tables, though with the caveat that
>>>>> we share some of swapper_pg_dir's PGD entries. The best solution would
>>>>> be to stop sharing entries and isolate the EFI mappings from every
>>>>> other page table structure, so that they're only used during the EFI
>>>>> service calls.
>>>>
>>>> Absolutely. Can you try to fix this for v4.3?
>>>>
>>>> Thanks,
>>>>
>>>> Ingo
>>
>>
>>
>> --
>> Kees Cook
>> Chrome OS Security



--
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/