Re: [PATCH v38 10/24] mm: Add vm_ops->mprotect()

From: Andy Lutomirski
Date: Fri Sep 18 2020 - 20:15:36 EST



> On Sep 18, 2020, at 4:53 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Fri, Sep 18, 2020 at 08:09:04AM -0700, Andy Lutomirski wrote:
>>> On Tue, Sep 15, 2020 at 4:28 AM Jarkko Sakkinen
>>> <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
>>>
>>> From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>>>
>>> Add vm_ops()->mprotect() for additional constraints for a VMA.
>>>
>>> Intel Software Guard eXtensions (SGX) will use this callback to add two
>>> constraints:
>>>
>>> 1. Verify that the address range does not have holes: each page address
>>> must be filled with an enclave page.
>>> 2. Verify that VMA permissions won't surpass the permissions of any enclave
>>> page within the address range. Enclaves have cryptographically sealed
>>> permissions for each page address that set the upper limit for possible
>>> VMA permissions. Not respecting this can cause #GP's to be emitted.
>
> Side note, #GP is wrong. EPCM violations are #PFs. Skylake CPUs #GP, but
> that's technically an erratum. But this isn't the real motivation, e.g.
> userspace can already trigger #GP/#PF by reading/writing a bad address, SGX
> simply adds another flavor.
>
>> It's been a while since I looked at this. Can you remind us: is this
>> just preventing userspace from shooting itself in the foot or is this
>> something more important?
>
> Something more important, it's used to prevent userspace from circumventing
> a noexec filesystem by loading code into an enclave, and to give the kernel the
> option of adding enclave specific LSM policies in the future.
>
> The source file (if one exists) for the enclave is long gone when the enclave
> is actually mmap()'d and mprotect()'d. To enforce noexec, the requested
> permissions for a given page are snapshotted when the page is added to the
> enclave, i.e. when the enclave is built. Enclave pages that will be executable
> must originate from a MAYEXEC VMA, e.g. the source page can't come from a
> noexec file system.
>
> The ->mprotect() hook allows SGX to reject mprotect() if userspace is declaring
> permissions beyond what are allowed, e.g. trying to map an enclave page with
> EXEC permissions when the page was added to the enclave without EXEC.
>
> Future LSM policies have a similar need due to vm_file always pointing at
> /dev/sgx/enclave, e.g. policies couldn't be attached to a specific enclave.
> ->mprotect() again allows enforcing permissions at map time that were checked
> at enclave build time, e.g. via an LSM hook.
>
> Deferring ->mprotect() until LSM support is added (if it ever is) would be
> problematic due to SGX2. With SGX2, userspace can extend permissions of an
> enclave page (for the CPU's EPC Map entry, not the kernel's page tables)
> without bouncing through the kernel. Without ->mprotect() enforcement,
> userspace could do EADD(RW) -> mprotect(RWX) -> EMODPE(X) to gain W+X. We
> want to disallow such a flow now, i.e. force userspace to do EADD(RW,X), so
> that the hypothetical LSM hook would have all information at EADD(), i.e.
> would be aware of the EXEC permission, without creating divergent behavior
> based on whether or not an LSM is active.

That’s what I thought. Can we get this in the changelog?
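
For reference, the two checks described in the quoted text (no holes in the
range, and requested permissions never exceeding what was snapshotted when the
page was added) can be modelled roughly as below. This is a userspace sketch
only, not the patch's code: the struct layout, the PERM_* bits, ENCL_PAGES and
the function names encl_add_page() and encl_may_mprotect() are all hypothetical
and chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permission bits for this model (not the kernel's VM_* flags). */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

#define ENCL_PAGES 8 /* arbitrary size for the model */

struct encl_page {
	bool present;       /* false means the address is a hole */
	unsigned max_perms; /* permissions snapshotted at add time */
};

struct enclave {
	struct encl_page pages[ENCL_PAGES];
};

/*
 * Build time: snapshot the requested permissions for a page. A page that
 * asks for EXEC must come from a source mapping that may be executable,
 * which is how a noexec filesystem is honoured in this model.
 */
int encl_add_page(struct enclave *e, size_t idx, unsigned perms,
		  bool src_may_exec)
{
	if (idx >= ENCL_PAGES || e->pages[idx].present)
		return -1;
	if ((perms & PERM_X) && !src_may_exec)
		return -1;

	e->pages[idx].present = true;
	e->pages[idx].max_perms = perms;
	return 0;
}

/*
 * Map time: model of what a ->mprotect() hook could check for the page
 * range [start, end). Rejects holes and any permission bit that was not
 * requested when the page was added.
 */
int encl_may_mprotect(const struct enclave *e, size_t start, size_t end,
		      unsigned prot)
{
	if (start >= end || end > ENCL_PAGES)
		return -1;

	for (size_t i = start; i < end; i++) {
		const struct encl_page *p = &e->pages[i];

		if (!p->present)
			return -1; /* hole in the range */
		if (prot & ~p->max_perms)
			return -1; /* exceeds build-time permissions */
	}
	return 0;
}
```

In this model, a page added as RW can never be mapped RWX later, so the
EADD(RW) -> mprotect(RWX) -> EMODPE(X) sequence from the quoted text fails at
the mprotect() step; userspace would have to request X when the page is added.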