Re: [PATCH V2 08/32] x86/sgx: x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes

From: Reinette Chatre
Date: Fri Mar 04 2022 - 15:33:59 EST


Hi Jarkko,

On 3/4/2022 12:55 AM, Jarkko Sakkinen wrote:
> On Mon, Feb 07, 2022 at 04:45:30PM -0800, Reinette Chatre wrote:
>> Enclave creators declare their enclave page permissions (EPCM
>> permissions) at the time the pages are added to the enclave. These
>> page permissions are the vetted permissible accesses of the enclave
>> pages and stashed off (in struct sgx_encl_page->vm_max_prot_bits)
>> for later comparison with enclave PTEs and VMAs.
>>
>> Current permission support assume that EPCM permissions remain static
>> for the lifetime of the enclave. This is about to change with the
>> addition of support for SGX2 where the EPCM permissions of enclave
>> pages belonging to an initialized enclave may change during the
>> enclave's lifetime.
>>
>> Support for changing of EPCM permissions should continue to respect
>> the vetted maximum protection bits maintained in
>> sgx_encl_page->vm_max_prot_bits. Towards this end, add
>> sgx_encl_page->vm_run_prot_bits in preparation for support of
>> enclave page permission changes. sgx_encl_page->vm_run_prot_bits
>> reflect the active EPCM permissions of an enclave page and are not to
>> exceed sgx_encl_page->vm_max_prot_bits.
>>
>> Two permission fields are used: sgx_encl_page->vm_run_prot_bits
>> reflects the current EPCM permissions and is used to manage the page
>> table entries while sgx_encl_page->vm_max_prot_bits contains the vetted
>> maximum protection bits and is used to guide which EPCM permissions
>> are allowed in the upcoming SGX2 permission changing support (it guides
>> what values sgx_encl_page->vm_run_prot_bits may have).
>>
>> Consider this example how sgx_encl_page->vm_max_prot_bits and
>> sgx_encl_page->vm_run_prot_bits are used:
>>
>> (1) Add an enclave page with secinfo of RW to an uninitialized enclave:
>> sgx_encl_page->vm_max_prot_bits = RW
>> sgx_encl_page->vm_run_prot_bits = RW
>>
>> At this point RW VMAs would be allowed to access this page and PTEs
>> would allow write access as guided by
>> sgx_encl_page->vm_run_prot_bits.
>>
>> (2) User space invokes SGX2 to change the EPCM permissions to read-only.
>> This is allowed because sgx_encl_page->vm_max_prot_bits = RW:
>> sgx_encl_page->vm_max_prot_bits = RW
>> sgx_encl_page->vm_run_prot_bits = R
>>
>> At this point only new read-only VMAs would be allowed to access
>> this page and PTEs would not allow write access as guided
>> by sgx_encl_page->vm_run_prot_bits.
>>
>> (3) User space invokes SGX2 to change the EPCM permissions to RX.
>> This will not be supported by the kernel because
>> sgx_encl_page->vm_max_prot_bits = RW:
>> sgx_encl_page->vm_max_prot_bits = RW
>> sgx_encl_page->vm_run_prot_bits = R
>>
>> (3) User space invokes SGX2 to change the EPCM permissions to RW.
>> This will be allowed because sgx_encl_page->vm_max_prot_bits = RW:
>> sgx_encl_page->vm_max_prot_bits = RW
>> sgx_encl_page->vm_run_prot_bits = RW
>>
>> At this point RW VMAs would again be allowed to access this page
>> and PTEs would allow write access as guided by
>> sgx_encl_page->vm_run_prot_bits.
>>
>> struct sgx_encl_page hosting this information is maintained for each
>> enclave page so the space consumed by the struct is important.
>> The existing sgx_encl_page->vm_max_prot_bits is already unsigned long
>> while only using three bits. Transition to a bitfield for the two
>> members containing protection bits.
>>
>> Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
>> ---
>> Changes since V1:
>> - Add snippet to Documentation/x86/sgx.rst that details the difference
>> between vm_max_prot_bits and vm_run_prot_bits (Andy and Jarkko).
>> - Change subject line (Jarkko).
>> - Refer to actual variables instead of using English rephrasing -
>> sgx_encl_page->vm_run_prot_bits instead of "runtime
>> protection bits" (Jarkko).
>> - Add information in commit message on why two fields are needed
>> (Jarkko).
>>
>> Documentation/x86/sgx.rst | 10 ++++++++++
>> arch/x86/kernel/cpu/sgx/encl.c | 6 +++---
>> arch/x86/kernel/cpu/sgx/encl.h | 3 ++-
>> arch/x86/kernel/cpu/sgx/ioctl.c | 6 ++++++
>> 4 files changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
>> index 5659932728a5..9df620b59f83 100644
>> --- a/Documentation/x86/sgx.rst
>> +++ b/Documentation/x86/sgx.rst
>> @@ -99,6 +99,16 @@ The relationships between the different permission masks are:
>> * PTEs are installed to match the EPCM permissions, but not be more
>> relaxed than the VMA permissions.
>>
>> +During runtime the EPCM permissions of enclave pages belonging to an
>> +initialized enclave can change on systems supporting SGX2. In support
>> +of these runtime changes the kernel maintains (for each enclave page)
>> +the most permissive EPCM permission mask allowed by policy as
>> +the ``vm_max_prot_bits`` of that page. EPCM permissions are not allowed
>> +to be relaxed beyond ``vm_max_prot_bits``. The kernel also maintains
>> +the currently active EPCM permissions of an enclave page as its
>> +``vm_run_prot_bits`` to ensure PTEs and new VMAs respect the active
>> +EPCM permission values.
>> +
>> On systems supporting SGX2 EPCM permissions may change while the
>> enclave page belongs to a VMA without impacting the VMA permissions.
>> This means that a running VMA may appear to allow access to an enclave
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
>> index 1ba01c75a579..a980d8458949 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.c
>> +++ b/arch/x86/kernel/cpu/sgx/encl.c
>> @@ -164,7 +164,7 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>> * exceed the VMA permissions.
>> */
>> vm_prot_bits = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
>> - page_prot_bits = entry->vm_max_prot_bits & vm_prot_bits;
>> + page_prot_bits = entry->vm_run_prot_bits & vm_prot_bits;
>> /*
>> * Add VM_SHARED so that PTE is made writable right away if VMA
>> * and EPCM are writable (no COW in SGX).
>> @@ -217,7 +217,7 @@ static vm_fault_t sgx_vma_pfn_mkwrite(struct vm_fault *vmf)
>> goto out;
>> }
>>
>> - if (!(entry->vm_max_prot_bits & VM_WRITE))
>> + if (!(entry->vm_run_prot_bits & VM_WRITE))
>> ret = VM_FAULT_SIGBUS;
>>
>> out:
>> @@ -280,7 +280,7 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>> mutex_lock(&encl->lock);
>> xas_lock(&xas);
>> xas_for_each(&xas, page, PFN_DOWN(end - 1)) {
>> - if (~page->vm_max_prot_bits & vm_prot_bits) {
>> + if (~page->vm_run_prot_bits & vm_prot_bits) {
>> ret = -EACCES;
>> break;
>> }
>> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
>> index fec43ca65065..dc262d843411 100644
>> --- a/arch/x86/kernel/cpu/sgx/encl.h
>> +++ b/arch/x86/kernel/cpu/sgx/encl.h
>> @@ -27,7 +27,8 @@
>>
>> struct sgx_encl_page {
>> unsigned long desc;
>> - unsigned long vm_max_prot_bits;
>> + unsigned long vm_max_prot_bits:8;
>> + unsigned long vm_run_prot_bits:8;
>> struct sgx_epc_page *epc_page;
>> struct sgx_encl *encl;
>> struct sgx_va_page *va_page;
>> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
>> index 83df20e3e633..7e0819a89532 100644
>> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
>> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
>> @@ -197,6 +197,12 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
>> /* Calculate maximum of the VM flags for the page. */
>> encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
>>
>> + /*
>> + * At time of allocation, the runtime protection bits are the same
>> + * as the maximum protection bits.
>> + */
>> + encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
>> +
>> return encl_page;
>> }
>>
>> --
>> 2.25.1
>>
>
> This patch I can NAK without 2nd thought. It adds to the round-trips of
> using ENCLU[EMODPE].
>
> A better idea is the one I explain in
>
> https://lore.kernel.org/linux-sgx/20220304033918.361495-1-jarkko@xxxxxxxxxx/T/#u
>
> The only thing this patch is doing is adding artifical complexity when
> we already have somewhat complex microarchitecture for permissions.

The motivation of this change is to ensure that the kernel does not allow
access to a page that the page does not allow. For example, this change
ensures that the kernel will not allow execution on an enclave page that
is not executable.

In your change this is removed and the kernel will, for example, allow
execution of enclave pages that are not executable.

It could be seen as complex, but it is done out of respect for security.

>
> Please just drop this patch from the next version and also ioctl for relax.
>

I responded with more detail to your proposal in:
https://lore.kernel.org/linux-sgx/684930a2-a247-7d5e-90e8-6e80db618c4c@xxxxxxxxx/

Reinette