Re: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM

From: Stephen Smalley
Date: Tue Jun 04 2019 - 11:38:10 EST


On 6/3/19 2:30 PM, Xing, Cedric wrote:
From: Christopherson, Sean J
Sent: Monday, June 03, 2019 10:16 AM

On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
Hi Sean,

Generally I agree with your direction but think ALLOW_* flags are
completely internal to LSM because they can be both produced and
consumed inside an LSM module. So spilling them into SGX driver and
also user mode code makes the solution ugly and in some cases
impractical because not every enclave host process has a priori
knowledge on whether or not an enclave page would be EMODPE'd at
runtime.

In this case, the host process should tag *all* pages it *might* convert
to executable as ALLOW_EXEC. LSMs can (and should/will) be written in
such a way that denying ALLOW_EXEC is fatal to the enclave if and only
if the enclave actually attempts mprotect(PROT_EXEC).

What if those pages contain self-modifying code but the host doesn't know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Then would it prevent those pages to start with PROT_EXEC?

Anyway, my point is that it is unnecessary even if it works.


Take the SELinux path for example. The only scenario in which
PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
PROT_EXEC.
If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
then PROT_EXEC will be cleared from @allowed_prot.

As Stephen pointed out, auditing the denials on @allowed_prot means the
log will contain false positives of a sort. But this is more of a noise
issue than true false positives. E.g. there are three possible outcomes
for the enclave.

- The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
Requesting ALLOW_EXEC is either a straightforward a userspace bug or
a poorly written generic enclave loader.

- The enclave conditionally performs EMODPE[PROT_EXEC]. In this case
the denial is a true false positive.

- The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
The audit log will be noisy, but viewed as a whole the denials
aren't
false positives.

What I was talking about was EMODPE[PROT_WRITE] on an RX page.


The potential for noisy audit logs and/or false positives is unfortunate,
but it's (by far) the lesser of many evils.

Theoretically speaking, what you really need is a per page flag (let's
name it WRITTEN?) indicating whether a page has ever been written to
(or more precisely, granted PROT_WRITE), which will be used to decide
whether to grant PROT_EXEC when requested in future. Given the fact
that all mprotect() goes through LSM and mmap() is limited to
PROT_NONE, it's easy for LSM to capture that flag by itself instead of
asking user mode code to provide it.

That said, here is the summary of what I think is a better approach.
* In hook security_file_alloc(), if @file is an enclave, allocate some
data
structure to store for every page, the WRITTEN flag as described
above.
WRITTEN is cleared initially for all pages.

This would effectively require *every* LSM to duplicate the SGX driver's
functionality, e.g. track per-page metadata, implement locking to
prevent races between multiple mm structs, etc...

Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* are no difference than PROCESS__* or FILE__* flags, which are just artifacts to assist particular LSMs in decision making. They are never considered part of the LSM interface, even if other LSMs than SELinux may adopt the same/similar approach.

If code duplication is what you are worrying about, you can put them in a library, or implement/export them in some new file (maybe security/enclave.c?) as utility functions. But spilling them into user mode is what I think is unacceptable.


Open: Given a file of type struct file *, how to tell if it is an
enclave (i.e. /dev/sgx/enclave)?
* In hook security_mmap_file(), if @file is an enclave, make sure
@prot can
only be PROT_NONE. This is to force all protection changes to go
through
security_file_mprotect().
* In the newly introduced hook security_enclave_load(), set WRITTEN
for pages
that are requested PROT_WRITE.

How would an LSM associate a page with a specific enclave? vma->vm_file
will point always point at /dev/sgx/enclave. vma->vm_mm is useless
because we're allowing multiple processes to map a single enclave, not
to mention that by mm would require holding a reference to the mm.

Each open("/dev/sgx/enclave") syscall creates a *new* instance of struct file to uniquely identify one enclave instance. What I mean is @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode.


* In hook security_file_mprotect(), if @vma->vm_file is an enclave,
look up
and use WRITTEN flags for all pages within @vma, along with other
global
flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux)
to decide
on allowing/rejecting @prot.

vma->vm_file will always be /dev/sgx/enclave at this point, which means
LSMs don't have the necessary anchor back to the source file, e.g. to
enforce FILE__EXECUTE. The noexec file system case is also unaddressed.

vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is checked by security_enclave_load() using @source_vma->vm_file. Once a page has been EADD'ed, whether to allow RW->RX depends on .sigstruct file (more precisely, the file backing SIGSTRUCT), whose FILE__* attributes could be cached in vma->vm_file->f_security by security_enclave_init().

The RFC series seemed to dispense with the use of the sigstruct file and just used the source file throughout IIUC. That allowed for reuse of FILE__* permissions without ambiguity rather than introducing separate ENCLAVE__* permissions or using /dev/sgx/enclave inode as the target of all checks.

Regardless, IIUC, your approach requires that we always check FILE__EXECMOD, and FILE__EXECUTE up front during security_enclave_load() irrespective of prot so that we can save the result in the f_security for later use by the mprotect hook. This may generate many spurious audit messages for cases where PROT_EXEC will never be requested, and users will be prone to just always allowing it since they cannot tell when it was actually needed.

The noexec case should be addressed in IOC_ADD_PAGES by testing @source_vma->vm_flags & VM_MAYEXEC.


* In hook security_file_free(), if @file is an enclave, free storage
allocated for WRITTEN flags.