Re: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

From: Stephen Smalley
Date: Thu Jun 13 2019 - 14:05:21 EST

On 6/11/19 6:55 PM, Xing, Cedric wrote:
From: linux-sgx-owner@xxxxxxxxxxxxxxx [mailto:linux-sgx-owner@xxxxxxxxxxxxxxx] On Behalf Of Stephen Smalley
Sent: Tuesday, June 11, 2019 6:40 AM

+ rc = sgxsec_mprotect(vma, prot);
+ if (rc <= 0)
+ return rc;

Why are you skipping the file_map_prot_check() call when rc == 0?
What would SELinux check if you didn't do so -
FILE__READ|FILE__WRITE|FILE__EXECUTE to /dev/sgx/enclave? Is it a
problem to let SELinux proceed with that check?

We can continue the check. But in practice, all of FILE__{READ|WRITE|EXECUTE} are needed for every enclave, so what's the point of checking them? FILE__EXECMOD may be the only flag that has any meaning, but it's somewhat redundant because the sigstruct file was already checked against it.

I don't believe FILE__EXECMOD will be checked since it is a shared file mapping. We'll check at least FILE__READ and FILE__WRITE anyway upon open(), and possibly FILE__EXECUTE upon mmap() unless that is never PROT_EXEC. We want the policy to accurately reflect the operations of the system, even when an operation "must" be allowed, and even here this only needs to be allowed to processes authorized as enclave loaders, not to all processes.

I don't think there are other examples where we skip a SELinux check like this. If we were to do so here, we would at least need a comment explaining that it was intentional and why. The risk would be that future checking added into file_map_prot_check() would be unwittingly bypassed for these mappings. A warning there would also be advisable if we skip it for these mappings.
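
To make the concern concrete, here is a minimal userspace sketch of the three-way return convention with the generic check left in place. Every name in it (mock_vma, mock_sgxsec_mprotect, mock_file_map_prot_check, mock_file_mprotect) is a hypothetical stand-in, not the patch's or the kernel's code. It only illustrates the control flow where a hard failure short-circuits and rc == 0 still falls through to the generic check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not kernel code. */
struct mock_vma {
	bool is_enclave;     /* is this a mapping of the enclave file? */
	bool enclave_allows; /* outcome of the enclave-specific check */
	bool file_allows;    /* outcome of the generic file check */
};

static int generic_checks_run; /* counts calls to the generic check */

/* <0: denied, 0: enclave mapping and allowed, >0: not an enclave mapping */
static int mock_sgxsec_mprotect(const struct mock_vma *vma)
{
	if (!vma->is_enclave)
		return 1;
	return vma->enclave_allows ? 0 : -EACCES;
}

/* Stand-in for the generic check that should not be bypassed. */
static int mock_file_map_prot_check(const struct mock_vma *vma)
{
	generic_checks_run++;
	return vma->file_allows ? 0 : -EACCES;
}

/* Only rc < 0 returns early; both rc == 0 and rc > 0 fall through. */
static int mock_file_mprotect(const struct mock_vma *vma)
{
	int rc = mock_sgxsec_mprotect(vma);

	if (rc < 0)
		return rc;
	return mock_file_map_prot_check(vma);
}
```

With this shape, any checking added to the generic path later applies to enclave mappings as well, at the cost of requiring the corresponding file permissions in policy for enclave loaders.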

+static int selinux_enclave_load(struct file *encl, unsigned long addr,
+ unsigned long size, unsigned long prot,
+ struct vm_area_struct *source)
+ if (source) {
+ /**
+ * Adding page from source => EADD request
+ */
+ int rc = selinux_file_mprotect(source, prot, prot);
+ if (rc)
+ return rc;
+ if (!(prot & VM_EXEC) &&
+ selinux_file_mprotect(source, VM_EXEC, VM_EXEC))

I wouldn't conflate VM_EXEC with PROT_EXEC even if they happen to be
defined with the same values currently. Elsewhere the kernel appears to
explicitly translate them ala calc_vm_prot_bits().

Thanks! I'll change them to PROT_EXEC in the next version.
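
For illustration, an explicit flag-by-flag translation could look like the userspace sketch below. This is not the kernel's calc_vm_prot_bits() implementation; sketch_prot_to_vm() and the SKETCH_VM_* constants are hypothetical names, with the constants defined locally to mirror the kernel's VM_READ/VM_WRITE/VM_EXEC bits.

```c
#include <sys/mman.h>

/* Local copies of the vm_flags bits, assumed here for a userspace sketch. */
#define SKETCH_VM_READ  0x1UL
#define SKETCH_VM_WRITE 0x2UL
#define SKETCH_VM_EXEC  0x4UL

/*
 * Translate PROT_* bits to VM_* bits one flag at a time, so the code
 * stays correct even if the two namespaces ever stop sharing values.
 */
static unsigned long sketch_prot_to_vm(unsigned long prot)
{
	unsigned long vm = 0;

	if (prot & PROT_READ)
		vm |= SKETCH_VM_READ;
	if (prot & PROT_WRITE)
		vm |= SKETCH_VM_WRITE;
	if (prot & PROT_EXEC)
		vm |= SKETCH_VM_EXEC;
	return vm;
}
```

The point is only that a hook taking prot values should test PROT_* bits, and a hook taking vm_flags should test VM_* bits, with a translation step wherever the two meet.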

Also, this will mean that we will always perform an execute check on all
sources, thereby triggering audit denial messages for any EADD sources
that are only intended to be data. Depending on the source, this could
trigger more than one kind of denial. In a world
where users often just run any denials they see through audit2allow,
they'll end up always allowing them all. How can they tell whether it
was needed? It would be preferable if we could only trigger execute
checks when there is some probability that execute will be requested in
the future. Alternatives would be to silence the audit of these
permission checks always via use of _noaudit() interfaces or to silence
audit of these permissions via dontaudit rules in policy, but the latter
would hide all denials of the permission by the process, not just those
triggered from security_enclave_load(). And if we silence them, then we
won't see them even if they were needed.

*_noaudit() is exactly what I wanted. But I couldn't find selinux_file_mprotect_noaudit()/file_has_perm_noaudit(), and I'm reluctant to duplicate code. Any suggestions?

I would have no objection to adding _noaudit() variants of these, either duplicating code (if sufficiently small/simple) or creating a common helper with a bool audit flag that gets used for both. But the larger issue would be to resolve how to ultimately ensure that a denial is audited later if the denied permission is actually requested and blocked via sgxsec_mprotect().
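
As a rough sketch of the "common helper with a bool audit flag" option: the code below is a userspace model, and every name in it (sketch_has_perm_common, sketch_has_perm, sketch_has_perm_noaudit, sketch_audit_count) is hypothetical rather than an existing SELinux interface. Permissions are modeled as plain bitmasks and auditing as a counter.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static int sketch_audit_count; /* stands in for emitting an audit record */

/* One implementation; the flag decides whether a denial is audited. */
static int sketch_has_perm_common(uint32_t allowed, uint32_t requested,
				  bool audit)
{
	uint32_t denied = requested & ~allowed;

	if (!denied)
		return 0;
	if (audit)
		sketch_audit_count++;
	return -EACCES;
}

/* Audited variant: use when the permission is actually being requested. */
static int sketch_has_perm(uint32_t allowed, uint32_t requested)
{
	return sketch_has_perm_common(allowed, requested, true);
}

/* Silent variant: use when only probing what might be allowed later. */
static int sketch_has_perm_noaudit(uint32_t allowed, uint32_t requested)
{
	return sketch_has_perm_common(allowed, requested, false);
}
```

Under this model, the load-time probe would call the silent variant, and the audited variant would have to be called again at mprotect() time if the cached answer turns out to block a real request. That second call is the part that still needs the original inputs.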

+ prot = 0;
+ else {
+ prot = SGX__EXECUTE;
+ if (source->vm_file &&
+ !file_has_perm(current_cred(), source->vm_file,
+ FILE__EXECMOD))
+ prot |= SGX__EXECMOD;

Similarly, this means that we will always perform a FILE__EXECMOD check
on all executable sources, triggering audit denial messages for any EADD
source that is executable but to which EXECMOD is not allowed, and again
the most common pattern will be that users will add EXECMOD to all
executable sources to avoid this.

+ }
+ return sgxsec_eadd(encl, addr, size, prot);
+ } else {
+ /**
+ * Adding page from NULL => EAUG request
+ */
+ return sgxsec_eaug(encl, addr, size, prot);
+ }
+static int selinux_enclave_init(struct file *encl,
+ const struct sgx_sigstruct *sigstruct,
+ struct vm_area_struct *vma)
+ int rc = 0;
+ if (!vma)
+ rc = -EINVAL;

Is it ever valid to call this hook with a NULL vma? If not, this should
be handled/prevented by the caller. If so, I'd just return -EINVAL
immediately here.

vma shall never be NULL. I'll update it in the next version.

+ if (!rc && !(vma->vm_flags & VM_EXEC))
+ rc = selinux_file_mprotect(vma, VM_EXEC, VM_EXEC);

I had thought we were trying to avoid overloading FILE__EXECUTE (or
whatever gets checked here, e.g. could be PROCESS__EXECMEM or
FILE__EXECMOD) on the sigstruct file, since the caller isn't truly
executing code from it.

Agreed. Another problem with FILE__EXECMOD on the sigstruct file is that user code would then be allowed to modify SIGSTRUCT at will, which effectively wipes out the protection provided by FILE__EXECUTE.

I'd define new ENCLAVE__* permissions, including an up-front
ENCLAVE__INIT permission that governs whether the sigstruct file can be
used at all irrespective of memory protections.


Then you can also have ENCLAVE__EXECUTE, ENCLAVE__EXECMEM,
ENCLAVE__EXECMOD for the execute-related checks. Or you can use the
/dev/sgx/enclave inode as the target for the execute checks and just
reuse the file permissions there.

Now we've got 2 options - 1) New ENCLAVE__* flags on sigstruct file or 2) FILE__* on /dev/sgx/enclave. Which one do you think makes more sense?

ENCLAVE__EXECMEM seems to offer finer granularity (than PROCESS__EXECMEM) but I wonder if it'd have any real use in practice.

Defining a separate ENCLAVE__EXECUTE and using it here for the sigstruct file would avoid any ambiguity with the FILE__EXECUTE check to the /dev/sgx/enclave inode that might occur upon mmap() or mprotect(). A separate ENCLAVE__EXECMEM would enable allowing WX within the enclave while denying it in the host application or vice versa, which could be a good thing for security, particularly if SGX2 largely ends up always wanting WX.
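
A toy model of that separation, with hypothetical bit values and a hypothetical sketch_wx_check() helper (neither is a real SELinux definition), might be:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical permission bits for the sketch; not real SELinux values. */
#define SKETCH_PROCESS_EXECMEM 0x1u
#define SKETCH_ENCLAVE_EXECMEM 0x2u

/*
 * A WX request is judged against a different permission depending on
 * whether it targets enclave memory or ordinary host memory.
 */
static int sketch_wx_check(unsigned int granted, bool in_enclave)
{
	unsigned int needed = in_enclave ? SKETCH_ENCLAVE_EXECMEM
					 : SKETCH_PROCESS_EXECMEM;

	return (granted & needed) ? 0 : -EACCES;
}
```

A domain granted only the enclave bit could then use WX pages inside the enclave while still being denied WX in the host application, and the reverse holds for a domain granted only the process bit.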

+int sgxsec_mprotect(struct vm_area_struct *vma, size_t prot) {
+ struct enclave_sec *esec;
+ int rc;
+ if (!vma->vm_file || !(esec = __esec(selinux_file(vma->vm_file)))) {
+ /* Positive return value indicates non-enclave VMA */
+ return 1;
+ }
+ down_read(&esec->sem);
+ rc = enclave_mprotect(&esec->regions, vma->vm_start, vma->vm_end,

Why is it safe for this to only use down_read()? enclave_mprotect() can
call enclave_prot_set_cb() which modifies the list?

Probably because it was too late at night when I wrote this line:-( Good catch!

I haven't looked at this code closely, but it feels like a lot of SGX-
specific logic embedded into SELinux that will have to be repeated or
reused for every security module. Does SGX not track this state itself?

I can tell you have looked quite closely, and I truly thank you for your time!

You are right that there is SGX-specific stuff. More precisely, SGX enclaves don't have access to anything except memory, so there are only 3 questions that need to be answered for each enclave page: 1) whether X is allowed; 2) whether W->X is allowed; and 3) whether WX is allowed. This proposal tries to cache the answers to those questions upon creation of each enclave page, meaning it involves a) figuring out the answers and b) remembering them for every page. #b is generic, mostly captured in intel_sgx.c, and could be shared among all LSM modules, while #a is SELinux specific. I could move intel_sgx.c up one level in the directory hierarchy if that's what you'd suggest.
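
The "cache three answers per page" idea can be modeled roughly as below. This is a userspace sketch under assumptions, not the code in intel_sgx.c: the struct, the SK_* constants, and sk_page_mprotect() are hypothetical, and the sketch treats "W->X" as "the page has been writable at some point and X is requested now".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical cached answers, computed once when the page is added. */
#define SK_ALLOW_X    0x1u /* 1) page may become executable */
#define SK_ALLOW_XMOD 0x2u /* 2) page may go from writable to executable */
#define SK_ALLOW_WX   0x4u /* 3) page may be writable and executable at once */

/* Hypothetical protection bits for the request. */
#define SK_PROT_W 0x2u
#define SK_PROT_X 0x4u

struct sk_page_sec {
	unsigned int allowed; /* SK_ALLOW_* bits cached at creation */
	bool was_writable;    /* has the page ever been mapped writable? */
};

/* Answer an mprotect()-style request from the cached bits alone. */
static int sk_page_mprotect(struct sk_page_sec *p, unsigned int prot)
{
	if (prot & SK_PROT_X) {
		if (!(p->allowed & SK_ALLOW_X))
			return -EACCES;
		if ((prot & SK_PROT_W) && !(p->allowed & SK_ALLOW_WX))
			return -EACCES;
		if (p->was_writable && !(p->allowed & SK_ALLOW_XMOD))
			return -EACCES;
	}
	if (prot & SK_PROT_W)
		p->was_writable = true;
	return 0;
}
```

Nothing in sk_page_mprotect() is policy specific, which is the sense in which the remembering half could be shared. Only the code that fills in the allowed bits would need to know about SELinux.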

By "SGX", did you mean the SGX subsystem being upstreamed? It doesn't track that state. In practice, there's no way for SGX to track it because there's no vm_ops->may_mprotect() callback. It doesn't follow the philosophy of Linux either, as mprotect() doesn't track it for regular memory. And it doesn't have a use without LSM, so I believe it makes more sense to track it inside LSM.

Yes, the SGX driver/subsystem. I had the impression from Sean that it does track this kind of per-page state already in some manner, but possibly he means it does under a given proposal and not in the current driver.

Even the #b remembering might end up being SELinux-specific if we also have to remember the original inputs used to compute the answer so that we can audit that information when access is denied later upon mprotect(). At the least we'd need it to save some opaque data and pass it to a callback into SELinux to perform that auditing.
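
One possible shape for "save some opaque data and pass it to a callback", sketched in userspace: the generic layer stores a pointer it never interprets and hands it back on denial. All names here (sketch_region, sketch_region_check, demo_inputs, demo_audit_cb) are hypothetical and only illustrate the division of labor.

```c
#include <errno.h>
#include <stddef.h>

/* Generic, module-agnostic state kept per enclave region. */
struct sketch_region {
	unsigned int allowed; /* cached permission bits */
	void *audit_data;     /* opaque to the generic layer */
	void (*audit_cb)(void *audit_data, unsigned int denied);
};

/* Generic check: on denial, let the owning module do the auditing. */
static int sketch_region_check(const struct sketch_region *r,
			       unsigned int requested)
{
	unsigned int denied = requested & ~r->allowed;

	if (!denied)
		return 0;
	if (r->audit_cb)
		r->audit_cb(r->audit_data, denied);
	return -EACCES;
}

/* Module-side record of the original inputs (hypothetical fields). */
struct demo_inputs {
	const char *source_label;
	unsigned int last_denied;
	int calls;
};

/* Module-side callback: it alone knows how to interpret audit_data. */
static void demo_audit_cb(void *audit_data, unsigned int denied)
{
	struct demo_inputs *in = audit_data;

	in->last_denied = denied;
	in->calls++;
}
```

The generic code never looks inside audit_data, so each security module could store whatever it needs to reconstruct a meaningful audit record, at the cost of keeping that data alive for the lifetime of the region.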