[PATCH] Document how capability bits work
From: Andy Lutomirski
Date: Fri Dec 07 2012 - 13:20:59 EST
Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
---
Documentation/security/capabilities.txt | 161 ++++++++++++++++++++++++++++++++
1 file changed, 161 insertions(+)
create mode 100644 Documentation/security/capabilities.txt
diff --git a/Documentation/security/capabilities.txt b/Documentation/security/capabilities.txt
new file mode 100644
index 0000000..dc7bc34
--- /dev/null
+++ b/Documentation/security/capabilities.txt
@@ -0,0 +1,161 @@
+ Linux capabilities
+
+
+==== What are capabilities ====
+
+Various system calls check for appropriate privileges. For example, a program
+may bypass normal file permission checking if it has the CAP_DAC_OVERRIDE
+capability. There are a lot of capabilities; the complete list is in
+include/uapi/linux/capability.h.
+
+When reading this description, do not assume anything about the word
+"inheritable". It probably does not do what you expect.
+
+Every task has the following pieces of capability-related state.
+
+ * Four capability bit masks:
+ * The effective set (pE). Privileged operations check this set.
+ * The permitted set (pP). Tasks may set these bits in pE.
+ * The inheritable set (pI). This set is complicated.
+ * The bounding set (pB). This partially limits new permitted capabilities.
+
+ * Secure bits. Each bit has a corresponding "lock" bit.
+ * SECURE_NOROOT: Makes uid==0 and euid==0 less special at exec time.
+ * SECURE_KEEP_CAPS: Prevents setresuid() from removing permitted caps.
+ * SECURE_NO_SETUID_FIXUP: Makes setresuid() entirely nonmagical.
+
+ * no_new_privs: See Documentation/prctl/no_new_privs.txt
+
+There is one invariant: pE ⊆ pP.
+
+In addition, files can have capabilities. If a file has capabilities, it
+specifies two masks and one bit:
+ * fP: The permitted or forced set.
+ * fI: The inheritable set.
+ * fE (a single bit): Supposedly true for "legacy" programs.
+
+libcap's setcap tool pretends that fE is a bitmask. It's not.
+
+At the most basic level, only pE matters. All of the complexity is in how
+pE and the other masks can change. (This is a slight lie -- user namespaces
+change this.)
+
+==== System calls ====
+
+Capabilities and related state are affected by these syscalls:
+ * capset: Change capabilities directly.
+ * set[res]uid: Sometimes changes capabilities for legacy compatibility.
+ * prctl(PR_SET_KEEPCAPS): Used to twiddle SECURE_KEEP_CAPS.
+ * prctl(PR_SET_SECUREBITS): Used to twiddle securebits in general.
+ * prctl(PR_SET_NO_NEW_PRIVS): Used to set no_new_privs.
+ * prctl(PR_CAPBSET_DROP): Used to remove bits from pB.
+ * execve: Does all kinds of magic.
+
+==== capset ====
+
+capset changes pI, pP, and pE as requested, subject to:
+
+ - (CAP_SETPCAP ∈ pE or euid is namespace owner) or pI' ⊆ pI | pP
+ - pI' ⊆ pI | pB
+ - pP' ⊆ pP
+ - pE' ⊆ pP'
+
+In the event that pI ⊆ pB, the first two conditions simplify to pI' ⊆ pI | pP.
+
+==== set*uid ====
+
+After set[res]uid, if !SECURE_NO_SETUID_FIXUP, a fixup happens. This fixup
+does two things:
+
+ - If !SECURE_KEEP_CAPS and some old uid was 0 and no new uid is 0, then
+ pP and pE are cleared.
+ - If euid becomes zero, then pE' = pP. Conversely, if euid becomes nonzero,
+ then pE' = 0. (Note that this is independent of SECURE_KEEP_CAPS.)
+
+setfsuid has similar logic to tweak the fs-related pE bits.
+
+==== prctl ====
+
+---- PR_SET_KEEPCAPS ----
+
+This changes SECURE_KEEP_CAPS as long as !SECURE_KEEP_CAPS_LOCKED.
+CAP_SETPCAP is not required.
+
+---- PR_SET_SECUREBITS ----
+
+This changes securebits, subject to:
+ - The caller must have CAP_SETPCAP.
+ - The *_LOCKED bits can be set but not cleared.
+ - A locked bit cannot be changed.
+
+Note that an unprivileged process can change SECURE_KEEP_CAPS via
+PR_SET_KEEPCAPS but not via PR_SET_SECUREBITS.
+
+---- PR_SET_NO_NEW_PRIVS ----
+
+Sets the no_new_privs bit. No privilege is required. It is impossible
+to clear the no_new_privs bit.
+
+---- PR_CAPBSET_DROP ----
+
+Clears a single bit of pB. Doing this requires CAP_SETPCAP. There is no
+way to set a cleared bit of pB.
+
+==== execve ====
+
+execve's behavior is rather complicated. It does this:
+
+Step 1: Load fI, fP, and fE. If the file has no capabilities (the xattr
+is malformed or absent), then set fI = 0, fP = 0, and fE = false. (In theory,
+fE is set on "legacy" binaries that don't know how to check their own
+capability sets.)
+
+Step 2: Apply the basic pP update rule:
+
+ pP' = (pB & fP) | (pI & fI)
+
+Step 3: If fE and pP' ⊉ fP, then abort. (This prevents legacy binaries from
+malfunctioning dangerously if pB is missing important bits.)
+
+Step 4: Apply a fixup for root if !SECURE_NOROOT. The fixup is:
+
+ - If vfs caps were present, uid != 0, and euid == 0, then warn once per boot.
+ - Otherwise:
+ - If euid == 0 or uid == 0, then pP' = pB | pI.
+ - If euid == 0, then set fE = true. (This does not affect the check
+ in step 3.)
+
+Step 5: Apply no_new_privs
+
+If no_new_privs is set (or if (new euid != old uid or new egid != old gid) and
+an unprivileged ptracer is attached), then set euid = uid, egid = gid,
+and set pP' = pP' & pP. (Note: If CAP_SETUID is effective (in the old context)
+and no_new_privs is not set, then the euid and egid changes are skipped.)
+
+Step 6: Compute pE
+
+If fE, then pE' = pP'. Else pE' = 0.
+
+Step 7: Clear SECURE_KEEP_CAPS.
+
+This happens regardless of the setting of SECURE_KEEP_CAPS_LOCKED. Setting
+SECURE_KEEP_CAPS_LOCKED is therefore probably a mistake unless
+SECURE_NO_SETUID_FIXUP is set.
+
+
+In the absence of something like no_new_privs, the net result is either
+
+pP' = (pB & fP) | (pI & fI) (the normal case)
+
+or
+
+pP' = pB | pI (if euid or uid == 0)
+
+The latter condition means that, if euid or uid is zero, then execve acts
+(in part) as though fP = fI = <all bits set>.
+
+
+The upshot: pI bits can result in actual (pP or pE) privilege if you exec a
+program that has that fI bit set *or* you have !issecure(SECURE_NOROOT) and
+(euid == 0 || uid == 0). (That latter case is possibly better understood
+as promoting pB bits to pP.)
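
The execve steps above can be modeled in a few lines of C as a reading aid.
This is a minimal sketch under stated assumptions, not kernel code:
struct task_caps, struct file_caps and model_execve() are hypothetical names,
each mask is squeezed into one 64-bit word, step 3 is read as "abort if fE is
set and some fP bit is missing from pP'", and step 5 models only the
pP' = pP' & pP part (the euid/egid reset and the ptracer condition are left
out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only: these types and names are hypothetical,
 * not kernel APIs. Each mask holds one bit per capability. */
typedef uint64_t capmask;

struct task_caps {
	capmask pE, pP, pI, pB;
	bool secure_noroot;   /* models the SECURE_NOROOT securebit */
	bool no_new_privs;
};

struct file_caps {
	bool present;         /* false if the xattr is absent or malformed */
	capmask fP, fI;
	bool fE;
};

/*
 * Apply steps 1-6 to *t in place. uid and euid are the values in effect
 * at step 4. Returns 0 on success, or -1 if step 3 aborts the exec
 * (*t is left untouched in that case).
 */
static int model_execve(struct task_caps *t, struct file_caps f,
			unsigned uid, unsigned euid)
{
	capmask new_pP;

	/* Step 1: a file without capabilities behaves as all-zero. */
	if (!f.present) {
		f.fP = 0;
		f.fI = 0;
		f.fE = false;
	}

	/* Step 2: basic pP update rule. */
	new_pP = (t->pB & f.fP) | (t->pI & f.fI);

	/* Step 3: a legacy (fE) binary must receive every bit of fP. */
	if (f.fE && (f.fP & ~new_pP) != 0)
		return -1;

	/* Step 4: root fixup, skipped in the "warn once per boot" case. */
	if (!t->secure_noroot && !(f.present && uid != 0 && euid == 0)) {
		if (euid == 0 || uid == 0)
			new_pP = t->pB | t->pI;
		if (euid == 0)
			f.fE = true;
	}

	/* Step 5 (simplified): no_new_privs limits pP' to the old pP. */
	if (t->no_new_privs)
		new_pP &= t->pP;

	/* Step 6: pE' is either all of pP' or empty. */
	t->pP = new_pP;
	t->pE = f.fE ? new_pP : 0;

	assert((t->pE & ~t->pP) == 0);   /* invariant: pE is a subset of pP */
	return 0;
}
```

In this model, an unprivileged task with a full pB that execs a file with
fP = {CAP_NET_RAW} (bit 13) and fE set ends up with pP' = pE' = {CAP_NET_RAW},
while the same exec with that bit removed from pB fails at step 3.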
--
1.7.11.7