[PATCH v5 next 4/5] modules:capabilities: add a per-task modules auto-load mode
From: Djalal Harouni
Date: Mon Nov 27 2017 - 12:20:10 EST
Previous patches added the global sysctl "modules_autoload_mode". This patch
makes it possible to support process trees, containers, and sandboxes by
providing an inherited per-task "modules_autoload_mode" flag that cannot be
re-enabled once disabled. This allows restricting automatic module loading
for selected tasks without affecting the rest of the system.
Why do we need this?
Usually a request for a kernel feature that is implemented by a module
that is not loaded may trigger the automatic module loading feature,
which transparently satisfies userspace and provides numerous
features as they are needed. In this case an implicit kernel module load
operation happens.
In most cases, loading or unloading a kernel module is an explicit
operation, and programs are required to have the CAP_SYS_MODULE
capability to perform it. With implicit module loading, however, no
capabilities are required in general, as automatic module loading is one
of the most important and transparent operations of Linux.
Recent vulnerabilities showed that automatic module loading can be
abused in order to expose more bugs. Some of these vulnerabilities are:
* DCCP use after free CVE-2017-6074 [1] [2]
Unprivileged to local root PoC.
* XFRM framework CVE-2017-7184 [3]
Reportedly used to break Ubuntu at a security contest.
* n_hdlc CVE-2017-2636 [4] [5]
Local privilege escalation.
* L2TPv3 CVE-2016-10200
Currently most of the Linux code is in the form of modules, and not all
modules are written or maintained in the same way. In a container or
sandbox world, apps can be moved from one context to another or from
one Linux system to another. The ability to restrict some of these
apps from loading extra kernel modules prevents exposing kernel
interfaces that have not been updated within such systems.
The DCCP vulnerability CVE-2017-6074, which can be triggered by
unprivileged users, and CVE-2017-7184 in the XFRM framework are some
recent real examples. CVE-2017-7184 was used to break Ubuntu at a
security contest. Ubuntu is more of a desktop distro, where using a
global switch to disable automatic module loading would harm users. In
practice a global-only design will always end up being ignored by systems
that need to offer a competitive and interactive solution for their users.
From this and from observing how apps are being run, this patch
introduces a per-task "modules_autoload_mode" to restrict automatic
module loading. This offers the following advantages:
1) It can be abstracted in userspace as something like:
DenyNewFeatures=yes
2) Automatic module loading is still available to the rest of the
system.
3) It is easy to use in containers and sandboxes. The DCCP example could
have been used to escape containers. The XFRM framework CVE-2017-7184
needs CAP_NET_ADMIN, but attackers may start to target CAP_NET_ADMIN;
a per-task flag will make that harder.
4) Suitable for desktop and more interactive Linux systems.
5) Will allow a per-user policy to be implemented in the future.
The user database format is old and not extensible; as discussed, maybe
with a modern format we may achieve the following:
User=djalal
DenyNewFeatures=no
Which means that an interactive user will be allowed to load extra
Linux features. Others, such as volatile accounts or guests, can be
easily blocked from doing so.
6) CAP_NET_ADMIN is useful and handles a lot of operations, but at the
same time it has started to look more like CAP_SYS_ADMIN, which is
overloaded. We need CAP_NET_ADMIN, and containers need it, but at the
same time we may not want programs running with it to load 'netdev-%s'
modules. Having an extra per-task flag relieves CAP_NET_ADMIN and other
capabilities of this role; the flag is clearly targeted at automatic
module loading operations and, from a higher view, at a 'load new kernel
features' scheme.
Usage:
------
To set the per-task "modules_autoload_mode":
prctl(PR_SET_MODULES_AUTOLOAD_MODE, mode, 0, 0, 0);
When a module auto-load request is triggered by the current task, the
operation first has to satisfy the per-task access mode before the module
is implicitly loaded. Once set, this setting is inherited across fork,
clone and execve.
Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or run with
CAP_SYS_ADMIN privileges in its namespace. If neither is true, -EACCES
will be returned. This requirement ensures that unprivileged programs cannot
affect the behaviour of, or surprise, privileged children.
The per-task "modules_autoload_mode" supports the following values:
0 There are no restrictions, usually the default unless set
by parent.
1 The task must have CAP_SYS_MODULE to be able to trigger a
module auto-load operation, or CAP_NET_ADMIN for modules with
a 'netdev-%s' alias.
2 Automatic module loading is disabled for the current task.
The mode may only be increased, never decreased, thus ensuring that once
applied, processes can never relax their setting. This makes it easy for
developers and users to handle.
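As a rough illustration of the usage described above, the following
userspace sketch sets no_new_privs and then requests a per-task mode. The
helper name restrict_modules_autoload() is hypothetical, and the prctl
numbers 50 and 51 are the values proposed by this patch. They are not in
mainline <linux/prctl.h>, so on a kernel without this patch the call is
expected to fail, or the number may refer to a different operation.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Values proposed by this patch (assumption: kernel carries the patch). */
#ifndef PR_SET_MODULES_AUTOLOAD_MODE
#define PR_SET_MODULES_AUTOLOAD_MODE 50
#define PR_GET_MODULES_AUTOLOAD_MODE 51
#endif

/*
 * Hypothetical helper: restrict module auto-loading for the calling task.
 * Returns 0 on success or a negative errno value.
 */
int restrict_modules_autoload(unsigned long mode)
{
	/* Valid modes are 0 (allowed), 1 (privileged) and 2 (disabled). */
	if (mode > 2)
		return -EINVAL;

	/* A caller without CAP_SYS_ADMIN must set no_new_privs first. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -errno;

	/* Raising the mode is allowed; lowering it is expected to fail. */
	if (prctl(PR_SET_MODULES_AUTOLOAD_MODE, mode, 0, 0, 0) != 0)
		return -errno;

	return 0;
}
```

A sandbox launcher could call restrict_modules_autoload(2) just before
execve(), so that the restriction is inherited by the sandboxed program.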
Note that even if the per-task "modules_autoload_mode" allows the
corresponding modules to be auto-loaded, automatic module loading may still fail due to
the global sysctl "modules_autoload_mode". For more details please see
Documentation/sysctl/kernel.txt, section "modules_autoload_mode".
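The interaction between the two settings can be modelled in a few lines
of plain C. This is a userspace model only, not the kernel code: the
function name is made up for illustration, and the name
MODULES_AUTOLOAD_PRIVILEGED is an assumption (only the ALLOWED and
DISABLED names appear in this patch).

```c
/* Mode values as described above. */
enum {
	MODULES_AUTOLOAD_ALLOWED    = 0,
	MODULES_AUTOLOAD_PRIVILEGED = 1,	/* assumed name */
	MODULES_AUTOLOAD_DISABLED   = 2,
};

/*
 * Model of the effective mode: the stricter (numerically larger) of the
 * global sysctl mode and the per-task mode wins.
 */
unsigned int effective_autoload_mode(unsigned int global_mode,
				     unsigned int task_mode)
{
	return global_mode > task_mode ? global_mode : task_mode;
}
```

With a permissive global sysctl (0) and a task mode of 2, the result is 2,
so the task is blocked while the rest of the system is unaffected.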
When a request to load a kernel module is denied, the module name with the
corresponding process name and its pid are logged. Administrators can use
such information to explicitly load the appropriate modules.
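As a sketch of how such log lines might be consumed, the helper below
extracts the module name, process name and pid from one kernel log line.
It assumes the message format shown in the examples of this mail; the
struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct autoload_denial {
	char module[64];	/* requested module or alias */
	char comm[32];		/* name of the requesting process */
	int pid;		/* pid of the requesting process */
};

/* Returns 0 if 'line' contains a denial message, -1 otherwise. */
int parse_autoload_denial(const char *line, struct autoload_denial *d)
{
	static const char prefix[] = "module: automatic module loading of ";
	const char *p = strstr(line, prefix);

	if (!p)
		return -1;

	/* Skip the prefix, then read: <module> by "<comm>"[<pid>] */
	if (sscanf(p + sizeof(prefix) - 1, "%63s by \"%31[^\"]\"[%d]",
		   d->module, d->comm, &d->pid) != 3)
		return -1;

	return 0;
}
```

An administrator's tool could feed each dmesg line through this function
and pass the collected module names to modprobe after review.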
Testing per-task or per-container setup
---------------------------------------
The following tool can be used to test the feature:
https://gist.githubusercontent.com/tixxdz/cf567e4275714199a32c4a80de4ea63a/raw/13e52ea0ee65772871bcf10fb6c94fedd349f5c1/pr_modules_autoload_mode_test.c
Example 1)
Before patch:
$ lsmod | grep ipip -
$ sudo ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
$ lsmod | grep ipip -
ipip 16384 0
tunnel4 16384 1 ipip
ip_tunnel 28672 1 ipip
$ grep Modules /proc/self/status
ModulesAutoloadMode: 0
After patch:
Set task "modules_autoload_mode" to disabled.
$ lsmod | grep ipip -
$ grep Modules /proc/self/status
ModulesAutoloadMode: 0
$ su - root
# ./pr_modules_autoload_mode_test 2
# grep Modules /proc/self/status
ModulesAutoloadMode: 2
# ip tunnel add mytun mode ipip remote 10.0.2.100 local 10.0.2.15 ttl 255
add tunnel "tunl0" failed: No such device
...
[ 634.954652] module: automatic module loading of netdev-tunl0 by "ip"[1560] was denied
[ 634.955775] module: automatic module loading of tunl0 by "ip"[1560] was denied
...
Example 2)
Sample with XFRM tunnel mode.
Before patch:
$ lsmod | grep xfrm -
$ grep Modules /proc/self/status
ModulesAutoloadMode: 0
$ sudo ip xfrm state add src 10.0.2.100 dst 10.0.1.100 proto esp spi $id1 \
> reqid $id2 mode tunnel auth "hmac(sha256)" $key1 enc "cbc(aes)" $key2
$ lsmod | grep xfrm
xfrm4_mode_tunnel 16384 2
After patch:
Set task "modules_autoload_mode" to disabled.
$ lsmod | grep xfrm -
$ grep Modules /proc/self/status
ModulesAutoloadMode: 0
$ su - root
# ./pr_modules_autoload_mode_test 2
# grep Modules /proc/self/status
ModulesAutoloadMode: 2
# ip xfrm state add src 10.0.2.100 dst 10.0.1.100 proto esp spi $id1 \
> reqid $id2 mode tunnel auth "hmac(sha256)" $key1 enc "cbc(aes)" $key2
RTNETLINK answers: Protocol not supported
...
[ 3458.139490] module: automatic module loading of xfrm-mode-2-1 by "ip"[1506] was denied
...
Example 3)
Here we use DCCP as an example since the public PoC was against it.
DCCP use after free CVE-2017-6074 (unprivileged to local root):
The code path can be triggered by an unprivileged user with the trigger.c
program for the DCCP use after free [2], which was fixed by
commit 5edabca9d4cff7f "dccp: fix freeing skb too early for IPV6_RECVPKTINFO".
Before patch:
$ lsmod | grep dccp
$ strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = 3
...
$ lsmod | grep dccp
dccp_ipv6 24576 5
dccp_ipv4 24576 5 dccp_ipv6
dccp 102400 2 dccp_ipv6,dccp_ipv4
$ grep Modules /proc/self/status
ModulesAutoloadMode: 0
After patch:
Set task "modules_autoload_mode" to 1, privileged mode.
$ lsmod | grep dccp
$ ./pr_set_no_new_privs
$ grep NoNewPrivs /proc/self/status
NoNewPrivs: 1
$ ./pr_modules_autoload_mode_test 1
$ grep Modules /proc/self/status
ModulesAutoloadMode: 1
$ strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = -1 ESOCKTNOSUPPORT (Socket type not supported)
...
$ lsmod | grep dccp
$ dmesg
...
[ 4662.171994] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1759] was denied
[ 4662.177284] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1759] was denied
[ 4662.180181] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1759] was denied
[ 4662.181709] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1759] was denied
Now set task "modules_autoload_mode" to 2, disabled mode.
$ lsmod | grep dccp
$ grep Modules /proc/self/status
ModulesAutoloadMode: 0
$ su - root
# ./pr_modules_autoload_mode_test 2
# grep Modules /proc/self/status
ModulesAutoloadMode: 2
# strace ./dccp_trigger
...
socket(AF_INET6, SOCK_DCCP, IPPROTO_IP) = -1 ESOCKTNOSUPPORT (Socket type not supported)
...
...
[ 5154.218740] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1873] was denied
[ 5154.219828] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1873] was denied
[ 5154.221814] module: automatic module loading of net-pf-10-proto-0-type-6 by "dccp_trigger"[1873] was denied
[ 5154.222731] module: automatic module loading of net-pf-10-proto-0 by "dccp_trigger"[1873] was denied
As shown, this blocks automatic module loading per task. This provides a
usable system where only some sandboxed apps or containers are
restricted from triggering automatic module loading, while other parts of
the system can continue to use the feature as is, which is the case for
desktop and user-friendly machines.
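The examples above read the mode with grep on /proc/self/status. A
program could do the same by scanning the status text for the
"ModulesAutoloadMode:" field added by this patch. The sketch below works
on an in-memory copy of the file; the function name is hypothetical, and
the field is only present on kernels carrying this patch with
CONFIG_MODULES enabled.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns the mode found in the given /proc/<pid>/status text, or -1 if
 * the "ModulesAutoloadMode:" field is absent.
 */
int parse_modules_autoload_mode(const char *status)
{
	static const char key[] = "ModulesAutoloadMode:";
	const char *p = status;
	int mode;

	while (p && *p) {
		/* Each iteration starts at the beginning of a line. */
		if (strncmp(p, key, sizeof(key) - 1) == 0 &&
		    sscanf(p + sizeof(key) - 1, "%d", &mode) == 1)
			return mode;

		/* Advance to the start of the next line, if any. */
		p = strchr(p, '\n');
		if (p)
			p++;
	}

	return -1;
}
```

A return value of -1 lets the caller tell "field missing" apart from the
unrestricted mode 0.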
[1] http://www.openwall.com/lists/oss-security/2017/02/22/3
[2] https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-6074
[3] http://www.openwall.com/lists/oss-security/2017/03/29/2
[4] http://www.openwall.com/lists/oss-security/2017/03/07/6
[5] https://a13xp0p0v.github.io/2017/03/24/CVE-2017-2636.html
Cc: Ben Hutchings <ben.hutchings@xxxxxxxxxxxxxxx>
Cc: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: James Morris <james.l.morris@xxxxxxxxxx>
Cc: Serge Hallyn <serge@xxxxxxxxxx>
Cc: Solar Designer <solar@xxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Signed-off-by: Djalal Harouni <tixxdz@xxxxxxxxx>
---
Documentation/filesystems/proc.txt | 3 +
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/modules_autoload_mode.rst | 116 +++++++++++++++++++++
fs/proc/array.c | 6 ++
include/linux/init_task.h | 8 ++
include/linux/module.h | 20 ++++
include/linux/sched.h | 5 +
include/uapi/linux/prctl.h | 8 ++
kernel/module.c | 83 ++++++++++++---
security/commoncap.c | 36 +++++++
10 files changed, 270 insertions(+), 16 deletions(-)
create mode 100644 Documentation/userspace-api/modules_autoload_mode.rst
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 2a84bb3..1974cb6 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -195,6 +195,7 @@ read the file /proc/PID/status:
CapBnd: ffffffffffffffff
NoNewPrivs: 0
Seccomp: 0
+ ModulesAutoloadMode: 0
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
@@ -269,6 +270,8 @@ Table 1-2: Contents of the status files (as of 4.8)
CapBnd bitmap of capabilities bounding set
NoNewPrivs no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...)
+ ModulesAutoloadMode modules auto-load mode, like
+ prctl(PR_GET_MODULES_AUTOLOAD_MODE, ...)
Cpus_allowed mask of CPUs on which this process may run
Cpus_allowed_list Same as previous, but in "list format"
Mems_allowed mask of memory nodes allowed to this process
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 7b2eb1b..bfd51b7 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -17,6 +17,7 @@ place where this information is gathered.
:maxdepth: 2
no_new_privs
+ modules_autoload_mode
seccomp_filter
unshare
diff --git a/Documentation/userspace-api/modules_autoload_mode.rst b/Documentation/userspace-api/modules_autoload_mode.rst
new file mode 100644
index 0000000..1153c35
--- /dev/null
+++ b/Documentation/userspace-api/modules_autoload_mode.rst
@@ -0,0 +1,116 @@
+======================================
+Per-task module auto-load restrictions
+======================================
+
+
+Introduction
+============
+
+Usually a request to a kernel feature that is implemented by a module
+that is not loaded may trigger automatic module loading feature, allowing
+to transparently satisfy userspace, and provide numerous other features
+as they are needed. In this case an implicit kernel module load
+operation happens.
+
+In most cases to load or unload a kernel module, an explicit operation
+happens where programs are required to have ``CAP_SYS_MODULE`` capability
+to perform so. However, with implicit module loading, no capabilities are
+required, or only ``CAP_NET_ADMIN`` in rare cases where the module has the
+'netdev-%s' alias. Historically this was always the case as automatic
+module loading is one of the most important and transparent operations
+of Linux, users expect that their programs just work, yet, recent cases
+showed that this can be abused by unprivileged users or attackers to load
+modules that were not updated, or modules that contain bugs and
+vulnerabilities.
+
+Currently most of Linux code is in a form of modules, hence, allowing to
+control automatic module loading in some cases is as important as the
+operation itself, especially in the context where Linux is used in
+different appliances.
+
+Restricting automatic module loading allows administrators to have the
+appropriate time to update or deny module autoloading in advance. In a
+container or sandbox world where apps can be moved from one context to
+another, the ability to restrict some containers or apps to load extra
+kernel modules will prevent exposing some kernel interfaces that may not
+receive the same care as some other parts of the core. The DCCP vulnerability
+CVE-2017-6074 that can be triggered by unprivileged, or CVE-2017-7184
+in the XFRM framework are some real examples where users or programs are
+able to expose such kernel interfaces and escape their sandbox.
+
+The per-task ``modules_autoload_mode`` allows restricting automatic module
+loading per task, preventing the kernel from exposing more of its
+interface. This is particularly useful for containers and sandboxes as
+noted above, they are restricted from affecting the rest of the system
+without affecting its functionality, automatic module loading is still
+available for others.
+
+
+Usage
+=====
+
+When the kernel is compiled with modules support ``CONFIG_MODULES``, then:
+
+``PR_SET_MODULES_AUTOLOAD_MODE``:
+ Set the current task ``modules_autoload_mode``. When a module
+ auto-load request is triggered by current task, then the
+ operation has first to satisfy the per-task access mode before
+ attempting to implicitly load the module. As an example,
+ automatic loading of modules that contain bugs or vulnerabilities
+ can be restricted and unprivileged users can no longer abuse such
+ interfaces. Once set, this setting is inherited across ``fork(2)``,
+ ``clone(2)`` and ``execve(2)``.
+
+ Prior to use, the task must call ``prctl(PR_SET_NO_NEW_PRIVS, 1)``
+ or run with ``CAP_SYS_ADMIN`` privileges in its namespace. If
+ these are not true, ``-EACCES`` will be returned. This requirement
+ ensures that unprivileged programs cannot affect the behaviour or
+ surprise privileged children.
+
+ Usage:
+ ``prctl(PR_SET_MODULES_AUTOLOAD_MODE, mode, 0, 0, 0);``
+
+ The 'mode' argument supports the following values:
+ 0 There are no restrictions, usually the default unless set
+ by parent.
+ 1 The task must have ``CAP_SYS_MODULE`` to be able to trigger a
+ module auto-load operation, or ``CAP_NET_ADMIN`` for modules
+ with a 'netdev-%s' alias.
+ 2 Automatic modules loading is disabled for the current task.
+
+ The mode may only be increased, never decreased, thus ensuring
+ that once applied, processes can never relax their setting.
+
+
+ Returned values:
+ 0 On success.
+ ``-EINVAL`` If 'mode' is not valid, or the operation is not
+ supported.
+ ``-EACCES`` If task does not have ``CAP_SYS_ADMIN`` in its namespace
+ or is not running with ``no_new_privs``.
+ ``-EPERM`` If 'mode' is less strict than current task
+ ``modules_autoload_mode``.
+
+
+ Note that even if the per-task ``modules_autoload_mode`` allows to
+ auto-load the corresponding modules, automatic module loading
+ may still fail due to the global sysctl ``modules_autoload_mode``.
+ The default mode of ``modules_autoload_mode`` is to always allow
+ automatic module loading. For more details, please see
+ Documentation/sysctl/kernel.txt, section "modules_autoload_mode".
+
+
+ When a request to a kernel module is denied, the module name with the
+ corresponding process name and its pid are logged. Administrators can
+ use such information to explicitly load the appropriate modules.
+
+
+``PR_GET_MODULES_AUTOLOAD_MODE``:
+ Return the current task ``modules_autoload_mode``.
+
+ Usage:
+ ``prctl(PR_GET_MODULES_AUTOLOAD_MODE, 0, 0, 0, 0);``
+
+ Returned values:
+ mode The task's ``modules_autoload_mode``
+ ``-ENOSYS`` If the kernel was compiled without ``CONFIG_MODULES``.
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 79375fc..57b6cc5 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -90,6 +90,7 @@
#include <linux/string_helpers.h>
#include <linux/user_namespace.h>
#include <linux/fs_struct.h>
+#include <linux/module.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -343,10 +344,15 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
{
+ int autoload = task_modules_autoload_mode(p);
+
seq_put_decimal_ull(m, "NoNewPrivs:\t", task_no_new_privs(p));
#ifdef CONFIG_SECCOMP
seq_put_decimal_ull(m, "\nSeccomp:\t", p->seccomp.mode);
#endif
+ if (autoload != -ENOSYS)
+ seq_put_decimal_ull(m, "\nModulesAutoloadMode:\t", autoload);
+
seq_putc(m, '\n');
}
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6a53262..f564b41 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -153,6 +153,13 @@ extern struct cred init_cred;
# define INIT_CGROUP_SCHED(tsk)
#endif
+#ifdef CONFIG_MODULES
+# define INIT_MODULES_AUTOLOAD_MODE(tsk) \
+ .modules_autoload_mode = 0,
+#else
+# define INIT_MODULES_AUTOLOAD_MODE(tsk)
+#endif
+
#ifdef CONFIG_PERF_EVENTS
# define INIT_PERF_EVENTS(tsk) \
.perf_event_mutex = \
@@ -250,6 +257,7 @@ extern struct cred init_cred;
.tasks = LIST_HEAD_INIT(tsk.tasks), \
INIT_PUSHABLE_TASKS(tsk) \
INIT_CGROUP_SCHED(tsk) \
+ INIT_MODULES_AUTOLOAD_MODE(tsk) \
.ptraced = LIST_HEAD_INIT(tsk.ptraced), \
.ptrace_entry = LIST_HEAD_INIT(tsk.ptrace_entry), \
.real_parent = &tsk, \
diff --git a/include/linux/module.h b/include/linux/module.h
index c36aed8..1d742d3 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -13,6 +13,7 @@
#include <linux/kmod.h>
#include <linux/init.h>
#include <linux/elf.h>
+#include <linux/sched.h>
#include <linux/stringify.h>
#include <linux/kobject.h>
#include <linux/moduleparam.h>
@@ -510,6 +511,15 @@ bool is_module_text_address(unsigned long addr);
int may_autoload_module(char *kmod_name, int required_cap,
const char *kmod_prefix);
+/* Set 'modules_autoload_mode' of current task */
+int task_set_modules_autoload_mode(unsigned long value);
+
+/* Read task's 'modules_autoload_mode' */
+static inline int task_modules_autoload_mode(struct task_struct *task)
+{
+ return task->modules_autoload_mode;
+}
+
static inline bool within_module_core(unsigned long addr,
const struct module *mod)
{
@@ -662,6 +672,16 @@ static inline int may_autoload_module(char *kmod_name, int required_cap,
return -ENOSYS;
}
+static inline int task_set_modules_autoload_mode(unsigned long value)
+{
+ return -ENOSYS;
+}
+
+static inline int task_modules_autoload_mode(struct task_struct *task)
+{
+ return -ENOSYS;
+}
+
static inline struct module *__module_address(unsigned long addr)
{
return NULL;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e5a2fbc..1b8cf78 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -658,6 +658,11 @@ struct task_struct {
struct restart_block restart_block;
+#ifdef CONFIG_MODULES
+ /* per-task modules auto-load mode */
+ unsigned modules_autoload_mode:2;
+#endif
+
pid_t pid;
pid_t tgid;
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3165863..5baf9ae 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -211,4 +211,12 @@ struct prctl_mm_map {
#define PR_SET_PDEATHSIG_PROC 48
#define PR_GET_PDEATHSIG_PROC 49
+/*
+ * Control the per-task modules auto-load mode
+ *
+ * See Documentation/userspace-api/modules_autoload_mode.rst for more details.
+ */
+#define PR_SET_MODULES_AUTOLOAD_MODE 50
+#define PR_GET_MODULES_AUTOLOAD_MODE 51
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/module.c b/kernel/module.c
index a7205fb..5c24ac4b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4345,6 +4345,7 @@ EXPORT_SYMBOL_GPL(__module_text_address);
/**
* may_autoload_module - Determine whether a module auto-load operation
* is permitted
+ *
* @kmod_name: The module name
* @required_cap: if positive, may allow to auto-load the module if this
* capability is set
@@ -4362,47 +4363,51 @@ EXPORT_SYMBOL_GPL(__module_text_address);
* loading.
*
* However even if the caller has the required capability, the operation can
- * still be denied due to the global "modules_autoload_mode" sysctl mode. Unless
- * set by enduser, the operation is always allowed which is the default.
+ * still be denied due to the per-task "modules_autoload_mode" mode and the
+ * global "modules_autoload_mode" sysctl one. Unless set by enduser, the
+ * operation is always allowed which is the default.
*
* The permission check is performed in this order:
- * 1) If the global sysctl "modules_autoload_mode" is set to 'disabled', then
- * operation is denied.
+ * 1) We calculate the stricter mode of both:
+ * per-task 'modules_autoload_mode' and global sysctl 'modules_autoload_mode'
+ *
+ * We follow up with the result mode as "modules_autoload_mode":
*
- * 2) If the global sysctl "modules_autoload_mode" is set to 'privileged', then:
+ * 2) If "modules_autoload_mode" is set to 'disabled', then operation is denied.
*
- * 2.1) If "@required_cap" is positive and "@kmod_prefix" is set, then
+ * 3) If "modules_autoload_mode" is set to 'privileged', then:
+ *
+ * 3.1) If "@required_cap" is positive and "@kmod_prefix" is set, then
* if the caller has the capability, the operation is allowed.
*
- * 2.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
+ * 3.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
* fallback to check if caller has CAP_SYS_MODULE, if so, operation is
* allowed.
*
- * 2.3) If caller passes "@required_cap" as a negative then we fallback to
+ * 3.3) If caller passes "@required_cap" as a negative then we fallback to
* check if caller has CAP_SYS_MODULE, if so, operation is allowed.
*
* We require capabilities to autoload modules here, and CAP_SYS_MODULE here is
* the default.
*
- * 2.4) Otherwise operation is denied.
+ * 3.4) Otherwise operation is denied.
*
- * 3) If the global sysctl "modules_autoload_mode" is set to 'allowed' which is
- * the default, then:
+ * 4) If "modules_autoload_mode" is set to 'allowed' which is the default, then:
*
- * 3.1) If "@required_cap" is positive and "@kmod_prefix" is set, we check if
+ * 4.1) If "@required_cap" is positive and "@kmod_prefix" is set, we check if
* caller has the capability, if so, operation is allowed.
* In this case the calling subsystem requires the capability to be set before
* allowing modules autoload operations and we have to honor that.
*
- * 3.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
+ * 4.2) If "@required_cap" is positive and "@kmod_prefix" is NULL, then we
* fallback to check if caller has CAP_SYS_MODULE, if so, operation is
* allowed.
*
- * 3.3) If caller passes "@required_cap" as a negative then operation is
+ * 4.3) If caller passes "@required_cap" as a negative then operation is
* allowed. This is the most common case as it is used now by
* request_module() function.
*
- * 3.4) Otherwise operation is denied.
+ * 4.4) Otherwise operation is denied.
*
* Returns 0 if the module request is allowed or -EPERM if not.
*/
@@ -4410,7 +4415,8 @@ int may_autoload_module(char *kmod_name, int required_cap,
const char *kmod_prefix)
{
int module_require_cap = CAP_SYS_MODULE;
- unsigned int autoload = modules_autoload_mode;
+ unsigned int autoload = max_t(unsigned int, modules_autoload_mode,
+ current->modules_autoload_mode);
/* Short-cut for most use cases where kmod auto-loading is allowed */
if (autoload == MODULES_AUTOLOAD_ALLOWED && required_cap < 0)
@@ -4442,6 +4448,51 @@ int may_autoload_module(char *kmod_name, int required_cap,
return -EPERM;
}
+/**
+ * task_set_modules_autoload_mode - Set per-task modules auto-load mode
+ * @value: Value to set "modules_autoload_mode" of current task
+ *
+ * Set current task "modules_autoload_mode". The task has to have
+ * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. This
+ * avoids scenarios where unprivileged tasks can affect the behaviour of
+ * privileged children by restricting module or kernel features.
+ *
+ * The task's "modules_autoload_mode" may only be increased, never decreased.
+ *
+ * Returns 0 on success, -EINVAL if @value is not valid, -EACCES if task does
+ * not have CAP_SYS_ADMIN in its namespace or is not running with no_new_privs,
+ * and finally -EPERM if @value is less strict than current task
+ * "modules_autoload_mode".
+ *
+ */
+int task_set_modules_autoload_mode(unsigned long value)
+{
+ if (value > MODULES_AUTOLOAD_DISABLED)
+ return -EINVAL;
+
+ /*
+ * To set task "modules_autoload_mode" requires that the task has
+ * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
+ * This avoids scenarios where unprivileged tasks can affect the
+ * behaviour of privileged children by restricting module features.
+ */
+ if (!task_no_new_privs(current) &&
+ security_capable_noaudit(current_cred(), current_user_ns(),
+ CAP_SYS_ADMIN) != 0)
+ return -EACCES;
+
+ /*
+ * The "modules_autoload_mode" may only be increased, never decreased,
+ * ensuring that once applied, processes can never relax their settings.
+ */
+ if (current->modules_autoload_mode > value)
+ return -EPERM;
+ else if (current->modules_autoload_mode < value)
+ current->modules_autoload_mode = value;
+
+ return 0;
+}
+
/* Don't grab lock, we're oopsing. */
void print_modules(void)
{
diff --git a/security/commoncap.c b/security/commoncap.c
index 236e573..67a235c 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1157,6 +1157,36 @@ static int cap_prctl_drop(unsigned long cap)
return commit_creds(new);
}
+/*
+ * Implement PR_SET_MODULES_AUTOLOAD_MODE.
+ *
+ * Returns 0 on success, -ve on error.
+ */
+static int pr_set_modules_autoload_mode(unsigned long arg2, unsigned long arg3,
+ unsigned long arg4, unsigned long arg5)
+{
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+
+ return task_set_modules_autoload_mode(arg2);
+}
+
+/*
+ * Implement PR_GET_MODULES_AUTOLOAD_MODE.
+ *
+ * Return current task "modules_autoload_mode", -ve on error.
+ */
+static inline int pr_get_modules_autoload_mode(unsigned long arg2,
+ unsigned long arg3,
+ unsigned long arg4,
+ unsigned long arg5)
+{
+ if (arg2 || arg3 || arg4 || arg5)
+ return -EINVAL;
+
+ return task_modules_autoload_mode(current);
+}
+
/**
* cap_task_prctl - Implement process control functions for this security module
* @option: The process control function requested
@@ -1287,6 +1317,12 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
return commit_creds(new);
}
+ case PR_SET_MODULES_AUTOLOAD_MODE:
+ return pr_set_modules_autoload_mode(arg2, arg3, arg4, arg5);
+
+ case PR_GET_MODULES_AUTOLOAD_MODE:
+ return pr_get_modules_autoload_mode(arg2, arg3, arg4, arg5);
+
default:
/* No functionality available - continue with default */
return -ENOSYS;
--
2.7.4