[PATCH v3 2/2] procfs: implement PROCFS_SET_GROUPS ioctl

From: Stas Sergeev
Date: Fri Nov 08 2024 - 15:41:34 EST


This patch implements PROCFS_SET_GROUPS ioctl for /proc/self/status
files. The fd must be transferred with SCM_RIGHTS to another process
in order for this ioctl to work. Below the process that opened the
file, is referred to as "opener" and the process that calls this ioctl(),
is called "setter".

The following checks are performed about the opener:
- Opener opened his own status, not someone else's.
- Opener process still exist, so his proc entry is still valid.
- The groups weren't changed since opening the file.
- Opener was capable of CAP_SYS_ADMIN when opening the file.

The following checks are performed about the setter:
- Setter is a different process (thread) than an opener.
- PFA_NO_NEW_PRIVS not set.
- Either the process must be setgid-capable, or all the below must be
true:
- euid==uid==suid i.e. no possibility to switch uid
- egid==gid==sgid i.e. no possibility to switch gid
- euid/egid of current process matches uid/gid of an opener process
had at the time of opening the file.

The setter can read from an fd to find out what group list is there
to decide whether or not he wants to use it. The opener can't trick
that by changing the groups in between, as then ioctl() detects that
the group list was ever changed and fails with -EPERM.
This is also the reason why the opener is required to "exist" (and
not exit) for the duration of entire operation: otherwise his status
file may not be readable by setter.

Why /proc/self/status?
Because this file carries the group list and other credentials
already. Its quite natural to add an ioctl to apply whatever is
already there. Client is therefore able to read the needed information
and validate it in any way he wants, before applying. It is guaranteed
that the information ever read, would match the one applied, because
any change of the group info after the file was opened, disables the
ioctl.

What problem does this solve?
Currently there is no way to change the group list if the process
uses suid/sgid bits to switch to a less-privileged user to restrict
the access to the original user's files. So such restriction itself
requires CAP_SETGID to work. I.e. you need cap to drop the rights.
This ioctl allows to move CAP_SETGID from the main process to the
helper (server) process, keeping the main process cap-less.
This particular implementation also requires CAP_SYS_ADMIN on helper
side, but there may be a finer-grained approaches.

Usage scenario:
Main process connects to the helper via the AF_UNIX socket. Helper
uses SO_PEERCRED to retrieve client's uid/gid and then does initgroups()
with client's uid/gid to set the desired group list. Then it sets
uid/gid to match client's. It should keep his euid/egid untouched
or keep caps, as CAP_SYS_ADMIN check is performed by this ioctl.
Then it opens /proc/self/status and sends it to client with SCM_RIGHTS.
Client then reads from fd to validate the info in any way he wants,
and uses ioctl(PROCFS_SET_GROUPS) on that fd to actually set up the
received group list. It then replies to the server with the operation
status, or just closes the connection. At that point server is allowed
to exit.

Security considerations:
As explained above, the server process, when opening the status file,
must have uid/gid matching those of client (getting them via
SO_PEERCRED), or ioctl() fails. This may mean either of the below:
1. The server is setuid/setgid-capable. In this case we trust it does
the right thing, namely, calls initgroups() with client's creds.
2. The server was capable but dropped the caps after changing creds.
Same as above.
3. The server got the appropriate creds via login process. In this
case it can transfer only the proper group list, as he doesn't
have anything else and can't switch anything.
4. The server used suid/sgid bits to adjust the uid/gid. In this
case he doesn't have the right group list to share.
5. The server was spawned by someone else capable, like
`sudo setpriv --clear-groups --ruid=$RUID --rgid=$RGID ./server`
and has the appropriate uid/gid but wrong group list (or empty
group list, in case of --clear-groups).

Cases 4 and 5 have a malicious potential and are difficult to
distinguish from 2 and 3 on kernel level.
While it may be possible to come up with the more fine-grained
strategy, in this patch I took the simple approach: check the server
for CAP_SYS_ADMIN. In that case he definitely has the right to send
the needed groups to anyone. Server explicitly expresses his will to
do so by changing uid/gid to client's before opening the file. Sending
/proc/self/status via SCM_RIGHTS probably never happened "in a wild"
before, and for sure in any pre-existing practice CAP_SYS_ADMIN-capable
server won't change his uid/gid to client's before opening that file.
So I figured these steps are explicit enough to not add a new open()
flag or another ioctl() to allow the group info sharing in a yet more
explicit way. That check is not performed if the client is setgid-capable,
as in this case it could set the similar group list by other means
anyway.

It may be useful to inherit such fds via exec(), so O_CLOEXEC is
not required in this patch.

PFA_NO_NEW_PRIVS disables this ioctl.

Signed-off-by: Stas Sergeev <stsp2@xxxxxxxxx>

CC: Eric Biederman <ebiederm@xxxxxxxxxxxx>
CC: Andy Lutomirski <luto@xxxxxxxxxx>
CC: Aleksa Sarai <cyphar@xxxxxxxxxx>
CC: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
CC: Christian Brauner <brauner@xxxxxxxxxx>
CC: Jan Kara <jack@xxxxxxx>
CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CC: Jeff Layton <jlayton@xxxxxxxxxx>
CC: John Johansen <john.johansen@xxxxxxxxxxxxx>
CC: Chengming Zhou <chengming.zhou@xxxxxxxxx>
CC: Casey Schaufler <casey@xxxxxxxxxxxxxxxx>
CC: Adrian Ratiu <adrian.ratiu@xxxxxxxxxxxxx>
CC: Felix Moessbauer <felix.moessbauer@xxxxxxxxxxx>
CC: Jens Axboe <axboe@xxxxxxxxx>
CC: Oleg Nesterov <oleg@xxxxxxxxxx>
CC: Kees Cook <kees@xxxxxxxxxx>
Cc: "Serge E. Hallyn" <serge@xxxxxxxxxx>
CC: linux-kernel@xxxxxxxxxxxxxxx
CC: linux-fsdevel@xxxxxxxxxxxxxxx
---
fs/proc/base.c | 137 +++++++++++++++++++++++++++++++++++++++-
include/linux/cred.h | 8 +++
include/uapi/linux/fs.h | 2 +
3 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index e7c8554ec95a..6c48493b2c46 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -98,6 +98,7 @@
#include <linux/resctrl.h>
#include <linux/cn_proc.h>
#include <linux/ksm.h>
+#include <linux/cred.h>
#include <uapi/linux/lsm.h>
#include <trace/events/oom.h>
#include "internal.h"
@@ -829,6 +830,138 @@ static const struct file_operations proc_single_file_operations = {
};


+static int proc_status_open(struct inode *inode, struct file *filp)
+{
+ struct proc_inode *pi = PROC_I(inode);
+ struct pid *opener = get_task_pid(current, PIDTYPE_PID);
+
+ pi->op.proc_show = proc_pid_status;
+ return single_open(filp, proc_single_show, opener);
+}
+
+static int proc_status_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *seq = file->private_data;
+ struct pid *opener = seq->private;
+
+ put_pid(opener);
+ return single_release(inode, file);
+}
+
+static bool can_borrow_groups(const struct cred *cur_cred,
+ const struct cred *f_cred)
+{
+ if (may_setgroups())
+ return 1;
+ /* Make sure the process can't switch uid/gid. */
+ if (!uid_eq(cur_cred->euid, cur_cred->uid) ||
+ !uid_eq(cur_cred->suid, cur_cred->uid))
+ return 0;
+ if (!gid_eq(cur_cred->egid, cur_cred->gid) ||
+ !gid_eq(cur_cred->sgid, cur_cred->gid))
+ return 0;
+ /* Make sure the euid/egid of current processes are equal
+ * to uid/gid of an opener at file open time.
+ */
+ if (!uid_eq(f_cred->uid, cur_cred->euid) ||
+ !gid_eq(f_cred->gid, cur_cred->egid))
+ return 0;
+ return 1;
+}
+
+static int do_proc_setgroups(const struct cred *task_cred,
+ const struct cred *cur_cred,
+ const struct cred *f_cred)
+{
+ struct group_info *cgi = get_group_info(cur_cred->group_info);
+ struct group_info *gi = get_group_info(task_cred->group_info);
+ int err;
+
+ /* Make sure groups didn't change since file open. */
+ err = -EPERM;
+ if (f_cred->group_info != gi)
+ goto out_gi;
+ /* Don't error if the process is setting the same list again. */
+ err = 0;
+ if (cgi == gi)
+ goto out_gi;
+
+ err = -EPERM;
+ if (!can_borrow_groups(cur_cred, f_cred))
+ goto out_gi;
+ err = set_current_groups(gi);
+
+out_gi:
+ put_group_info(gi);
+ put_group_info(cgi);
+ return err;
+}
+
+static int do_status_ioctl(struct task_struct *task, struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+ const struct cred *task_cred;
+ const struct cred *cur_cred;
+ int err = -EINVAL;
+
+ switch (cmd) {
+ case PROCFS_SET_GROUPS:
+ if (arg)
+ break;
+ /* Disallow opener process to set his own groups. */
+ err = -EPERM;
+ if (task == current)
+ break;
+ /* Don't change anything if current has NO_NEW_PRIVS. */
+ if (task_no_new_privs(current))
+ break;
+ /* Opener must be capable of granting his groups. */
+ if (!file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN))
+ break;
+ task_cred = get_task_cred(task);
+ cur_cred = get_current_cred();
+ err = do_proc_setgroups(task_cred, cur_cred, file->f_cred);
+ put_cred(cur_cred);
+ put_cred(task_cred);
+ break;
+ }
+ return err;
+}
+
+static long proc_status_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct inode *inode = file_inode(file);
+ struct pid *pid = proc_pid(inode);
+ struct seq_file *seq = file->private_data;
+ struct pid *opener = seq->private;
+ struct task_struct *task;
+ long err;
+
+ /* Make sure opener opened his own proc entry. */
+ if (pid != opener)
+ return -EPERM;
+
+ task = get_pid_task(pid, PIDTYPE_PID);
+ if (!task)
+ return -ESRCH;
+
+ err = do_status_ioctl(task, file, cmd, arg);
+
+ put_task_struct(task);
+ return err;
+}
+
+static const struct file_operations proc_status_operations = {
+ .open = proc_status_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = proc_status_release,
+ .unlocked_ioctl = proc_status_ioctl,
+ .compat_ioctl = proc_status_ioctl,
+};
+
+
struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
{
struct task_struct *task = get_proc_task(inode);
@@ -3306,7 +3439,7 @@ static const struct pid_entry tgid_base_stuff[] = {
#endif
REG("environ", S_IRUSR, proc_environ_operations),
REG("auxv", S_IRUSR, proc_auxv_operations),
- ONE("status", S_IRUGO, proc_pid_status),
+ REG("status", S_IRUGO, proc_status_operations),
ONE("personality", S_IRUSR, proc_pid_personality),
ONE("limits", S_IRUGO, proc_pid_limits),
#ifdef CONFIG_SCHED_DEBUG
@@ -3657,7 +3790,7 @@ static const struct pid_entry tid_base_stuff[] = {
#endif
REG("environ", S_IRUSR, proc_environ_operations),
REG("auxv", S_IRUSR, proc_auxv_operations),
- ONE("status", S_IRUGO, proc_pid_status),
+ REG("status", S_IRUGO, proc_status_operations),
ONE("personality", S_IRUSR, proc_pid_personality),
ONE("limits", S_IRUGO, proc_pid_limits),
#ifdef CONFIG_SCHED_DEBUG
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 2976f534a7a3..962a9d59847b 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -83,6 +83,14 @@ static inline int groups_search(const struct group_info *group_info, kgid_t grp)
{
return 1;
}
+static inline bool may_setgroups(void)
+{
+ return 1;
+}
+static inline int set_current_groups(struct group_info *group_info)
+{
+ return 0;
+}
#endif

/*
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 753971770733..d4b3fbfdac79 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -552,4 +552,6 @@ struct procmap_query {
__u64 build_id_addr; /* in */
};

+#define PROCFS_SET_GROUPS _IO(PROCFS_IOCTL_MAGIC, 18)
+
#endif /* _UAPI_LINUX_FS_H */
--
2.47.0