Re: [PATCH v1] sysctl: Allow change system v ipc sysctls inside ipc namespace

From: Eric W. Biederman
Date: Mon Jul 25 2022 - 12:16:47 EST


Alexey Gladkov <legion@xxxxxxxxxx> writes:

> Rootless containers are not allowed to modify kernel IPC parameters such
> as kernel.msgmnb.
>
> It seems to me that we can allow customization of these parameters if
> the user has CAP_SYS_RESOURCE in that ipc namespace.
>
> CAP_SYS_RESOURCE is already needed in order to overcome mqueue limits
> (msg_max and msgsize_max).


For changing the permissions on who can modify the SysV limits, I don't
think this change is safe. I don't see anything that will prevent abuse
if anyone can modify these limits. Replacing the ordinary unix DAC
permission check with ns_capable will allow anyone to modify the limits.


That said there is RLIMIT_MSGQUEUE that limits the posix messages queues
so those should be safe to allow anyone to modify their limits.

The code in mqueue_get_inode is where that limiting happens.

For the posix message queues all that should be needed is to change the
owner of the sysctl files from the global root to the user namespace
root. There are also two capable calls in ipc/mqueue.c that can
probably be changed to ns_capable calls.


The only posix message queue limit that I don't immediately see
something that will prevent abuse of is /proc/sys/fs/mqueue/queus_max.
That probably still runs into RLIMIT_MSGQUEUE somewhere but it was
not immediately obvious at first glance.

Eric






>
> Signed-off-by: Alexey Gladkov <legion@xxxxxxxxxx>
> ---
> ipc/ipc_sysctl.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> index ef313ecfb53a..e79452867720 100644
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -193,16 +193,19 @@ static int set_is_seen(struct ctl_table_set *set)
> static int ipc_permissions(struct ctl_table_header *head, struct ctl_table *table)
> {
> int mode = table->mode;
> -
> -#ifdef CONFIG_CHECKPOINT_RESTORE
> struct ipc_namespace *ns = current->nsproxy->ipc_ns;
>
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> if (((table->data == &ns->ids[IPC_SEM_IDS].next_id) ||
> (table->data == &ns->ids[IPC_MSG_IDS].next_id) ||
> (table->data == &ns->ids[IPC_SHM_IDS].next_id)) &&
> checkpoint_restore_ns_capable(ns->user_ns))
> mode = 0666;
> + else
> #endif
> + if (ns_capable(ns->user_ns, CAP_SYS_RESOURCE))
> + mode = 0666;
> +
> return mode;
> }