Re: [PATCH] Allow restricting permissions in /proc/sys

From: Topi Miettinen
Date: Mon Nov 04 2019 - 12:59:03 EST


On 4.11.2019 17.44, Eric W. Biederman wrote:
> Topi Miettinen <toiwoton@xxxxxxxxx> writes:
>
>> On 3.11.2019 20.50, Eric W. Biederman wrote:
>>> Topi Miettinen <toiwoton@xxxxxxxxx> writes:
>>>
>>>> Several items in /proc/sys need not be accessible to unprivileged
>>>> tasks. Let the system administrator change the permissions, but only
>>>> to more restrictive modes than what the sysctl tables allow.
>>>
>>> This looks quite buggy. You neither update table->mode nor
>>> do you ever read from table->mode to initialize the inode.
>>> Am I missing something in my quick reading of your patch?

>> inode->i_mode gets initialized in proc_sys_make_inode().
>>
>> I didn't want to touch the table, so that the original permissions can
>> be used to restrict the changes made. If the restrictions are removed,
>> as suggested by Theodore Ts'o, table->mode could be changed.
>> Otherwise I'd rather add a new field to store the current mode, so
>> that the mode field can remain for reference. As the original author
>> of the code from 2007, would you let the administrator chmod/chown
>> the items in /proc/sys without restrictions (e.g. 0400 -> 0777)?

> At an architectural level I think we need to do this carefully and have
> a compelling reason. The code has survived nearly the entire life of
> Linux without this capability.

I'd be happy with only allowing restrictions to access for now. Perhaps
later, after more analysis, relaxing changes and maybe UID/GID changes
could also be allowed.

> I think right now the common solution is to mount another file over the
> file you are trying to hide/limit. Changing the permissions might be
> better but that is not at all clear.
>
> Do you have specific examples of the cases where you would like to
> change the permissions?

Unprivileged applications typically do not need to access most items in
/proc/sys, so I'd like to gradually find out which ones are needed. So
far I've seen no problems with mode 0500 for the directories abi,
crypto, debug, dev, fs, user and vm.

I'm also using systemd's InaccessiblePaths to limit access (it mounts an
inaccessible directory over the path), but that's a rather big hammer.
For example, there are over 100 files in /proc/sys/kernel; creating a
mount for each of them may cause issues, especially when multiplied by
the number of services.

>>> Not updating table->mode almost certainly means that as soon as the
>>> cached inode is invalidated the mode changes will disappear. Not to
>>> mention they will fail to propagate between different instances of
>>> proc.
>>>
>>> Losing all of your changes at cache invalidation seems to make this a
>>> useless feature.

>> At least different proc instances seem to work just fine here (they
>> show the same changes), but I suppose you are right about cache
>> invalidation.

> It is going to take the creation of a pid namespace to see different
> proc instances. All mounts of proc within the same pid_namespace
> return the same instance.

I see no problems when using Firejail (which uses PID namespaces) with
v2; the permissions in /proc/sys are the same inside and outside the
namespace.

-Topi