[PATCH 0/3]Sysctl: clean the code and prepare for secure use in containers

From: Pavel Emelyanov
Date: Wed Feb 27 2008 - 08:47:57 EST


Many (most of) sysctls do not have a per-container sense. E.g.
kernel.print_fatal_signals, vm.panic_on_oom, net.core.netdev_budget
and so on and so forth. Besides, tuning then from inside a container
is not even secure. On the other hand, hiding them completely from
the container's tasks sometimes causes user-space to stop working.

When developing net sysctl, the common practice was to duplicate
a table and drop the write bits in table->mode, but this approach
was not very elegant, lead to excessive memory consumption and
was not suitable in general.

Here's the alternative solution. To facilitate the per-container
sysctls ctl_table_root-s were introduced. Each root contains a
list of ctl_table_header-s that are visible to different namespaces.
The idea of this set is to add the permissions() callback on the
ctl_table_root to allow ctl root limit permissions to the same
ctl_table-s.

The main user of this functionality is the net-namespaces code,
but later this will (should) be used by more and more namespaces,
containers and control groups.

Actually, this idea's core is in a single hunk in the third patch.
First two patches are cleanups for sysctl code, while the third
one mostly extends the arguments set of some sysctl functions.

Signed-off-by: Pavel Emelyanov <xemul@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/