Re: [RFC] Add option to mount only a pids subset

From: Djalal Harouni
Date: Thu Mar 23 2017 - 12:06:40 EST


Hi Alexey,

On Mon, Mar 20, 2017 at 1:58 PM, Alexey Gladkov
<gladkov.alexey@xxxxxxxxx> wrote:
>
>
> Al Viro, does this patch look better?
>
> == Overview ==
>
> Some container virtualization systems mount /proc inside the
> container. In most cases this is done to work with information
> about the processes. Knowing that the /proc filesystem is not fully
> virtualized, they mount empty files or directories on top of dangerous
> places (for example /proc/sys, /proc/kcore, /sys/firmware, etc.).
>
> The structure of this filesystem is dynamic and any module can create a
> new object which will not necessarily be virtualized. There are
> proprietary modules that aren't in the mainline and whose behavior we
> cannot verify.
>
> This opens up a potential threat to the system. The developers of the
> virtualization system can't predict all dangerous places in /proc by
> definition.
>
> A more effective solution would be to mount into the container only what
> is necessary and ignore the rest.
>
> Right now it is possible to pass any part of the /proc filesystem into
> the container using mount --bind, except the pids.
>
> This patch allows mounting only the part of /proc related to pids,
> without the remaining objects. Since this is an option for /proc, flags
> applied to /proc also affect this subset of the filesystem.

I just sent a patch that also has to deal with proc hidepid here:
https://lkml.org/lkml/2017/3/23/505

I'm not sure that's the right approach, and it is still buggy. However,
it seems that your patch also stores the mount option inside the
pid_namespace, which means it may get propagated to all mounts inside
the same pidns?
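
To make the concern concrete, here is a minimal userspace sketch. It is
not kernel code: toy_pidns, toy_proc_mount and the toy_* helpers are
made-up names used only for illustration. It assumes the option is kept
in one per-pidns structure that every /proc mount in that namespace
points to.

```c
/* Toy stand-ins for illustration only, NOT the kernel's real structures. */
struct toy_pidns {
    int hide_pid;               /* option stored once per pid namespace */
};

struct toy_proc_mount {
    struct toy_pidns *ns;       /* each /proc mount in the pidns points here */
};

/* Remounting one /proc instance writes the option into the shared pidns. */
static void toy_remount(struct toy_proc_mount *m, int hide_pid)
{
    m->ns->hide_pid = hide_pid;
}

/* The option a mount enforces is read back from the shared pidns. */
static int toy_effective_hide_pid(const struct toy_proc_mount *m)
{
    return m->ns->hide_pid;
}

/* Returns the value seen through mount B after only mount A was remounted. */
static int toy_demo(void)
{
    struct toy_pidns ns = { .hide_pid = 0 };
    struct toy_proc_mount a = { .ns = &ns };
    struct toy_proc_mount b = { .ns = &ns };

    toy_remount(&a, 2);                 /* only A is remounted ...      */
    return toy_effective_hide_pid(&b);  /* ... but B now reports 2 too  */
}
```

In this model, changing the option on one mount silently changes it for
every other mount in the same pidns. Keeping the option in per-mount
state instead would leave B untouched.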

I didn't have enough time, but if the two are related maybe we can work
it out together?

Thank you!


--
tixxdz