Re: [PATCH] Add pidfs filesystem

From: Pavel Emelyanov
Date: Wed Feb 22 2017 - 08:08:58 EST


On 02/22/2017 03:04 PM, Alexey Gladkov wrote:
> On Wed, Feb 22, 2017 at 10:40:49AM +0300, Pavel Emelyanov wrote:
>> On 02/21/2017 05:57 PM, Oleg Nesterov wrote:
>>> On 02/18, Alexey Gladkov wrote:
>>>>
>>>> This patch allows mounting only the pid-related part of /proc,
>>>> without the other objects. Since this is an add-on to /proc, flags applied to
>>>> /proc also affect this pidfs filesystem.
>>>
>>> I leave this to you and Eric, but imo it would be nice to avoid another
>>> filesystem.
>>>
>>>> Why not implement it as another flag to /proc ?
>>>>
>>>> The /proc flags are stored in the pid_namespace and are global for the
>>>> namespace. This means that if you add a flag to hide everything except the
>>>> pids, it will act on all mounted instances of /proc.
>>>
>>> But perhaps we can use mnt_flags? For example, let's abuse MNT_NODEV, see
>>> the simple patch below. Not sure it is correct/complete, just to illustrate
>>> the idea.
>>>
>>> With this patch you can mount proc with -onodev and it will only show
>>> pids/self/thread-self:
>>>
>>> # mkdir /tmp/D
>>> # mount -t proc -o nodev none /tmp/D
>>> # ls /tmp/D
>>> 1 11 13 15 17 19 20 22 24 28 3 31 33 4 56 7 9 thread-self
>>> 10 12 14 16 18 2 21 23 27 29 30 32 34 5 6 8 self
>>> # cat /tmp/D/meminfo
>>> cat: /tmp/D/meminfo: No such file or directory
>>> # ls /tmp/D/irq
>>> ls: cannot open directory /tmp/D/irq: No such file or directory
>>>
>>> No?
>>
>> Yes!!! If this whole effort with pidfs and overlayfs moves forward, I would
>> prefer to see the nodev procfs version rather than another fs.
>
> But this is not procfs anymore. If someone expects procfs here, they will
> be disappointed :)

Well, it depends on what files he's looking for in there. This is what the
overlay part is for.

>> As far as the overlayfs part is concerned, having an overlayfs mounted on /proc
>> inside a container may cause problems, as applications sometimes check that /proc
>> contains procfs (by checking statfs.f_type == PROC_SUPER_MAGIC or by reading
>> /proc/mounts).
>
> It is not a replacement for procfs. It's a subset of procfs. If code expects
> procfs, we should not deceive it.
>
> No?

But this is what we actually do -- Docker does it with bind-mounts, LXC does it
with lxcfs, OpenVZ does it with kernel patches. Every time a container starts, the
regular /proc is mutated so that it does not show some information.
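
For reference, the statfs check I mentioned above looks roughly like the
following. This is only a minimal sketch: is_procfs() and magic_is_procfs() are
names made up for illustration, not taken from any particular application, and
real programs may do the check differently.

```c
#include <linux/magic.h>   /* PROC_SUPER_MAGIC */
#include <sys/vfs.h>       /* statfs(2), struct statfs */

/*
 * Pure comparison against the procfs magic number. Split out so the
 * comparison can be exercised without touching any mount.
 * (Hypothetical helper name.)
 */
static int magic_is_procfs(long f_type)
{
    return f_type == PROC_SUPER_MAGIC;
}

/*
 * Hypothetical helper: returns 1 if 'path' is on a filesystem that
 * reports the procfs magic, 0 if it is on some other filesystem, and
 * -1 if statfs() fails (errno is left as set by statfs()).
 */
static int is_procfs(const char *path)
{
    struct statfs sb;

    if (statfs(path, &sb) != 0)
        return -1;
    return magic_is_procfs((long)sb.f_type);
}
```

An application would call is_procfs("/proc") at startup and refuse to go on
unless it gets 1. As far as I can tell, an overlayfs mounted over /proc would
report the overlayfs magic in f_type, so such a check would fail there, while
a proc instance mounted with different flags would still pass it.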

-- Pavel