Re: [PATCH] Add pidfs filesystem
From: Pavel Emelyanov
Date: Wed Feb 22 2017 - 02:41:30 EST
On 02/21/2017 05:57 PM, Oleg Nesterov wrote:
> On 02/18, Alexey Gladkov wrote:
>>
>> This patch allows mounting only the pid-related part of /proc, without
>> the other objects. Since this is an add-on to /proc, flags applied to
>> /proc also affect this pidfs filesystem.
>
> I leave this to you and Eric, but imo it would be nice to avoid another
> filesystem.
>
>> Why not implement it as another flag to /proc ?
>>
>> The /proc flags are stored in the pid_namespace and are global for the
>> namespace. This means that if you add a flag to hide everything except
>> the pids, it will act on all mounted instances of /proc.
>
> But perhaps we can use mnt_flags? For example, let's abuse MNT_NODEV, see
> the simple patch below. Not sure it is correct/complete, it is just to
> illustrate the idea.
>
> With this patch you can mount proc with -onodev and it will only show
> pids/self/thread_self:
>
> # mkdir /tmp/D
> # mount -t proc -o nodev none /tmp/D
> # ls /tmp/D
> 1 11 13 15 17 19 20 22 24 28 3 31 33 4 56 7 9 thread-self
> 10 12 14 16 18 2 21 23 27 29 30 32 34 5 6 8 self
> # cat /tmp/D/meminfo
> cat: /tmp/D/meminfo: No such file or directory
> # ls /tmp/D/irq
> ls: cannot open directory /tmp/D/irq: No such file or directory
>
> No?
Yes!!! If this whole effort with pidfs and overlayfs moves forward, I would
prefer to see the nodev procfs version rather than another fs.

As far as the overlayfs part is concerned, having an overlayfs mounted on /proc
inside a container may cause problems, as applications sometimes check that /proc
contains procfs (by checking statfs.f_type == PROC_SUPER_MAGIC or by reading
/proc/mounts).
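
For illustration, here is a minimal sketch of the kind of statfs check meant
here. The helper name is_procfs() is hypothetical and not taken from any
particular application. With an overlayfs mounted on /proc, f_type would
report the overlayfs magic instead, so a check like this would fail:

```c
#include <sys/vfs.h>      /* statfs(2), struct statfs */
#include <linux/magic.h>  /* PROC_SUPER_MAGIC */

/*
 * Hypothetical helper: returns 1 if 'path' lives on procfs,
 * 0 if it lives on some other filesystem, -1 if statfs() fails.
 */
int is_procfs(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return -1;

	/* f_type carries the superblock magic of the mounted filesystem */
	return sfs.f_type == PROC_SUPER_MAGIC;
}
```

An application would typically call is_procfs("/proc") at startup and refuse
to trust the directory contents unless the result is 1.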
-- Pavel
> Oleg.
>
>
> --- a/fs/proc/generic.c
> +++ b/fs/proc/generic.c
> @@ -305,11 +305,22 @@ int proc_readdir_de(struct proc_dir_entry *de, struct file *file,
>
>  int proc_readdir(struct file *file, struct dir_context *ctx)
>  {
> +	int mnt_flags = file->f_path.mnt->mnt_flags;
>  	struct inode *inode = file_inode(file);
>
> +	if (mnt_flags & MNT_NODEV)
> +		return 1;
> +
>  	return proc_readdir_de(PDE(inode), file, ctx);
>  }
>
> +static int proc_dir_open(struct inode *inode, struct file *file)
> +{
> +	if (file->f_path.mnt->mnt_flags & MNT_NODEV)
> +		return -ENOENT;
> +	return 0;
> +}
> +
>  /*
>   * These are the generic /proc directory operations. They
>   * use the in-memory "struct proc_dir_entry" tree to parse
> @@ -319,6 +330,7 @@ static const struct file_operations proc_dir_operations = {
>  	.llseek = generic_file_llseek,
>  	.read = generic_read_dir,
>  	.iterate_shared = proc_readdir,
> +	.open = proc_dir_open,
>  };
>
> /*
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -318,12 +318,16 @@ proc_reg_get_unmapped_area(struct file *file, unsigned long orig_addr,
>
>  static int proc_reg_open(struct inode *inode, struct file *file)
>  {
> +	int mnt_flags = file->f_path.mnt->mnt_flags;
>  	struct proc_dir_entry *pde = PDE(inode);
>  	int rv = 0;
>  	int (*open)(struct inode *, struct file *);
>  	int (*release)(struct inode *, struct file *);
>  	struct pde_opener *pdeo;
>
> +	if (mnt_flags & MNT_NODEV)
> +		return -ENOENT;
> +
>  	/*
>  	 * Ensure that
>  	 * 1) PDE's ->release hook will be called no matter what