Re: [PATCH v2 2/2] pidmap(2)

From: Djalal Harouni
Date: Mon Sep 25 2017 - 06:47:43 EST


Hi Alexey,

On Sun, Sep 24, 2017 at 9:08 PM, Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
> From: Tatsiana Brouka <Tatsiana_Brouka@xxxxxxxx>
>
> Implement system call for bulk retrieving of pids in binary form.
>
> Using /proc is slower than necessary: 3 syscalls + another 3 for each thread +
> converting with atoi() + instantiating dentries and inodes.
>
> /proc may be not mounted especially in containers. Natural extension of
> hidepid=2 efforts is to not mount /proc at all.

Actually I am not sure whether software will work if /proc is not
mounted. Last time I checked (years ago), glibc was doing extra checks
during initialization using /proc/self/* memory inodes, and that may
fail. Also, glibc implements fexecve() using /proc/self/..., so it
depends on the library and on the use case for cloud containers...
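
To make the fexecve() point concrete, here is a rough sketch of the
pattern I mean: the library turns the fd into a /proc/self/fd/<fd> path
and calls plain execve() on it. The helper names proc_fd_path() and
fexecve_via_proc() are made up for illustration, and this is not the
actual glibc source; it only shows why the call fails when /proc is
not mounted.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the procfs path that refers to an open
 * file descriptor. Returns 0 on success, -1 if buf is too small.
 */
static int proc_fd_path(int fd, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/self/fd/%d", fd);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical fexecve()-like wrapper that goes through procfs.
 * On success it does not return. Without a mounted /proc the execve()
 * below fails (typically with ENOENT) and -1 is returned.
 */
static int fexecve_via_proc(int fd, char *const argv[], char *const envp[])
{
	char path[32];

	if (fd < 0 || proc_fd_path(fd, path, sizeof(path)) < 0) {
		errno = EINVAL;
		return -1;
	}
	execve(path, argv, envp);
	return -1;	/* execve() only returns on failure */
}
```

So a caller that opens a binary and executes it through such a wrapper
silently depends on procfs being mounted in its mount namespace.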

Also, for the natural extension of hidepid=2, where we only want pids
inside /proc without kernel data, we already have a clean patch on top
of the procfs modernization [1]; this is the result of the work of the
past months.


>
> It could be used by programs like ps, top or CRIU. Speed increase will
> become more drastic once combined with bulk retrieval of process statistics.

Yes, the numbers are nice. It seems that you want to move from
filesystem syscalls on procfs to direct syscalls only. Hmm, this does
not help to fix procfs: tools like ps, top and others can be updated,
but anyone can *continue* to use open+read on procfs and access the
data.

I think this will be a bit hard to fix from our side, since with your
patches you are doing it from the current context, whereas with procfs
it would be from the current + procfs mount context.

What if procfs is mounted with "ptracepids=true", the new "hidepid="
but without the "gid=" interaction, and then you read from
/proc/<pid>/pidmap/* as suggested by Andy? That is,
/proc/<pid>/pidmap/{tasks|proc|children}. I am not sure about the
PIDMAP_IGNORE_KTHREADS case...


> Benchmark:
>
> N=1<<16 times
> ~130 processes (~250 task_structs) on a regular desktop system
> opendir + readdir + closedir /proc + the same for every /proc/$PID/task
> (roughly what htop(1) does) vs pidmap
>
> /proc 16.80 ± 0.73%
> pidmap 0.06 ± 0.31%

Thanks!
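
For anyone following along, the /proc side of the benchmark quoted
above (opendir + readdir + closedir on /proc, then the same for every
/proc/$PID/task) can be sketched roughly like this. It is only my
illustration of the access pattern, not the code used to produce the
numbers, and the function names are made up.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Return 1 if name consists only of decimal digits (a pid/tid entry). */
static int is_numeric(const char *name)
{
	if (*name == '\0')
		return 0;
	for (; *name; name++)
		if (!isdigit((unsigned char)*name))
			return 0;
	return 1;
}

/* Count numeric entries in a directory; -1 if it cannot be opened. */
static long count_numeric_entries(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *de;
	long n = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL)
		if (is_numeric(de->d_name))
			n++;
	closedir(d);
	return n;
}

/*
 * Walk /proc and every /proc/<pid>/task, storing the number of
 * processes and threads seen. Returns 0, or -1 if /proc is unusable.
 */
static int count_tasks_via_proc(long *nprocs, long *nthreads)
{
	DIR *d = opendir("/proc");
	struct dirent *de;
	char path[272];

	*nprocs = *nthreads = 0;
	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		long n;

		if (!is_numeric(de->d_name))
			continue;
		(*nprocs)++;
		snprintf(path, sizeof(path), "/proc/%s/task", de->d_name);
		n = count_numeric_entries(path);
		if (n > 0)	/* the process may have exited meanwhile */
			*nthreads += n;
	}
	closedir(d);
	return 0;
}
```

Each process costs one extra opendir/readdir/closedir round, plus the
dentry and inode instantiation behind it, which is where the gap to a
single bulk syscall comes from.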


[1] https://github.com/legionus/linux/commit/993a2a5b9af95b0ac901ff41d32124b72ed676e3

P.S. For the procfs modernization, we are planning patches in the next
few days.

--
tixxdz