Re: [PATCH v2 2/2] pidmap(2)
From: Andy Lutomirski
Date: Wed Sep 27 2017 - 11:04:54 EST
On Tue, Sep 26, 2017 at 11:46 AM, Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
> On Sun, Sep 24, 2017 at 02:27:00PM -0700, Andy Lutomirski wrote:
>> On Sun, Sep 24, 2017 at 1:08 PM, Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
>> > From: Tatsiana Brouka <Tatsiana_Brouka@xxxxxxxx>
>> >
>> > Implement system call for bulk retrieval of pids in binary form.
>> >
>> > Using /proc is slower than necessary: 3 syscalls + another 3 for each thread +
>> > converting with atoi() + instantiating dentries and inodes.
>> >
>> > /proc may not be mounted, especially in containers. Natural extension of
>> > hidepid=2 efforts is to not mount /proc at all.
>> >
>> > It could be used by programs like ps, top or CRIU. Speed increase will
>> > become more drastic once combined with bulk retrieval of process statistics.
>> >
>> > Benchmark:
>> >
>> > N=1<<16 times
>> > ~130 processes (~250 task_structs) on a regular desktop system
>> > opendir + readdir + closedir /proc + the same for every /proc/$PID/task
>> > (roughly what htop(1) does) vs pidmap
>> >
>> > /proc 16.80 ± 0.73%
>> > pidmap 0.06 ± 0.31%
>> >
>> > PIDMAP_* flags are modelled after /proc/task_diag patchset.
>> >
>> >
>> > PIDMAP(2) Linux Programmer's Manual PIDMAP(2)
>> >
>> > NAME
>> > pidmap - get allocated PIDs
>> >
>> > SYNOPSIS
>> > long pidmap(pid_t pid, int *pids, unsigned int count, unsigned int start, int flags);
>>
>> I think we will seriously regret a syscall that does this. Djalal is
>> working on fixing the turd that is hidepid, and this syscall is
>> basically incompatible with ever fixing hidepid. I think that, to
>> make it less regrettable, it needs to take an fd to a proc mount as a
>> parameter. This makes me wonder why it's a syscall at all -- why not
>> just create a new file like /proc/pids?
>
> See reply to fdmap(2).
>
> pidmap(2) is indeed more complex case exactly because of
> pid/tgid/tid/everything else + pidnamespaces + ->hide_pid.
> However the problem remains: query task tree without all the bullshit.
> C/R people succumbed with /proc/*/children, it was a mistake IMO.
Your syscall cannot be implemented sanely. It doesn't remove bullshit
-- it adds bullshit. NAK.
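
For reference, the /proc walk that the quoted benchmark compares against
(opendir + readdir + closedir on /proc, then the same for every
/proc/$PID/task) might look roughly like the sketch below. This is only an
illustration of the existing /proc approach, not code from the patch: the
function names (is_numeric, count_numeric_entries, count_tasks_via_proc) are
made up here, and it just counts tasks rather than collecting them.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Return 1 if name is a non-empty string of decimal digits (a PID/TID entry). */
static int is_numeric(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name; name++)
        if (!isdigit((unsigned char)*name))
            return 0;
    return 1;
}

/* Count numeric entries in a directory; return -1 if it cannot be opened. */
static long count_numeric_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    long n = 0;

    if (!dir)
        return -1;
    while ((de = readdir(dir)) != NULL)
        if (is_numeric(de->d_name))
            n++;
    closedir(dir);
    return n;
}

/*
 * Walk /proc, then every /proc/<pid>/task (hypothetical helper).
 * Returns the number of tasks seen, or -1 if /proc is not available.
 */
long count_tasks_via_proc(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    long total = 0;

    if (!proc)
        return -1;
    while ((de = readdir(proc)) != NULL) {
        char path[300];
        long n;

        if (!is_numeric(de->d_name))
            continue;
        snprintf(path, sizeof(path), "/proc/%s/task", de->d_name);
        n = count_numeric_entries(path);
        if (n > 0)          /* the process may have exited in between */
            total += n;
    }
    closedir(proc);
    return total;
}
```

Calling count_tasks_via_proc() in a loop gives a rough feel for the per-pass
cost being discussed. What it sees depends on the /proc mount in use
(pid namespace, hidepid), and it returns -1 when /proc is not mounted.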