Re: [PATCH v2 0/3] Implement /proc/<pid>/totmaps

From: Michal Hocko
Date: Wed Aug 17 2016 - 09:03:37 EST


On Wed 17-08-16 11:31:25, Jann Horn wrote:
> On Wed, Aug 17, 2016 at 10:22:00AM +0200, Michal Hocko wrote:
> > On Tue 16-08-16 12:46:51, Robert Foss wrote:
> > [...]
> > > $ /usr/bin/time -v -p zsh -c "repeat 25 { awk '/^Rss/{rss+=\$2}
> > > /^Pss/{pss+=\$2} END {printf \"rss:%d pss:%d\n\", rss, pss}\'
> > > /proc/5025/smaps }"
> > > [...]
> > > Command being timed: "zsh -c repeat 25 { awk '/^Rss/{rss+=$2}
> > > /^Pss/{pss+=$2} END {printf "rss:%d pss:%d\n", rss, pss}\' /proc/5025/smaps
> > > }"
> > > User time (seconds): 0.37
> > > System time (seconds): 0.45
> > > Percent of CPU this job got: 92%
> > > Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.89
> >
> > This is really unexpected. Where is the user time spent? Anyway, rather
> > than measuring some random processes I've tried to measure something
> > resembling the worst case. So I've created a simple program to mmap as
> > much as possible:
> >
> > #include <sys/mman.h>
> > #include <sys/types.h>
> > #include <unistd.h>
> > #include <stdio.h>
> > int main()
> > {
> > 	while (mmap(NULL, 4096, PROT_READ|PROT_WRITE,
> > 		    MAP_ANON|MAP_SHARED|MAP_POPULATE, -1, 0) != MAP_FAILED)
> > 		;
> >
> > 	printf("pid:%d\n", getpid());
> > 	pause();
> > 	return 0;
> > }
>
> Ah, nice, that's a reasonable test program. :)
>
>
> > So with a reasonable user space the parsing is really not all that time
> > consuming wrt. smaps handling. That being said I am still very skeptical
> > about a dedicated proc file which accomplishes what userspace can do
> > in a trivial way.
>
> Now, since your numbers showed that all the time is spent in the kernel,
> also create this test program to just read that file over and over again:
>
> $ cat justreadloop.c
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sched.h>
> #include <unistd.h>
> #include <err.h>
> #include <stdio.h>
>
> char buf[1000000];
>
> int main(int argc, char **argv) {
> 	printf("pid:%d\n", getpid());
> 	while (1) {
> 		int fd = open(argv[1], O_RDONLY);
> 		if (fd < 0)
> 			continue;
> 		if (read(fd, buf, sizeof(buf)) < 0)
> 			err(1, "read");
> 		close(fd);
> 	}
> }
> $ gcc -Wall -o justreadloop justreadloop.c
> $
>
> Now launch your test:
>
> $ ./mapstuff
> pid:29397
>
> point justreadloop at it:
>
> $ ./justreadloop /proc/29397/smaps
> pid:32567
>
> ... and then check the performance stats of justreadloop:
>
> # perf top -p 32567
>
> This is what I see:
>
> Samples: 232K of event 'cycles:ppp', Event count (approx.): 60448424325
> Overhead Shared Object Symbol
>   30.43%  [kernel]  [k] format_decode
>    9.12%  [kernel]  [k] number
>    7.66%  [kernel]  [k] vsnprintf
>    7.06%  [kernel]  [k] __lock_acquire
>    3.23%  [kernel]  [k] lock_release
>    2.85%  [kernel]  [k] debug_lockdep_rcu_enabled
>    2.25%  [kernel]  [k] skip_atoi
>    2.13%  [kernel]  [k] lock_acquire
>    2.05%  [kernel]  [k] show_smap

This is a lot! I would expect the rmap walk to consume more, but it
doesn't even show up among the top consumers.

> That's at least 30.43% + 9.12% + 7.66% = 47.21% of the task's kernel
> time spent on evaluating format strings. The new interface
> wouldn't have to spend that much time on format strings because there
> isn't so much text to format.

Well, this is true of course, but I would much rather try to reduce the
overhead of the smaps file than add a new one. The following should help
already; I've measured a ~7% system-time reduction. I guess there is still
some room for improvement, but I have to say I'm far from convinced about
a new proc file just because we are bad at dumping information to
userspace. If this were something like /proc/<pid>/stat, which is
essentially read all the time, it would be a different question, but are
rss and pss going to be read all that often? If yes, why? These are the
questions that should be answered before we even start considering the
implementation.
---