Re: [PATCH v2 0/3] Implement /proc/<pid>/totmaps
From: Michal Hocko
Date: Thu Aug 18 2016 - 03:45:11 EST
On Wed 17-08-16 11:57:56, Sonny Rao wrote:
> On Wed, Aug 17, 2016 at 6:03 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Wed 17-08-16 11:31:25, Jann Horn wrote:
[...]
> >> That's at least 30.43% + 9.12% + 7.66% = 47.21% of the task's kernel
> >> time spent on evaluating format strings. The new interface
> >> wouldn't have to spend that much time on format strings because there
> >> isn't so much text to format.
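To be concrete about where that time goes: each per-VMA record is
emitted through a pile of seq_printf() calls, and every one of them
has to parse its format string at runtime. A simplified sketch,
loosely modeled on show_smap() in fs/proc/task_mmu.c (field list
abbreviated, not the literal kernel code):

static void show_smap_fields(struct seq_file *m,
			     const struct mem_size_stats *mss)
{
	/*
	 * Each seq_printf() re-parses its format string. With ~20
	 * fields per VMA and hundreds of VMAs per task, the parsing
	 * alone shows up in profiles, which is what Jann measured.
	 */
	seq_printf(m, "Rss:            %8lu kB\n", mss->resident >> 10);
	seq_printf(m, "Pss:            %8lu kB\n",
		   (unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
	seq_printf(m, "Swap:           %8lu kB\n", mss->swap >> 10);
	/* ... and roughly a dozen more lines like these ... */
}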
> >
> > well, this is true of course, but I would much rather try to reduce
> > the overhead of the smaps file than add a new file. The following
> > should help already. I've measured a ~7% cut in system time. I guess
> > there is still some room for improvement, but I have to say I'm far
> > from convinced about a new proc file just because we suck at dumping
> > information to userspace.
> > If this was something like /proc/<pid>/stat, which is essentially
> > read all the time, then it would be a different question, but are
> > rss and pss going to be read all that often? If yes, why?
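To illustrate the kind of change I mean (a sketch of one possible
direction, not necessarily the patch I posted): the constant parts of
each line can go through seq_puts() and the numbers through
seq_put_decimal_ull(), which converts directly to decimal with no
format parsing at all. Note that seq_put_decimal_ull() takes a single
char delimiter here (the signature has varied across kernel versions),
and the %8lu column alignment is lost, which is a user-visible change
we would have to think about:

static void show_smap_fields_fast(struct seq_file *m,
				  const struct mem_size_stats *mss)
{
	/* seq_puts() copies the string verbatim; no format parsing. */
	seq_puts(m, "Rss:");
	/* Prints the ' ' delimiter followed by the number. */
	seq_put_decimal_ull(m, ' ', mss->resident >> 10);
	seq_puts(m, " kB\n");

	seq_puts(m, "Pss:");
	seq_put_decimal_ull(m, ' ',
			    (unsigned long long)(mss->pss >> (10 + PSS_SHIFT)));
	seq_puts(m, " kB\n");
}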
>
> If the question is why we need to read RSS, PSS, Private_*, Swap,
> and the other fields so often:
>
> I have two use cases so far involving monitoring per-process memory
> usage, and we usually need to read stats for about 25 processes.
>
> Here's a timing example on a fairly recent ARM system: a 4-core
> RK3288 running at 1.8 GHz.
>
> localhost ~ # time cat /proc/25946/smaps > /dev/null
>
> real 0m0.036s
> user 0m0.020s
> sys 0m0.020s
>
> localhost ~ # time cat /proc/25946/totmaps > /dev/null
>
> real 0m0.027s
> user 0m0.010s
> sys 0m0.010s
> localhost ~ #
>
> I'll ignore the user time for now. We see about 20 ms of system time
> with smaps and 10 ms with totmaps, so with 20 similar processes it
> would be 400 ms of CPU time for the kernel to get this information
> from smaps vs 200 ms with totmaps. Even totmaps is still pretty slow,
> but it's much better than smaps.
>
> Use cases:
> 1) Basic task monitoring -- like "top" that shows memory consumption
> including PSS, Private, Swap.
> A 1-second update interval means about 40% of one CPU is spent in
> the kernel gathering the data via smaps.
I would argue that even 20% is way too much for such monitoring. What
is the value of doing it so often that 20 vs 40ms really matters?
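To be concrete about what such a monitor does: it is essentially the
loop below, repeated per process per interval, and nearly all of the
cost sits on the kernel side of the read. A minimal userspace sketch
that sums the Pss lines for one task (the pid is just the one from
your timing example; error handling trimmed):

#include <stdio.h>

/* Sum the Pss: lines of one process's smaps, in kB. */
static long total_pss_kb(int pid)
{
	char path[64], line[256];
	long total = 0, kb;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/smaps", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* One "Pss:   <n> kB" line per VMA. */
		if (sscanf(line, "Pss: %ld kB", &kb) == 1)
			total += kb;
	}
	fclose(f);
	return total;
}

int main(void)
{
	printf("Pss: %ld kB\n", total_pss_kb(25946));
	return 0;
}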
> 2) User space OOM handling -- we'd rather do a more graceful shutdown
> than let the kernel's OOM killer activate. We need to gather this
> information to make that decision, and we'd like to be able to get it
> much faster than 400ms.
Global OOM handling in userspace is really dubious if you ask me. I
understand you want something better than SIGKILL, and in fact this is
already possible with the memory cgroup controller (btw. memcg will
give you cheap access to rss and to the amount of shared and
swapped-out memory as well). Anyway, if you are getting close to OOM,
your system will most probably be really busy, and chances are that
reading your new file will also take much more time. I am also not
quite sure how pss is useful for OOM decisions.
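To put something behind the "cheap access" claim: with v1 of the
memory controller those per-group counters are exposed as plain
"key value" lines in memory.stat, so reading them is O(1) rather than
a walk over every VMA of every task. A sketch (the "foo" group name
is made up; use whatever group your processes live in):

#include <stdio.h>

int main(void)
{
	const char *path = "/sys/fs/cgroup/memory/foo/memory.stat";
	char key[64];
	unsigned long long val;
	FILE *f = fopen(path, "r");

	if (!f)
		return 1;
	/* Lines look like "rss 123456" (values are in bytes). */
	while (fscanf(f, "%63s %llu", key, &val) == 2)
		printf("%s = %llu bytes\n", key, val);
	fclose(f);
	return 0;
}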
Don't get me wrong: /proc/<pid>/totmaps might be suitable for your
specific use case, but so far I haven't heard any sound argument for it
being generally usable. It is true that smaps is unnecessarily costly,
but at least I can see some room for improvement there. A simple patch
I've posted cut the formatting overhead by 7%. Maybe we can do more.
--
Michal Hocko
SUSE Labs