Re: [PACTH v2 0/3] Implement /proc/<pid>/totmaps

From: Minchan Kim
Date: Thu Aug 18 2016 - 22:26:18 EST


Hi Michal,

On Thu, Aug 18, 2016 at 08:01:04PM +0200, Michal Hocko wrote:
> On Thu 18-08-16 10:47:57, Sonny Rao wrote:
> > On Thu, Aug 18, 2016 at 12:44 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > On Wed 17-08-16 11:57:56, Sonny Rao wrote:
> [...]
> > >> 2) User space OOM handling -- we'd rather do a more graceful shutdown
> > >> than let the kernel's OOM killer activate and need to gather this
> > >> information and we'd like to be able to get this information to make
> > >> the decision much faster than 400ms
> > >
> > > Global OOM handling in userspace is really dubious if you ask me. I
> > > understand you want something better than SIGKILL and in fact this is
> > > already possible with memory cgroup controller (btw. memcg will give
> > > you a cheap access to rss, amount of shared, swapped out memory as
> > > well). Anyway if you are getting close to the OOM your system will most
> > > probably be really busy and chances are that also reading your new file
> > > will take much more time. I am also not quite sure how is pss useful for
> > > oom decisions.
> >
> > I mentioned it before, but based on experience RSS just isn't good
> > enough -- there's too much sharing going on in our use case to make
> > the correct decision based on RSS. If RSS were good enough, simply
> > put, this patch wouldn't exist.
>
> But that doesn't answer my question, I am afraid. So how exactly do you
> use pss for oom decisions?

My case is not about OOM decisions, but I agree it would be great if we
could get *fast* smaps summary information.

PSS is a really great tool for figuring out how processes consume memory,
more accurately than RSS. We have been using it for per-process memory
monitoring. Although we do not use it for OOM decisions, it would be
great if it were sped up, because we don't want to spend much CPU time
just on monitoring.
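
To illustrate the monitoring cost described above, here is a minimal
sketch of how a userspace tool might compute a per-process PSS total
today: it reads all of /proc/<pid>/smaps and adds up each mapping's
"Pss:" line. The function names sum_pss_kb and process_pss_kb are
hypothetical, chosen only for this example.

```python
def sum_pss_kb(smaps_text):
    """Sum the 'Pss:' fields (in kB) over every mapping in smaps-style text."""
    total = 0
    for line in smaps_text.splitlines():
        # Match the exact field name so that other fields sharing the
        # prefix (for example 'Pss_Dirty:') are not counted.
        if line.startswith("Pss:"):
            fields = line.split()  # e.g. ['Pss:', '12', 'kB']
            total += int(fields[1])
    return total


def process_pss_kb(pid):
    """Read the whole smaps file for one process and return its total PSS in kB."""
    with open(f"/proc/{pid}/smaps") as f:
        return sum_pss_kb(f.read())
```

Every line of every mapping has to be generated by the kernel and then
scanned by the tool, even though only one field per mapping is used.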

For our use case, we don't need AnonHugePages, ShmemPmdMapped,
Shared_Hugetlb, Private_Hugetlb, KernelPageSize, or MMUPageSize because
we never enable THP or hugetlb. Additionally, Locked can be derived from
the vma flags, so we don't need it either. We don't even need the address
range for plain monitoring when we are not investigating in detail.

Although they are not a severe overhead, why emit useless information at
all? It even gets more bloated day by day. :( As a result, userspace
tools have to spend more time parsing, which is pointless.

Having said that, I'm not a fan of creating a new stat knob for that,
either. How about appending summary information at the end of smaps?
Then monitoring users could just open the file, lseek to (end - 1), and
read only the summary.
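
The access pattern proposed above can be sketched as follows. This is
only an illustration under stated assumptions: the trailing summary line
does not exist in smaps, its format here is made up, and the sketch works
on a regular file, so whether a procfs file would allow seeking relative
to its end is also an assumption. read_tail and read_last_line are
hypothetical helper names.

```python
import os


def read_tail(path, nbytes):
    """Return up to the last nbytes bytes of a file without reading the rest."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Find the file size, then seek back at most nbytes from the end.
        size = os.lseek(fd, 0, os.SEEK_END)
        os.lseek(fd, max(size - nbytes, 0), os.SEEK_SET)
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def read_last_line(path, nbytes=256):
    """Return the final line of a file, assuming it fits in the last nbytes bytes."""
    lines = read_tail(path, nbytes).splitlines()
    return lines[-1].decode() if lines else ""
```

With such a layout, a monitoring tool would parse one short line instead
of the full per-mapping output.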

Thanks.