Re: [PATCH v2 0/3] Implement /proc/<pid>/totmaps

From: Michal Hocko
Date: Fri Aug 19 2016 - 03:59:18 EST


On Thu 18-08-16 23:43:39, Sonny Rao wrote:
> On Thu, Aug 18, 2016 at 11:01 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Thu 18-08-16 10:47:57, Sonny Rao wrote:
> >> On Thu, Aug 18, 2016 at 12:44 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >> > On Wed 17-08-16 11:57:56, Sonny Rao wrote:
> > [...]
> >> >> 2) User space OOM handling -- we'd rather do a more graceful shutdown
> >> >> than let the kernel's OOM killer activate and need to gather this
> >> >> information and we'd like to be able to get this information to make
> >> >> the decision much faster than 400ms
> >> >
> >> > Global OOM handling in userspace is really dubious if you ask me. I
> >> > understand you want something better than SIGKILL and in fact this is
> >> > already possible with memory cgroup controller (btw. memcg will give
> >> > you a cheap access to rss, amount of shared, swapped out memory as
> >> > well). Anyway if you are getting close to the OOM your system will most
> >> > probably be really busy and chances are that also reading your new file
> >> > will take much more time. I am also not quite sure how is pss useful for
> >> > oom decisions.
> >>
> >> I mentioned it before, but based on experience RSS just isn't good
> >> enough -- there's too much sharing going on in our use case to make
> >> the correct decision based on RSS. If RSS were good enough, simply
> >> put, this patch wouldn't exist.
> >
> > But that doesn't answer my question, I am afraid. So how exactly do you
> > use pss for oom decisions?
>
> We use PSS to calculate the memory used by a process among all the
> processes in the system, in the case of Chrome this tells us how much
> each renderer process (which is roughly tied to a particular "tab" in
> Chrome) is using and how much it has swapped out, so we know what the
> worst offenders are -- I'm not sure what's unclear about that?

So let me ask more specifically. How can you make any decision based on
the pss when you do not know _what_ the shared resource is? In other
words, if you select a task to terminate based on the pss, then you have
to kill the others which share the same resource as well, otherwise you
do not release that shared resource. Not to mention that such a shared
resource might be on tmpfs/shmem, and it won't get released even after
all the processes which map it are gone.
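
To put made-up numbers on the concern above, here is a minimal sketch of
the arithmetic. It is a toy model, not kernel code: the process names,
the sizes (in MB) and the helpers pss() and freed_by_kill() are all
hypothetical. It assumes pss is the private memory plus an equal share
of each shared region, and that a shared region is released only when
its last mapper goes away and it is not tmpfs/shmem backed.

```python
# Toy model with invented names and sizes (MB); not how the kernel computes it.

def pss(pid, private, shared):
    """Private memory plus an equal share of every region the pid maps."""
    total = private[pid]
    for size, mappers, _is_shmem in shared:
        if pid in mappers:
            total += size / len(mappers)
    return total


def freed_by_kill(pid, private, shared):
    """Memory actually released when only this pid is killed."""
    freed = private[pid]
    for size, mappers, is_shmem in shared:
        # A shared region goes away only if this pid was its last mapper,
        # and (by assumption) a tmpfs/shmem region stays even then.
        if mappers == {pid} and not is_shmem:
            freed += size
    return freed


# Three hypothetical renderers, 10 MB private each, sharing one 300 MB region.
private = {"r1": 10, "r2": 10, "r3": 10}
shared = [(300, {"r1", "r2", "r3"}, False)]

print(pss("r1", private, shared))            # 110.0
print(freed_by_kill("r1", private, shared))  # 10
```

In this model each task reports a pss of 110 MB, yet killing any single
one of them gives back only its 10 MB of private memory.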

I am sorry for being dense but it is still not clear to me how the
single pss number can be used for oom or, in general, any serious
decisions. The counter might be useful of course for debugging purposes
or to have a general overview but then arguing about 40 vs 20ms sounds a
bit strange to me.

> Chrome tends to use a lot of shared memory so we found PSS to be
> better than RSS, and I can give you examples of the RSS and PSS on
> real systems to illustrate the magnitude of the difference between
> those two numbers if that would be useful.
>
> >
> >> So even with memcg I think we'd have the same problem?
> >
> > memcg will give you instant anon, shared counters for all processes in
> > the memcg.
> >
>
> We want to be able to get per-process granularity quickly. I'm not
> sure if memcg provides that exactly?

It will give you that information if you do process-per-memcg, but that
doesn't sound ideal. I thought those 20-something processes you were
talking about are treated together, but it seems I misunderstood.
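
For what it is worth, reading those counters in a process-per-memcg
setup could look roughly like the sketch below. It assumes the cgroup v1
memory controller mounted at /sys/fs/cgroup/memory and one group per
process; the group name passed in, the helper names and the sample text
are made up for illustration, and the "swap" line is only present when
swap accounting is enabled.

```python
# Sketch only: the mount point, group layout and sample values are assumptions.

def parse_memory_stat(text):
    """Turn the 'name value' lines of a memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def read_group_stat(group, root="/sys/fs/cgroup/memory"):
    """Read memory.stat for one (hypothetical) per-process group."""
    with open(f"{root}/{group}/memory.stat") as f:
        return parse_memory_stat(f.read())


# Made-up excerpt in the memory.stat format, values in bytes.
sample = "cache 4096\nrss 1048576\nmapped_file 2048\nswap 8192\n"
stats = parse_memory_stat(sample)
print(stats["rss"], stats["mapped_file"], stats["swap"])
```

This is a single small file read per group, with no walk over the
individual mappings.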
--
Michal Hocko
SUSE Labs