Re: [PATCH v2 0/3] Implement /proc/<pid>/totmaps

From: Michal Hocko
Date: Mon Aug 22 2016 - 03:54:37 EST


On Fri 19-08-16 10:57:48, Sonny Rao wrote:
> On Fri, Aug 19, 2016 at 12:59 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Thu 18-08-16 23:43:39, Sonny Rao wrote:
> >> On Thu, Aug 18, 2016 at 11:01 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >> > On Thu 18-08-16 10:47:57, Sonny Rao wrote:
> >> >> On Thu, Aug 18, 2016 at 12:44 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >> >> > On Wed 17-08-16 11:57:56, Sonny Rao wrote:
> >> > [...]
> >> >> >> 2) User space OOM handling -- we'd rather do a more graceful shutdown
> >> >> >> than let the kernel's OOM killer activate and need to gather this
> >> >> >> information and we'd like to be able to get this information to make
> >> >> >> the decision much faster than 400ms
> >> >> >
> >> >> > Global OOM handling in userspace is really dubious if you ask me. I
> >> >> > understand you want something better than SIGKILL and in fact this is
> >> >> > already possible with memory cgroup controller (btw. memcg will give
> >> >> > you a cheap access to rss, amount of shared, swapped out memory as
> >> >> > well). Anyway if you are getting close to the OOM your system will most
> >> >> > probably be really busy and chances are that also reading your new file
> >> >> > will take much more time. I am also not quite sure how pss is useful for
> >> >> > oom decisions.
> >> >>
> >> >> I mentioned it before, but based on experience RSS just isn't good
> >> >> enough -- there's too much sharing going on in our use case to make
> >> >> the correct decision based on RSS. If RSS were good enough, simply
> >> >> put, this patch wouldn't exist.
> >> >
> >> > But that doesn't answer my question, I am afraid. So how exactly do you
> >> > use pss for oom decisions?
> >>
> >> We use PSS to calculate the memory used by a process among all the
> >> processes in the system, in the case of Chrome this tells us how much
> >> each renderer process (which is roughly tied to a particular "tab" in
> >> Chrome) is using and how much it has swapped out, so we know what the
> >> worst offenders are -- I'm not sure what's unclear about that?
> >
> > So let me ask more specifically. How can you make any decision based on
> > the pss when you do not know _what_ is the shared resource. In other
> > words if you select a task to terminate based on the pss then you have to
> > kill others who share the same resource otherwise you do not release
> > that shared resource. Not to mention that such a shared resource might
> > be on tmpfs/shmem and it won't get released even after all processes
> > which map it are gone.
>
> Ok I see why you're confused now, sorry.
>
> In our case we do know what is being shared in general because
> the sharing is mostly between those processes that we're looking at
> and not other random processes or tmpfs, so PSS gives us useful data
> in the context of these processes which are sharing the data
> especially for monitoring between the set of these renderer processes.

OK, I see and agree that pss might be useful when you _know_ what is
shared. But this sounds quite specific to a particular workload. How
many users are in a similar situation? In other words, if we present
a single number without the context, how useful will it be in
general? Is it possible that presenting such a number could even be
misleading for somebody who doesn't have an idea which resources are
shared? These are all questions which should be answered before we
actually add this number (be it a new/existing proc file or a syscall).
I still believe that the number without wider context is just not all
that useful.
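
For reference, the number under discussion can already be derived in
userspace by summing the per-mapping counters in /proc/<pid>/smaps. The
following is only a minimal sketch of that summation, assuming the usual
"Key:   N kB" layout of smaps lines; the function name sum_smaps_fields
is hypothetical and is not part of the proposed patch.

```python
from collections import defaultdict


def sum_smaps_fields(smaps_text,
                     fields=("Pss", "Private_Clean", "Private_Dirty", "Swap")):
    """Sum selected kB counters across all mappings in smaps-formatted text.

    Illustrative helper only (hypothetical name); it assumes counter lines
    of the form "Pss:        12 kB" and ignores everything else.
    """
    totals = defaultdict(int)
    for line in smaps_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or key not in fields:
            # Mapping headers, VmFlags and unrelated counters land here.
            continue
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            totals[key] += int(parts[0])
    return {name: totals[name] for name in fields}
```

The text would typically come from reading /proc/<pid>/smaps for the
process of interest. The result is still a set of bare totals, so it
carries none of the context about which resources are shared.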

> We also use the private clean and private dirty and swap fields to
> make a few metrics for the processes and charge each process for it's
> private, shared, and swap data. Private clean and dirty are used for
> estimating a lower bound on how much memory would be freed.

I can imagine that this kind of information might be useful and
presented in /proc/<pid>/statm. The question is whether some of the
existing consumers would see a performance impact due to the page table
walk. Anyway even these counters might get quite tricky because even
shareable resources are considered private if the process is the only
one to map them (so again this might be a file on tmpfs...).
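
The gap between these counters and what is really released can be shown
with a toy model. This is only a sketch under stated assumptions: the
Page type, the on_tmpfs flag and the three helper functions are
hypothetical, pages are counted as whole units, and nothing here
reflects how the kernel does its accounting internally.

```python
from fractions import Fraction
from typing import FrozenSet, List, NamedTuple


class Page(NamedTuple):
    """Toy page: which pids map it, and whether it is tmpfs/shmem backed."""
    mappers: FrozenSet[int]
    on_tmpfs: bool


def pss_pages(pid: int, pages: List[Page]) -> Fraction:
    # Each page is charged to its mappers in equal shares.
    return sum((Fraction(1, len(p.mappers)) for p in pages if pid in p.mappers),
               Fraction(0))


def private_pages(pid: int, pages: List[Page]) -> int:
    # "Private" here means only that pid is the sole mapper right now.
    return sum(1 for p in pages if p.mappers == {pid})


def freed_pages(pid: int, pages: List[Page]) -> int:
    # Killing pid releases a page only if nobody else maps it and it is
    # not kept alive by a tmpfs/shmem file.
    return sum(1 for p in pages if p.mappers == {pid} and not p.on_tmpfs)


pages = [
    Page(frozenset({1}), on_tmpfs=False),     # anonymous, mapped by pid 1 only
    Page(frozenset({1, 2}), on_tmpfs=False),  # shared between pid 1 and pid 2
    Page(frozenset({1}), on_tmpfs=True),      # tmpfs file, pid 1 is the only mapper
]
```

In this model pid 1 is charged 5/2 pages of pss and has 2 private pages,
yet killing it frees only 1 page, so neither counter is a reliable bound
unless the caller knows what the mappings are.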

> Swap and
> PSS also give us some indication of additional memory which might get
> freed up.
--
Michal Hocko
SUSE Labs