Re: [PATCH] staging, android: remove lowmemory killer from the tree

From: Michal Hocko
Date: Fri Feb 24 2017 - 10:04:11 EST


On Fri 24-02-17 15:42:49, peter enderborg wrote:
> On 02/24/2017 03:11 PM, Michal Hocko wrote:
> > On Fri 24-02-17 14:16:34, peter enderborg wrote:
> >> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> > [...]
> >>> Yeah, I strongly believe that the chosen approach is completely wrong.
> >>> Both in abusing the shrinker interface and abusing oom_score_adj as the
> >>> only criterion for the oom victim selection.
> >> No one is arguing that shrinker is not problematic. And would be great
> >> if it is removed from lmk. The oom_score_adj is the way user-space
> >> tells the kernel what the user-space has as prio. And android is using
> >> that very much. It's a core part.
> > Is there any documentation which describes how this is done?
> >
> >> I have never seen it be used on
> >> other linux system so what is the intended usage of oom_score_adj? Is
> >> this really abusing?
> > oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> > criterion on its own, well, except for the extreme sides of the range
> > which are defined to enforce resp. disallow selecting the task. The
> > global oom killer calculates the oom score as a function of the memory
> > consumption. Your patch simply ignores the memory consumption (and uses
> > pids to sort tasks with the same oom score which is just mind boggling)
>
> How much it uses is of very little importance for android.

But it is relevant for the global oom killer which is the main consumer of
the oom_score_adj.
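
To illustrate the "adjust, not replace" point, here is a minimal
userspace sketch of how a memory-based badness score can be shifted by
oom_score_adj. It is modelled loosely on the badness calculation in
mm/oom_kill.c from around this time; the struct and the function name
are hypothetical and the accounting is simplified, so treat it as an
approximation rather than the kernel's actual code.

```c
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/* Hypothetical, simplified stand-in for the kernel's per-task accounting. */
struct task_sketch {
    long rss_pages;      /* resident pages */
    long swap_pages;     /* swapped-out pages */
    long pgtable_pages;  /* pages used for page tables */
    int  oom_score_adj;  /* OOM_SCORE_ADJ_MIN .. OOM_SCORE_ADJ_MAX */
};

/*
 * Returns 0 if the task must never be selected, otherwise a positive
 * badness value in pages. Memory consumption is the base; oom_score_adj
 * only shifts it, in units of 1/1000 of total_pages.
 */
long badness_sketch(const struct task_sketch *t, long total_pages)
{
    long points, adj;

    /* The extreme low end of the range disables selection entirely. */
    if (t->oom_score_adj == OOM_SCORE_ADJ_MIN)
        return 0;

    /* Base score: how much memory the task actually consumes. */
    points = t->rss_pages + t->swap_pages + t->pgtable_pages;

    /* The tunable adjusts the base, it does not replace it. */
    adj = (long)t->oom_score_adj * (total_pages / 1000);
    points += adj;

    /* Never return 0 for an eligible task. */
    return points > 0 ? points : 1;
}
```

With 1,000,000 total pages, a task using 200,000 pages with
oom_score_adj 0 scores 200,000, while a task using 50,000 pages with
oom_score_adj 500 scores 550,000. A consumer that sorts on
oom_score_adj alone drops the first term completely.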

> The scores
> used are only for apps and their services. System-related processes
> are not touched by the Android lmk. The pid only serves as a unique
> key so that lookups in an rbtree are fast. One idea was to use
> task_pid to get a strict process age for round robin, but since it
> does not matter I skipped that idea.

Pid will not tell you anything about the age. Pids do wrap around.
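
A small sketch of why pid order is not age order. It assumes a
simplified allocator that hands out pids sequentially and restarts
above a reserved low range once pid_max is reached; the constants and
function names are illustrative assumptions, and the sketch ignores
pids that are still in use.

```c
/* Assumed values for illustration only. */
#define PID_MAX_SKETCH       32768
#define RESERVED_PIDS_SKETCH 300

/* Simplified sequential allocation with wrap-around. */
int next_pid_sketch(int last_pid, int pid_max)
{
    int next = last_pid + 1;

    if (next >= pid_max)
        next = RESERVED_PIDS_SKETCH;  /* wrap: restart above reserved range */
    return next;
}

/* Naive heuristic: "smaller pid means older process". */
int looks_older_by_pid(int pid_a, int pid_b)
{
    return pid_a < pid_b;
}
```

If the last allocated pid was 32767, the next process gets pid 300 in
this model, and the naive comparison then claims the newer process is
the older one. The process start time is a more reliable age signal.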

> > and that is what I call the abuse. The oom score calculation might
> > change in future, of course, but all consumers of the oom_score_adj
> > really have to agree on the base which is adjusted by this tunable
> > otherwise you can see a lot of unexpected behavior.
>
> Then can we just define a range that is strictly for user-space?

This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
is usable.
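
For reference, a minimal sketch of how a userspace process might set
the tunable for itself through /proc/self/oom_score_adj, clamped to
the documented range of -1000 to 1000. The helper names are
hypothetical. Note that lowering the value generally requires
privilege (CAP_SYS_RESOURCE), so the write can fail.

```c
#include <stdio.h>

#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/* Clamp a requested value into the valid oom_score_adj range. */
int clamp_oom_score_adj(int value)
{
    if (value < OOM_SCORE_ADJ_MIN)
        return OOM_SCORE_ADJ_MIN;
    if (value > OOM_SCORE_ADJ_MAX)
        return OOM_SCORE_ADJ_MAX;
    return value;
}

/* Write the clamped value for the calling process; 0 on success, -1 on error. */
int set_self_oom_score_adj(int value)
{
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%d\n", clamp_oom_score_adj(value)) > 0;
    if (fclose(f) != 0)  /* the write error may only surface on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

This only hints to the global oom killer which task to prefer; the
memory-based score remains the base that the value is applied to.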

> > I would even argue that nobody outside of mm/oom_kill.c should really
> > have any business with this tunable. You can of course tweak the value
> > from the userspace and help to choose a better oom victim this way but
> > that is it.
>
> Why only help? If userspace can give an exact order to the kernel,
> that must be a good thing; otherwise the kernel has to guess, and when
> can that be better?

Because userspace doesn't know who is the best victim in 99% of cases.
Android might be different, although I am a bit skeptical - especially
after hearing quite a few complaints about random applications being
killed... If you do believe that you know better then, by all means,
implement your custom user space LMK and choose the oom victim on a
different basis, but try to understand that the global OOM killer is the
last resort measure to make the system usable again. There is a good
reason why the kernel uses the current badness calculation. The previous
implementation, which considered the process age and other things, was
just too random to have understandable behavior.

In any case playing nasty games with the oom killer tunables might and
will lead, well, to unexpected behavior.
--
Michal Hocko
SUSE Labs