[PATCH 0/3 staging-next] android: Lowmemorykiller task tree

From: peter enderborg
Date: Thu Feb 09 2017 - 08:33:35 EST


Lowmemorykiller efficiency problem and a solution.

Lowmemorykiller in android has a severe efficiency problem. The basic
problem is that the registered shrinker gets called very often without
anything actually happening. In some cases this is not a problem, as
it is a simple calculation that returns a value. But when there is
high pressure on memory and we start killing processes to free
memory, we get heavy work that does lots of cpu processing and
lock holding for no real benefit. This occurs when we are below the
first threshold level in minfree. We call that waste. To make this
problem visible we introduce a patch that collects statistics from
lowmemorykiller. We collect the number of kills, scans, counts and
some other metrics. One of these metrics is called waste. The metrics
are presented in procfs as /proc/lmkstats.
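
To make the waste metric concrete, here is a minimal userspace sketch
of the bookkeeping. It assumes that waste means a scan that ended
without a kill. The struct, its fields and the helper names are
hypothetical and are not taken from the patch.

```c
/* Hypothetical counters; the fields exported by the patch may differ. */
struct lmk_stats {
    unsigned long counts; /* calls to the shrinker count function */
    unsigned long scans;  /* calls to the shrinker scan function */
    unsigned long kills;  /* scans that ended with a kill */
    unsigned long waste;  /* scans that did the work but killed nothing */
};

/* Record one call to the count function. */
static void lmk_stats_count(struct lmk_stats *s)
{
    s->counts++;
}

/* Record one call to the scan function and whether it killed a task. */
static void lmk_stats_scan(struct lmk_stats *s, int killed)
{
    s->scans++;
    if (killed)
        s->kills++;
    else
        s->waste++; /* assumption: waste == scan without a kill */
}
```

With counters like these, a device in the waste zone shows waste
growing almost as fast as scans while kills stays flat.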

Patchset:
0001-android-Collect-statistics-from-lowmemorykiller.patch
0002-oom-Add-notification-for-oom_score_adj.patch
0003-mm-Remove-RCU-and-tasklocks-from-lmk.patch


0001-android-Collect-statistics-from-lowmemorykiller.patch
----------------------------------------------------------
This patch only adds metrics. It is there to show the
behaviour before and after, and is a good way to
see that the device is in the waste zone.


0002-oom-Add-notification-for-oom_score_adj.patch
------------------------------------------------
This is the prerequisite patch for the
lowmemorykiller change. It introduces notifiers
for oom_score_adj. It generates notifier events for
process creation and death, and when process values
are changed. These changes are outside the staging
drivers and are applied to core functions, e.g. in fork.c.
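
The notifier idea can be sketched in userspace as a simple callback
chain. This is only an illustration: the event names, the struct and
the functions below are hypothetical and do not claim to match the
interface added by the patch.

```c
#include <stddef.h>

/* Hypothetical event types: task creation, task death, value change. */
enum oom_adj_event {
    OOM_ADJ_TASK_NEW,
    OOM_ADJ_TASK_EXIT,
    OOM_ADJ_SCORE_CHANGED
};

/* One listener on the chain. */
struct oom_adj_notifier {
    void (*call)(enum oom_adj_event ev, int pid, short oom_score_adj,
                 void *data);
    void *data;
    struct oom_adj_notifier *next;
};

static struct oom_adj_notifier *oom_adj_chain;

/* Add a listener at the head of the chain. */
static void oom_adj_register(struct oom_adj_notifier *n)
{
    n->next = oom_adj_chain;
    oom_adj_chain = n;
}

/* Deliver one event to every registered listener. */
static void oom_adj_notify(enum oom_adj_event ev, int pid,
                           short oom_score_adj)
{
    struct oom_adj_notifier *n;

    for (n = oom_adj_chain; n; n = n->next)
        n->call(ev, pid, oom_score_adj, n->data);
}

/* Example listener: keeps a count of live tasks in *data. */
static void lmk_track(enum oom_adj_event ev, int pid, short oom_score_adj,
                      void *data)
{
    int *live = data;

    (void)pid;
    (void)oom_score_adj;
    if (ev == OOM_ADJ_TASK_NEW)
        (*live)++;
    else if (ev == OOM_ADJ_TASK_EXIT)
        (*live)--;
}
```

A listener such as lmk_track is what lets the lmk keep its own view of
the tasks up to date without walking the full task list.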

0003-mm-Remove-RCU-and-tasklocks-from-lmk.patch
-----------------------------------------------
This patch is the actual change to lowmemorykiller. It
builds a tree structure that works as a cache for
the task list, but only contains the tasks that
are relevant for the lmk. The key point is
that the cache is sorted on the oom_score_adj
value, so the scan and count functions can find
the right task with a single tree-first operation.
Based on that task, count can give a
proper reply with a good estimate of the
amount it will free and, more importantly, report when
it is not willing to free anything. In that case
the shrinker does not call the scan function at all,
and when scan is called it actually does what it is
supposed to do, which is to free up some memory.
I consider this an mm change based on the behaviour
changes for the shrinker, even if the code is
a driver.
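
A minimal userspace sketch of the sorted-cache idea follows. It uses a
linked list sorted by descending oom_score_adj as a stand-in for the
tree in the patch, so that the head of the list plays the role of the
first tree node. The struct, the field names and the min_score_adj
parameter are hypothetical.

```c
#include <stddef.h>

/* Hypothetical cache entry; only tasks relevant for the lmk are stored. */
struct lmk_task {
    int pid;
    short oom_score_adj;     /* sort key */
    unsigned long rss_pages; /* estimate of what a kill would free */
    struct lmk_task *next;
};

/* Insert a task, keeping the list sorted by descending oom_score_adj. */
static void lmk_cache_insert(struct lmk_task **head, struct lmk_task *t)
{
    struct lmk_task **pos = head;

    while (*pos && (*pos)->oom_score_adj >= t->oom_score_adj)
        pos = &(*pos)->next;
    t->next = *pos;
    *pos = t;
}

/*
 * Count: look only at the first entry. If even the best candidate is
 * below the threshold for the current memory level, report 0 so the
 * caller has no reason to run scan.
 */
static unsigned long lmk_count(const struct lmk_task *head,
                               short min_score_adj)
{
    if (!head || head->oom_score_adj < min_score_adj)
        return 0;
    return head->rss_pages;
}
```

The point of the design is that count never walks the cache: a single
look at the first entry is enough both to estimate what can be freed
and to report that nothing can be freed.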

About testing.
Reproduce the problem. For this the first patch is needed and enabled.
It does not change the lowmemorykiller other than adding some metrics.
One counter is called WASTE. This is what this patch-set is about.
In an android environment this can be tested directly. On other systems
like fedora a method using the stress package can be used. Apply the
patches (first with only the metrics), then in your shell:

    echo 400 > /proc/self/oom_score_adj

Now you have created a shell that has something that can be killed.
In the same shell use the stress program. The parameters will be very
dependent on your configuration, but you need to run out of memory.

Most of the wasted cpu cycles are accounted to the kswapd0 task, so the
reduction in waste can also be seen by comparing the schedstat for that
task. However activitymanager will get some more work done in kernel
space. Finally, the new version also has the WASTE counter, but there it
only represents the cost of an rbtree search.

Cost/Drawback
The impact on the fork call on a 2 GHz arm64 is about 500 ns for the
notifier.