Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()

From: Michal Hocko
Date: Wed Feb 17 2016 - 13:10:10 EST


On Tue 16-02-16 16:35:39, David Rientjes wrote:
> On Tue, 16 Feb 2016, Greg Kroah-Hartman wrote:
>
> > On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
> > > Currently tasksize in lowmem_scan() only calculates rss and does not include swap.
> > > But smartphones usually enable zram, so swap space actually uses RAM.
> >
> > Yes, but does that matter for this type of calculation? I need an ack
> > from the Android team before I could ever take such a core change to
> > this code...
> >
>
> The calculation proposed in this patch is the same as the one used by the
> generic oom killer: an estimate of the amount of memory that will be freed
> if the process is killed and can exit. This is better than using
> get_mm_rss() alone.
>
> However, I think we seriously need to re-consider the implementation of
> the lowmem killer entirely. It currently abuses TIF_MEMDIE, which should
> ideally be set for only one thread on the system, since it allows
> unbounded access to global memory reserves.
>
> It also abuses the user-visible /proc/self/oom_score_adj tunable: this
> tunable is used by the generic oom killer to bias or discount a proportion
> of memory from a process's usage. This is the only supported semantic of
> the tunable. The lowmem killer uses it as a strict prioritization, so any
> process with oom_score_adj higher than another process is preferred for
> kill, REGARDLESS of memory usage. This leads to priority inversion: the
> user cannot ensure that the generic oom killer and the lowmem killer
> select the same process to kill. This is what happens when a tunable with
> a clearly defined purpose is used for other reasons.
>
> I'd seriously consider not accepting any additional hacks on top of this
> code until the implementation is rewritten.

Fully agreed!

--
Michal Hocko
SUSE Labs