Re: [patch] mm, oom: base root bonus on current usage

From: Johannes Weiner
Date: Wed Jan 29 2014 - 21:12:59 EST


On Wed, Jan 29, 2014 at 12:28:13PM -0800, Andrew Morton wrote:
> On Sat, 25 Jan 2014 19:48:32 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> > A bonus of 3% of system memory is sometimes excessive in comparison to
> > other processes and can yield poor results when all processes on the
> > system are root and none of them use over 3% of memory.
> >
> > Replace the bonus of 3% of system memory with a bonus of 3% of the
> > task's current memory usage.
>
> This changelog has deteriorated :( We should provide sufficient info so
> that people will be able to determine whether this patch will fix a
> problem they or their customers are observing. And so that people who
> maintain -stable and its derivatives can decide whether to backport it.
>
> I went back and stole some text from the v1 patch. Please review the
> result. The changelog would be even better if it were to describe the
> new behaviour under the problematic workloads.

Looks good to me, thanks. How about the below?

> We don't think -stable needs this?

That's actually a good idea; we're putting it into RHEL too.

> From: David Rientjes <rientjes@xxxxxxxxxx>
> Subject: mm, oom: base root bonus on current usage
>
> A bonus of 3% of system memory is sometimes excessive in comparison to
> other processes.
>
> With a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer tries
> to avoid killing privileged tasks by subtracting 3% of overall memory
> (system or cgroup) from their per-task consumption. But as a result, all
> root tasks that consume less than 3% of overall memory are considered
> equal, and so it only takes 33+ privileged tasks pushing the system out of
> memory for the OOM killer to do something stupid and kill sshd or
> dhclient. For example, on a 32G machine it can't tell the difference
> between the 1M agetty and the 10G fork bomb member.
>
> That commit's changelog describes this 3% boost as equivalent to the
> global overcommit limit being 3% higher for privileged tasks, but this
> is not the same as discounting 3% of overall memory from _every
> privileged task individually_ during OOM selection.
>
> Replace the bonus of 3% of system memory with a bonus of 3% of the
> task's current memory usage.
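
To make the failure mode described above concrete, here is a minimal
sketch of the pre-patch discount. It is a simplified model, not the
kernel's oom_badness(): old_root_badness() is a hypothetical helper,
points are assumed to be 4 KiB pages, oom_score_adj is assumed to be 0,
and the floor of 1 for an eligible task is an assumption of the model.

```c
/*
 * Simplified model of the pre-patch root bonus (illustration only).
 * Assumptions: scores are counted in 4 KiB pages, oom_score_adj == 0,
 * and an eligible task is clamped to a minimum score of 1.
 */
static unsigned long old_root_badness(unsigned long task_pages,
				      unsigned long total_pages)
{
	/* 30 units of total_pages/1000, i.e. 3% of overall memory */
	unsigned long bonus = 30 * (total_pages / 1000);

	/* Every root task smaller than the bonus collapses to the floor */
	return task_pages > bonus ? task_pages - bonus : 1;
}
```

Under these assumptions, a 32G machine (8388608 pages) gives a bonus of
251640 pages, so a 1M task (256 pages) and a hypothetical 512M task
(131072 pages) both end up at the floor and become indistinguishable.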

By giving root tasks a bonus that is proportional to their actual
size, they remain comparable even when relatively small. In the
example above, the OOM killer will discount the 1M agetty's 256
badness points down to 179, and the 10G fork bomb's 262144 points down
to 183500 points and make the right choice, instead of discounting
both to 0 and killing agetty because it's first in the task list.
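
As a rough illustration of the proportional bonus, the sketch below
discounts a root task's score by 3% of its own size. It is a model
under stated assumptions, not the patch itself: new_root_badness() is a
hypothetical helper, scores are assumed to be 4 KiB pages, the discount
is assumed to be exactly 3% with integer division, and oom_score_adj is
ignored. The results are illustrative and are not meant to reproduce
the exact point values quoted above, which depend on the discount
factor and page accounting actually used.

```c
#include <assert.h>
#include <limits.h>

/*
 * Simplified model of a usage-proportional root bonus (illustration
 * only).  Assumptions: scores are 4 KiB pages, the discount is 3% of
 * the task's own score, and oom_score_adj is not considered.
 */
static unsigned long new_root_badness(unsigned long task_pages)
{
	/* Guard the multiplication below against overflow */
	assert(task_pages <= ULONG_MAX / 3);

	/* The discount scales with the task, so ordering is preserved */
	return task_pages - (task_pages * 3) / 100;
}
```

With these assumptions, a 1M task (256 pages) scores 249 and a 10G task
(2621440 pages) scores 2542797, so the larger task still ranks well
above the smaller one after the discount.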

> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> Reported-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

Cc: <stable@xxxxxxxxxx>