Re: [PATCH] Revert oom rewrite series

From: Mandeep Singh Baines
Date: Tue Nov 16 2010 - 19:49:16 EST


Bodo Eggert (7eggert@xxxxxx) wrote:
> On Mon, 15 Nov 2010, David Rientjes wrote:
> > On Tue, 16 Nov 2010, Bodo Eggert wrote:
>
> > > > CAP_SYS_RESOURCE threads have full control over their oom killing priority
> > > > by /proc/pid/oom_score_adj
> > >
> > > > , but unless they were written in the last few months, designed for Linux,
> > > > and their author took the time to research each external process invocation,
> > > > they cannot be aware of this possibility.
> > >
> >
> > You're clearly wrong, CAP_SYS_RESOURCE has been required to modify oom_adj
> > for over five years (as long as the git history). 8fb4fc68, merged into
> > 2.6.20, allowed tasks to raise their own oom_adj but not decrease it.
> > That is unchanged by the rewrite.
>
> You are misunderstanding me. Applications were allowed to do this, but they did
> not need to yet. It was enough to be a well-written POSIX application without
> Linux-specific OOM hacks for some specific kernel versions.
>
> > > Besides that, if each process is supposed to change the default, the default
> > > is wrong.
> >
> > That doesn't make any sense, if you want to protect a thread from the oom
> > killer you're going to need to modify oom_score_adj, the kernel can't know
> > what you perceive as being vital. Having CAP_SYS_RESOURCE alone does not
> > imply that, it only allows unbounded access to resources. That's
> > completely orthogonal to the goal of the oom killer heuristic, which is to
> > find the most memory-hogging task to kill.
>
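For readers who have not used the interface being argued about, here is a minimal sketch of adjusting a task's score from user space. It assumes the /proc/<pid>/oom_score_adj file from the rewrite, which takes an integer from -1000 (task is never selected) to 1000. The helper name and the proc_root parameter are hypothetical, added only so the sketch can be tried against a scratch directory. Lowering the value requires CAP_SYS_RESOURCE, as discussed above.

```python
import os

OOM_SCORE_ADJ_MIN = -1000  # task is exempt from OOM killing
OOM_SCORE_ADJ_MAX = 1000   # task is the preferred victim


def set_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Write `value` to <proc_root>/<pid>/oom_score_adj and return the path.

    `proc_root` is an illustrative parameter, not part of any kernel API;
    it only makes the helper usable against a fake directory tree.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be within [-1000, 1000]")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    # On a real system the kernel rejects the write with EACCES if the
    # caller tries to lower the value without CAP_SYS_RESOURCE.
    with open(path, "w") as f:
        f.write(str(value))
    return path
```

Calling set_oom_score_adj(500) makes the current process a more likely victim and needs no special capability.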
> The old oom killer's task was to guess the best victim to kill. For me, it
> did a good job (but the system kept thrashing for too long until it kicked

Here's a patch I've been working on to control thrashing.

http://lkml.org/lkml/2010/10/28/289

It works well for our application, a web browser: we'd rather OOM quickly and
kill a browser tab than thrash for a few minutes and then OOM. I'm still working
on a more generally useful solution.

> the offender). Looking at CAP_SYS_RESOURCE was one way to recognize
> important processes.
>
> > > 1) The exponential scale did have a low resolution.
> > >
> > > 2) The heuristics were developed using much brain power and much
> > > trial-and-error. You are going back to basics, and some people
> > > are not convinced that this is better. I googled and I did not
> > > find a discussion about how and why the new score was designed
> > > this way.
> > > looking at the output of:
> > > cd /proc; for a in [0-9]*; do
> > > echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// < $a/cmdline`;
> > > done|grep -v ^0|sort -n |less
> > > > , I'm not convinced either.
> > >
> >
> > The old heuristics were a mixture of arbitrary values that didn't adjust
> > scores based on a unit and would often cause the incorrect task to be
> > targeted because there was no clear goal being achieved. The new
> > heuristic has a solid goal: to identify and kill the most memory-hogging
> > task that is eligible given the context in which the oom occurs. If you
> > disagree with that goal and want any of the old heuristics reintroduced,
> > please show that it makes sense in the oom killer.
>
> The first old OOM killer did the same as you promise the current one does,
> except for your bugfixes. That's why it killed the wrong applications and
> all the heuristics were added until the complaints stopped.
>
> Of course I have not yet tested your OOM killer; maybe it really is better.
> Heuristics tend to rot and you did much work to make it right.
>
> I don't want the old OOM killer back, but I don't want you to fall
> into the same pits the pre-old OOM killer did.
>
> > > > PS) Mapping an exponential value to a linear score is bad. E.g. an
> > > > oom_adj of 8 should make a 1-MB process as likely to be killed as
> > > > a 256-MB process with oom_adj=0.
> > >
> >
> > To show that, you would have to show that an application that exists today
> > uses an oom_adj for something other than polarization and is based on a
> > calculation of allowable memory usage. It simply doesn't exist.
>
> No such application should exist because the OOM killer should DTRT.
> oom_adj was supposed to let the sysadmin lower his mission-critical
> DB's score to be just lower than the less-important tasks, or to
> point the kernel to his ever-faulty and easily-restarted browser.
>
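The arithmetic behind the PS above can be sketched roughly as follows. The assumptions are mine: that the old badness() applied a positive oom_adj as a left shift of a score roughly proportional to memory size, and that the rewrite converts oom_adj to oom_score_adj as oom_adj * 1000 / 17 and adds it to a score expressed in thousandths of available memory. The function names and the 4096 MB machine are made up for illustration, and swap, page tables and the root bonus are ignored.

```python
def old_score(mem_mb, oom_adj):
    """Exponential scale: each oom_adj step doubles or halves the score."""
    points = mem_mb
    if oom_adj > 0:
        return max(points, 1) << oom_adj
    return points >> -oom_adj


def adj_to_score_adj(oom_adj):
    """Assumed linear mapping of oom_adj (-17..15) onto -1000..1000."""
    if oom_adj == 15:
        return 1000
    return int(oom_adj * 1000 / 17)  # truncate toward zero, as C would


def new_score(mem_mb, oom_adj, total_mb):
    """Linear scale: memory share in 1/1000ths plus a constant offset."""
    return mem_mb * 1000 // total_mb + adj_to_score_adj(oom_adj)


# Old scale: a 1 MB task with oom_adj=8 ties with a 256 MB task at oom_adj=0.
print(old_score(1, 8), old_score(256, 0))
# New scale on the assumed 4096 MB machine: the same two tasks are far apart.
print(new_score(1, 8, 4096), new_score(256, 0, 4096))
```

Under these assumptions the old scale treats the two tasks as equal, while the linear scale ranks the 1 MB task as if it were using close to half of memory.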
> > > PS2) Because I saw this in your presentation PDF: (@udev-people)
> > > The -17 score of udevd is wrong, since it will even prevent
> > > the OOM killer from working correctly if it grows to 100 MB:
> > >
> >
> > Threads with CAP_SYS_RESOURCE are free to lower the oom_score_adj of any
> > thread they deem fit and that includes applications that lower their own
> > oom_score_adj. The kernel isn't going to prohibit users from setting
> > their own oom_score_adj.
>
> My point is: The udev people should not prevent the OOM killer
> unconditionally, it has an important task in case something goes wrong.
> I just didn't want to start a new thread at that time of day.
> --
> How do I set my laser printer on stun?