Re: [PATCH v2] oom-kill: CAP_SYS_RESOURCE should get bonus

From: David Rientjes
Date: Wed Nov 10 2010 - 15:51:07 EST


On Wed, 10 Nov 2010, Figo.zhang wrote:

> > I didn't check earlier, but CAP_SYS_RESOURCE hasn't had a place in the oom
> > killer's heuristic in over five years, so what regression are we referring
> > to in this thread? These tasks already have full control over
> > oom_score_adj to modify its oom killing priority in either direction.
>
> yes, it can be controlled by the user, but will all system administrators
> really adjust every process one by one in the real world? suppose a
> database system has thousands of processes.
>

Yes, the kernel can't possibly know the oom killing priorities of your
tasks, so if you have such requirements then you must use the userspace
tunable.
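
For the case of many processes, a minimal sketch of driving the tunable
from a script follows. It assumes the tunable is /proc/<pid>/oom_score_adj
(range -1000 to 1000) and matches tasks by /proc/<pid>/comm; the helper
names and the proc_root parameter are hypothetical, chosen only for
illustration. Lowering a value generally requires privilege.

```python
import os

OOM_SCORE_ADJ_MIN, OOM_SCORE_ADJ_MAX = -1000, 1000


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write value to <proc_root>/<pid>/oom_score_adj (hypothetical helper)."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be within -1000..1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(f"{value}\n")


def adjust_by_name(name, value, proc_root="/proc"):
    """Apply value to every task whose comm equals name; return their pids."""
    adjusted = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
            if comm == name:
                set_oom_score_adj(int(entry), value, proc_root)
                adjusted.append(int(entry))
        except OSError:
            continue  # task exited, or we lack permission for it
    return sorted(adjusted)
```

Since oom_score_adj is inherited across fork, setting it once on a parent
before it spawns its workers avoids touching each child individually.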

> > Furthermore, the heuristic was entirely rewritten, but I wouldn't consider
> > all the old factors such as cputime and nice level being removed as
> > "regressions" since the aim was to make it more predictable and more
> > likely to kill a large consumer of memory such that we don't have to kill
> > more tasks in the near future.
>
> the goal of the oom killer is to find the best process to kill, which
> should be:
> 1. the most memory-consuming process among all processes
> 2. and a proper process to kill, one whose death will not put the system
> into an unpredictable state, as far as possible.
>

There are four types of tasks that are improper to kill and this is
relatively unchanged in the past five years of the oom killer:

- init,

- kthreads,

- tasks that are bound to a disjoint set of cpuset mems or mempolicy
nodes that are not oom, and

- those disabled from oom killing by userspace.

That does not include CAP_SYS_RESOURCE, nor CAP_SYS_ADMIN. Your argument
about killing some tasks that have CAP_SYS_RESOURCE leaving hardware in an
unpredictable state isn't even addressed by your own patch: you only give
them a 3% memory bonus, so they are still eligible.
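
The four exclusions above can be sketched as a simple predicate. This is
a simplified model, not the kernel code: the Task fields are hypothetical
stand-ins, init is approximated as pid 1, and "disabled by userspace" is
assumed to mean oom_score_adj at its minimum of -1000.

```python
from dataclasses import dataclass

OOM_SCORE_ADJ_MIN = -1000  # assumed value that disables oom killing


@dataclass
class Task:
    # Hypothetical stand-in for the per-task state the checks need.
    pid: int
    is_kthread: bool = False
    mems_allowed: frozenset = frozenset({0})  # memory nodes the task may use
    oom_score_adj: int = 0
    has_cap_sys_resource: bool = False  # present only to show it is ignored


def is_improper_to_kill(task, oom_nodes):
    """Return True if the task falls in one of the four excluded classes."""
    if task.pid == 1:  # init
        return True
    if task.is_kthread:  # kernel threads
        return True
    if task.mems_allowed.isdisjoint(oom_nodes):  # cannot free memory that is oom
        return True
    if task.oom_score_adj == OOM_SCORE_ADJ_MIN:  # disabled by userspace
        return True
    return False  # capabilities play no part in eligibility
```

Note that has_cap_sys_resource is never consulted: a task holding the
capability is as eligible as any other.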

As mentioned previously, for this patch to make sense, you would need to
show that CAP_SYS_RESOURCE equates to 3% of the available memory's
capacity for a task. I don't believe that evidence has been presented.
This has nothing to do with preventing these threads from being killed (at
the risk of possibly panicking the machine) since your patch doesn't do
that.
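
To make the arithmetic concrete, here is a rough sketch of what a 3%
bonus means. It assumes the score is the task's share of allowed memory
on a 0..1000 scale, so 3% corresponds to 30 points; the function name and
parameters are hypothetical and this is a simplified model rather than
the kernel's implementation.

```python
def badness(usage_pages, total_pages, bonus_percent=0, oom_score_adj=0):
    """Simplified score: share of memory in thousandths, minus any bonus."""
    points = usage_pages * 1000 // total_pages  # proportion of allowed memory
    points -= bonus_percent * 10                # 1% of memory == 10 points
    points += oom_score_adj                     # userspace adjustment
    return max(0, min(1000, points))            # clamp to the 0..1000 scale


# Two tasks each using half of memory; one receives the proposed bonus.
plain = badness(500, 1000)
bonused = badness(500, 1000, bonus_percent=3)
```

Under this model the task with the bonus scores 470 against 500, so it
is still a candidate; it ties the other task as soon as it uses 3% more
of memory.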

> if a user process such as the email client "evolution" and a process
> with direct hardware access such as "Xorg" have consumed equal memory,
> which process do you want to kill?
>

Both have equal oom killing priority according to the heuristic if they
are not run by root. If you would like to protect Xorg, then you need to
use the userspace tunable to protect it just like everything else does.
This is completely unchanged from the oom killer rewrite.

If you actually have a problem that you're reporting, however, it would
probably be better to show the oom killer log from that event and let us
address it instead of introducing arbitrary heuristics into something
which aims to be as predictable as possible.