Re: [patch -mm v2] mm: introduce oom_adj_child

From: David Rientjes
Date: Mon Aug 03 2009 - 03:59:45 EST


On Mon, 3 Aug 2009, KAMEZAWA Hiroyuki wrote:

> > - /proc/pid/oom_score is inconsistent when tuning /proc/pid/oom_adj if it
> > relies on the per-thread oom_adj; it now really represents nothing but
> > an incorrect value if other threads share that memory and misleads the
> > user on how the oom killer chooses victims, or
>
> That's why I said to show effective_oom_adj if necessary.
>

Right, but which of the following two behaviors do you believe the
majority of today's user applications are written to use?

(1) /proc/pid/oom_score represents the badness heuristic that the oom
killer uses to determine which task to kill, or

(2) /proc/pid/oom_adj can be adjusted after vfork() and prior to exec()
to represent the oom preference of the child without simultaneously
changing the oom preference of the parent.

The two are in complete contrast and cannot co-exist. I favor behavior
(1), which is why my patches make it consistent in _all_ cases, since it
is more likely than not that the majority of user applications rely on
that behavior if for no other reason than that it is the DOCUMENTED
behavior.
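
To make behavior (2) concrete, here is a minimal userspace sketch of the
pattern in question. It is only an illustration: set_oom_adj() and
spawn_with_oom_adj() are hypothetical names, not part of any patch in
this thread, and the sketch assumes the usual oom_adj range of -17
(OOM_DISABLE) to +15. With a per-mm oom_adj, the write done in the child
also applies to the parent until exec(), which is exactly the conflict
with behavior (1).

```c
#define _DEFAULT_SOURCE	/* for vfork() under strict -std= flags */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define OOM_ADJ_MIN (-17)	/* OOM_DISABLE */
#define OOM_ADJ_MAX 15

/*
 * Hypothetical helper: write an oom_adj value to an existing file such
 * as "/proc/self/oom_adj".  Returns 0 on success, -1 on error.
 */
int set_oom_adj(const char *path, int value)
{
	char buf[16];
	int fd, len, ret = 0;

	assert(path != NULL);
	if (value < OOM_ADJ_MIN || value > OOM_ADJ_MAX) {
		errno = EINVAL;
		return -1;
	}
	len = snprintf(buf, sizeof(buf), "%d\n", value);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != len)
		ret = -1;
	if (close(fd) != 0)
		ret = -1;
	return ret;
}

/*
 * Behavior (2): tune the child after vfork() and before exec().  The
 * child still shares the parent's memory at the time of the write.
 * Returns the child's pid, or -1 if vfork() failed.
 */
pid_t spawn_with_oom_adj(const char *prog, char *const argv[], int value)
{
	pid_t pid = vfork();

	if (pid == 0) {
		if (set_oom_adj("/proc/self/oom_adj", value) == 0)
			execv(prog, argv);
		_exit(127);	/* only reached if the write or exec failed */
	}
	return pid;
}
```

An application relying on behavior (2) expects that only the exec'd
program ends up with the new value; the sketch itself does nothing to
guarantee that.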

If you feel that's an unreasonable conclusion, then please say so, so
that your argument can be judged on your interpretation of that behavior,
which I believe most others would disagree with. Otherwise, our
discussion will continue to go in circles.

> > - /proc/pid/oom_score is inconsistent when the thread that set the
> > effective per-mm oom_adj exits and it is now obsolete since you have
> > no way to determine what the next effective oom_adj value shall be.
> >
> Please recalculate it. It's not a big job if done in a lazy way.
>

You can't recalculate it if all the remaining threads have a different
oom_adj value than the effective oom_adj value from the thread that has
now exited. There is no assumption that, for instance, the most negative
oom_adj value shall then be used. Imagine the effective oom_adj value
being +15 and a thread sharing the same memory has an oom_adj value of
-16. Under no reasonable circumstance should the oom preference of
every thread sharing that memory then change to -16 just because it's
the side effect of a thread exiting.
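
The ambiguity can be shown with a small model. This is a sketch only,
not kernel code: the function names are hypothetical, and the extra
thread with oom_adj 0 is an assumed value added to the +15 / -16
example above. Two equally plausible recalculation policies disagree
with each other, and neither yields the value that was in effect before
the exit.

```c
#include <assert.h>
#include <stddef.h>

/* Policy A (hypothetical): take the most negative remaining value. */
int most_negative(const int *adj, size_t n)
{
	int v;
	size_t i;

	assert(n > 0);
	v = adj[0];
	for (i = 1; i < n; i++)
		if (adj[i] < v)
			v = adj[i];
	return v;
}

/* Policy B (hypothetical): take the most positive remaining value. */
int most_positive(const int *adj, size_t n)
{
	int v;
	size_t i;

	assert(n > 0);
	v = adj[0];
	for (i = 1; i < n; i++)
		if (adj[i] > v)
			v = adj[i];
	return v;
}

/*
 * Returns 1 if some remaining thread still carries the old effective
 * value, 0 if that value left together with the exiting thread.
 */
int preserves_effective(const int *adj, size_t n, int effective)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (adj[i] == effective)
			return 1;
	return 0;
}
```

With an effective value of +15 and remaining per-thread values of -16
and 0, policy A yields -16, policy B yields 0, and
preserves_effective() returns 0: whichever policy is picked, the oom
preference changes as a side effect of the exit.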

That's the _entire_ reason why we need consistency in oom_adj values: so
that userspace is aware of how the oom killer really works and chooses
tasks. I understand that it differs from the previously allowed behavior,
but those userspace applications need to be fixed if for no other reason
than to make them consistent with how the oom killer kills tasks. I think
that's a very worthwhile goal, and the cost of moving to a new interface
such as /proc/pid/oom_adj_child to get the same inheritance property that
was available in the past is justified.