Re: [BUGFIX for 2.6.36][RESEND][PATCH 1/2] oom: remove totalpage normalization from oom_badness()

From: David Rientjes
Date: Tue Sep 07 2010 - 23:22:00 EST


On Wed, 8 Sep 2010, KOSAKI Motohiro wrote:

> > > ok, this one got no objection except original patch author.
> >
> > Would you care to respond to my objections?
> >
> > I replied to these two patches earlier with my nack, here they are:
> >
> > http://marc.info/?l=linux-mm&m=128273555323993
> > http://marc.info/?l=linux-mm&m=128337879310476
> >
> > Please carry on a useful debate of the issues rather than continually
> > resending patches and labeling them as bugfixes, which they aren't.
>
> You are still talking about only your usecase. Why do we care you? Why?

It's an example of how the new interface may be used to represent oom
killing priorities for an aggregate of tasks competing for the same set of
resources.

> Why don't you fix the code by yourself? Why? Why do you continue selfish
> development? Why? I can't understand.
>

I can only reiterate what I've said before (and you can be assured I'll
only keep it technical and professional even though you've always made
this personal with me): current users of /proc/pid/oom_adj only polarize a
task to either disable oom killing (-17 or -16), or always prefer a task
(+15). Very, very few users tune it to anything in between, and when it's
done, it's relative to other oom_adj values.

Not a single example of a /proc/pid/oom_adj usecase has been presented
that shows anybody using it as a function of either an application's
expected memory usage or of the system capacity. Those two variables are
important for oom_adj to make any sense since its old definition was
basically score = mm->total_vm << oom_adj for positive oom_adj and
score = mm->total_vm >> -oom_adj for negative oom_adj. If an
application, system daemon, or job scheduler tunes it without
consideration of the amount of expected RAM usage or system RAM capacity,
it doesn't make any sense. You're welcome to present such a user at this
time.
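
To make the shift arithmetic above concrete, here is a minimal sketch in Python rather than kernel C. It models only the bitshift described in this mail plus the -17 "disable" value; the function name old_badness_sketch and the page counts are made up for illustration, and the other inputs the old heuristic considered are deliberately left out.

```python
OOM_DISABLE = -17  # oom_adj value that exempts a task from oom killing


def old_badness_sketch(total_vm_pages: int, oom_adj: int) -> int:
    """Simplified model of the old oom_adj scaling (illustration only).

    Assumption: the base score is just mm->total_vm in pages; the real
    heuristic mixed in several other factors that are ignored here.
    """
    if oom_adj == OOM_DISABLE:
        return 0  # task is never selected
    if oom_adj > 0:
        return total_vm_pages << oom_adj   # score doubles per step
    if oom_adj < 0:
        return total_vm_pages >> -oom_adj  # score halves per step
    return total_vm_pages


# Hypothetical tasks: a small one "preferred" with +3, a large one
# "protected" with -3. The large task still scores far higher.
small_preferred = old_badness_sketch(1_000, 3)
large_protected = old_badness_sketch(1_000_000, -3)
print(small_preferred, large_protected)
```

Because the adjustment multiplies the task's own size, a given oom_adj value only expresses a meaningful priority if whoever set it knew roughly how much memory the task and its competitors would use.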

That said, I felt it was possible to use the current usecase for
/proc/pid/oom_adj to expand upon its applicability by introducing
/proc/pid/oom_score_adj with a much higher resolution and ability to stay
static based on the relative importance of a task compared to others
sharing the same resources in a dynamic environment (memcg limits
changing, cpuset mems added, mempolicy nodes changing, etc).
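
As a rough illustration of "staying static" while the environment changes, the sketch below assumes the score is the task's share of the memory it is allowed to use, in units of 1/1000, with oom_score_adj added on top and the result clamped to 0..1000. That formula is my simplified reading of the 2.6.36 heuristic and is not spelled out in this mail; the function name, task sizes, and limits are hypothetical, and other adjustments the kernel applies are omitted.

```python
OOM_SCORE_ADJ_MIN = -1000  # disables oom killing for the task


def badness_sketch(rss_pages: int, swap_pages: int,
                   totalpages: int, oom_score_adj: int) -> int:
    """Simplified proportional score (assumption, illustration only)."""
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0
    totalpages = max(totalpages, 1)  # avoid dividing by zero
    # Share of the allowed memory, in thousandths.
    points = (rss_pages + swap_pages) * 1000 // totalpages
    points += oom_score_adj
    return max(0, min(1000, points))


# Two hypothetical tasks sharing one memcg. Task B is marked as the
# preferred victim with oom_score_adj = +500.
for limit in (1_000, 2_000):  # memcg limit in pages, before and after a resize
    a = badness_sketch(400, 0, limit, 0)
    b = badness_sketch(100, 0, limit, 500)
    print(limit, a, b)
```

In this sketch task B remains the preferred victim after the limit doubles, without anyone rewriting its oom_score_adj, because the adjustment is expressed on the same proportional scale as the usage itself.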

Thus, my introduction of oom_score_adj causes no regression for real-world
users of /proc/pid/oom_adj and allows users of cgroups and mempolicies a
much more powerful interface to tune oom killing priority.