Re: [BUGFIX][PATCH] oom-kill: fix NUMA constraint check with nodemask v4.2

From: Daisuke Nishimura
Date: Tue Nov 17 2009 - 20:49:06 EST


Hi.

On Tue, 17 Nov 2009 16:11:58 -0800 (PST), David Rientjes <rientjes@xxxxxxxxxx> wrote:
> On Wed, 11 Nov 2009, KAMEZAWA Hiroyuki wrote:
>
> > Fixing node-oriented allocation handling in oom-kill.c
> > I myself think of this as a bugfix, not as an enhancement.
> >
> > These days, things have changed:
> > - alloc_pages() takes a nodemask as an argument, via __alloc_pages_nodemask().
> > - mempolicy doesn't maintain its own private zonelists.
> > (And cpuset doesn't use nodemask for __alloc_pages_nodemask())
> >
> > So, current oom-killer's check function is wrong.
> >
> > This patch does
> > - check nodemask, if nodemask && nodemask doesn't cover all
> > node_states[N_HIGH_MEMORY], this is CONSTRAINT_MEMORY_POLICY.
> > - Scan all zonelists under the nodemask; if it hits cpuset's wall,
> > this failure is from cpuset.
> > And
> > - modifies the caller of out_of_memory not to call oom if __GFP_THISNODE.
> > This doesn't change "current" behavior. If callers use __GFP_THISNODE
> > they should handle "page allocation failure" by themselves.
> >
> > - handle __GFP_NOFAIL+__GFP_THISNODE path.
> > This is something like a FIXME but this gfpmask is not used now.
> >
>
> Now that we're passing the nodemask into the oom killer, we should be able
> to do more intelligent CONSTRAINT_MEMORY_POLICY selection. current is not
> always the ideal task to kill, so it's better to scan the tasklist and
> determine the best task depending on our heuristics, similar to how we
> penalize candidates if they do not share the same cpuset.
>
> Something like the following (untested) patch. Comments?
I agree with this direction.

Taking into account the per-node usage on the nodes included in the nodemask might be useful,
but we don't have a per-node rss counter per task now and adding one would bring some overhead,
so I think this would be enough (at least for now).
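
To make that overhead concrete, here is a rough userspace sketch of what
per-node accounting would involve. It is purely hypothetical: struct
task_sketch, rss_per_node, MAX_NODES and rss_in_nodemask() are names made up
for illustration, no such counter exists in the kernel today, and the nodemask
is modelled as a plain bitmask rather than nodemask_t.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8	/* hypothetical small limit, for illustration only */

/* Hypothetical per-task counters; the kernel has no such field today. */
struct task_sketch {
	const char *comm;
	unsigned long rss_per_node[MAX_NODES];	/* resident pages on each node */
};

/* Sum the pages a task holds on the nodes set in nodemask (bit n = node n). */
static unsigned long rss_in_nodemask(const struct task_sketch *t,
				     unsigned int nodemask)
{
	unsigned long pages = 0;
	size_t node;

	assert(t != NULL);
	for (node = 0; node < MAX_NODES; node++)
		if (nodemask & (1u << node))
			pages += t->rss_per_node[node];
	return pages;
}
```

Each task would carry one counter per node, and presumably every counter
would have to be kept up to date wherever rss is accounted, which is the cost
I would like to avoid for now.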

Just a minor nitpick:

> @@ -472,7 +491,7 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
>
>  	read_lock(&tasklist_lock);
>  retry:
> -	p = select_bad_process(&points, mem);
> +	p = select_bad_process(&points, mem, NULL);
>  	if (PTR_ERR(p) == -1UL)
>  		goto out;
>
This call needs to pass "CONSTRAINT_NONE" too.
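
Roughly what I mean, as a userspace sketch and not the actual patch. The
memcg path has no NUMA constraint, so it would pass CONSTRAINT_NONE together
with a NULL nodemask, and the nodemask penalty would apply only under
CONSTRAINT_MEMORY_POLICY. The "_sketch" names, struct candidate, the position
of the constraint argument, the bitmask nodemask and the divide-by-8 penalty
are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Constraint names as used in this thread; the enum itself is a stand-in. */
enum oom_constraint_sketch {
	CONSTRAINT_NONE,
	CONSTRAINT_CPUSET,
	CONSTRAINT_MEMORY_POLICY,
};

/* Hypothetical candidate task. */
struct candidate {
	unsigned long base_points;	/* stand-in for the badness() score */
	unsigned int allowed_nodes;	/* bit n set = may allocate on node n */
};

/* Score one candidate; the nodemask only matters for mempolicy OOM. */
static unsigned long score_sketch(const struct candidate *c,
				  enum oom_constraint_sketch constraint,
				  const unsigned int *nodemask)
{
	unsigned long points = c->base_points;

	if (constraint == CONSTRAINT_MEMORY_POLICY && nodemask &&
	    !(c->allowed_nodes & *nodemask))
		points /= 8;	/* assumed penalty factor */
	return points;
}

/* Return the index of the highest-scoring candidate, or -1 if none scores. */
static int select_bad_process_sketch(const struct candidate *tasks, size_t n,
				     enum oom_constraint_sketch constraint,
				     const unsigned int *nodemask,
				     unsigned long *ppoints)
{
	int chosen = -1;
	size_t i;

	assert(ppoints != NULL);
	*ppoints = 0;
	for (i = 0; i < n; i++) {
		unsigned long points = score_sketch(&tasks[i], constraint,
						    nodemask);
		if (points > *ppoints) {
			*ppoints = points;
			chosen = (int)i;
		}
	}
	return chosen;
}
```

With this shape, a caller like mem_cgroup_out_of_memory() would call
select_bad_process_sketch(tasks, n, CONSTRAINT_NONE, NULL, &points), so no
candidate is penalized for its node placement.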


Thanks,
Daisuke Nishimura.