Re: [BUGFIX][PATCH] oom-kill: fix NUMA constraint check with nodemask v3

From: KOSAKI Motohiro
Date: Tue Nov 10 2009 - 22:24:00 EST


> On Wed, 11 Nov 2009, KOSAKI Motohiro wrote:
>
> > > > {
> > > > -#ifdef CONFIG_NUMA
> > > > struct zone *zone;
> > > > struct zoneref *z;
> > > > enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> > > > - nodemask_t nodes = node_states[N_HIGH_MEMORY];
> > > > + int ret = CONSTRAINT_NONE;
> > > >
> > > > - for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
> > > > - if (cpuset_zone_allowed_softwall(zone, gfp_mask))
> > > > - node_clear(zone_to_nid(zone), nodes);
> > > > - else
> > > > + /*
> > > > + * The nodemask here is a nodemask passed to alloc_pages(). Now,
> > > > + * cpuset doesn't use this nodemask for its hardwall/softwall/hierarchy
> > > > + * feature. mempolicy is an only user of nodemask here.
> > > > + */
> > > > + if (nodemask) {
> > > > + nodemask_t mask;
> > > > + /* check mempolicy's nodemask contains all N_HIGH_MEMORY */
> > > > + nodes_and(mask, *nodemask, node_states[N_HIGH_MEMORY]);
> > > > + if (!nodes_equal(mask, node_states[N_HIGH_MEMORY]))
> > > > + return CONSTRAINT_MEMORY_POLICY;
> > > > + }
> > >
> > > Although a nodemask_t was previously allocated on the stack, we should
> > > probably change this to use NODEMASK_ALLOC() for kernels with higher
> > > CONFIG_NODES_SHIFT since allocations can happen very deep into the stack.
> >
> > No. NODEMASK_ALLOC() is crap. we should remove it.
>
> I've booted 1K node systems and have found it to be helpful to ensure that
> the stack will not overflow especially in areas where we normally are deep
> already, such as in the page allocator.

Linux doesn't support 1K nodes (and only SGI's huge machines use 512 nodes).
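
To put a number on the stack cost under discussion: nodemask_t is a bitmap
with one bit per possible node (1 << CONFIG_NODES_SHIFT), rounded up to whole
longs. The following is a rough user-space sketch of that arithmetic, not
kernel code; nodemask_bytes() is a made-up helper name used only here.

```c
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long on the build machine. */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: estimate sizeof(nodemask_t) for a given
 * CONFIG_NODES_SHIFT, assuming the mask is a plain bitmap of
 * (1 << nodes_shift) bits stored in an array of unsigned long.
 */
static size_t nodemask_bytes(unsigned int nodes_shift)
{
	size_t nbits = (size_t)1 << nodes_shift;
	size_t nlongs = (nbits + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;

	return nlongs * sizeof(unsigned long);
}
```

Under that assumption, nodemask_bytes(9) is 64 bytes for 512 nodes and
nodemask_bytes(10) is 128 bytes for 1K nodes, per mask placed on the stack.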

At the very least, NODEMASK_ALLOC should provide a cleaner interface. The
current one and struct nodemask_scratch are pretty ugly.


> > btw, CPUMASK_ALLOC was already removed.
>
> I don't remember CPUMASK_ALLOC() actually being merged. I know the
> comment exists in nodemask.h, but I don't recall any CPUMASK_ALLOC() users
> in the tree.
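
Coming back to the nodemask check in the hunk quoted at the top: it treats the
allocation as mempolicy-constrained when the passed nodemask does not cover
every node that has memory. Below is a toy user-space model of only that
hunk, assuming at most 64 nodes so that a mask fits in one uint64_t.
model_constrained_alloc() and the uint64_t masks are stand-ins invented for
this sketch; the real code uses nodemask_t, nodes_and() and nodes_equal().

```c
#include <stddef.h>
#include <stdint.h>

/* Same names as the constraint values used in the quoted patch. */
enum oom_constraint {
	CONSTRAINT_NONE,
	CONSTRAINT_CPUSET,
	CONSTRAINT_MEMORY_POLICY,
};

/*
 * Toy model of the quoted hunk only (the cpuset part is not modeled).
 * nodemask:          mask passed to the allocator, or NULL if none
 * high_memory_nodes: stand-in for node_states[N_HIGH_MEMORY]
 */
static enum oom_constraint
model_constrained_alloc(const uint64_t *nodemask, uint64_t high_memory_nodes)
{
	if (nodemask) {
		/* nodes_and(): keep only the allowed nodes that have memory */
		uint64_t mask = *nodemask & high_memory_nodes;

		/* !nodes_equal(): some node with memory is excluded */
		if (mask != high_memory_nodes)
			return CONSTRAINT_MEMORY_POLICY;
	}
	return CONSTRAINT_NONE;
}
```

With nodes 0-3 holding memory (0xF), a mask of 0x3 yields
CONSTRAINT_MEMORY_POLICY, while a NULL mask or a mask of 0xF yields
CONSTRAINT_NONE.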


