Re: [patch 3/3] cpusets: add memory_spread_user option

From: Lee Schermerhorn
Date: Fri Oct 26 2007 - 13:44:40 EST


On Fri, 2007-10-26 at 10:18 -0700, Paul Jackson wrote:
> pj wrote:
> > On a different point, we could, if it were worth the extra bit of code,
> > improve the current code's handling of mempolicy rebinding when the
> > cpuset adds memory nodes. If we kept both the original cpuset's
> > mems_allowed and the original MPOL_INTERLEAVE nodemask requested by
> > the user in a call to set_mempolicy, then we could rebind (nodes_remap)
> > the currently active policy's v.nodes using that pair of saved masks to
> > guide the rebinding. This way, if, say, a cpuset shrank and then grew back
> > to its original size (original number of nodes), we would end up
> > replicating the original MPOL_INTERLEAVE request, cpuset relative.
> > This would provide a more accurate cpuset-relative translation of such
> > memory policies -without- changing the set_mempolicy API. Hmmm ... this
> > might meet your needs entirely, so that we did not need -any- added
> > flags to the API.
>
> Thinking about this some more ... there's daylight here!
>
> I'll see if I can code up a patch for this now, but the idea is
> to allow user code to specify any nodemask in a set_mempolicy
> MPOL_INTERLEAVE call, even including nodes not in their cpuset,
> and then (1) use nodes_remap() to fold that mask down to whatever
> their current cpuset is, and (2) remember what they passed in and
> use it again with nodes_remap() to re-fold that mask down any time
> the cpuset changes.
>
> For example, if they pass in a mask with all bits set, then they
> get interleave over all the nodes in their current cpuset, even as
> that cpuset changes. If they pass in a mask with, say, just two
> bits set, then they will get interleave over just two nodes any time
> they are in a cpuset with two or more nodes (when in a single-node
> cpuset, they will of course get no interleave, for lack of anything
> to interleave over).
>
> This should replace the patches that David is proposing here. It should
> replace what Lee is proposing. It should work with libnuma and be
> fully upward compatible with current code (except, perhaps, code that
> depends on getting an error when requesting MPOL_INTERLEAVE on a node
> that is not allowed).
>
> And instead of just covering the special case of "interleave over all
> available nodes", it should cover the more general case of interleaving
> over any subset of nodes, folded or replicated to handle being in any
> cpuset.
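
For concreteness, here is a quick user-space model of the folding described
above. This is only a sketch: the fold() helper, the MAX_NODES constant, and
the single-word masks are illustrative stand-ins for the kernel's nodemask_t
machinery, and the "n-th set bit of the request maps to the (n mod weight)-th
set bit of the cpuset" rule is an assumption about how nodes_remap() would be
applied here, not the nodes_remap() macro itself.

#include <stdio.h>

#define MAX_NODES 64	/* illustrative; assumes 64-bit unsigned long */

/* Number of set bits (nodes) in a mask. */
static int weight(unsigned long mask)
{
	return __builtin_popcountl(mask);
}

/* Bit position of the n-th set bit of mask, counting from n == 0. */
static int nth_set_bit(unsigned long mask, int n)
{
	for (int bit = 0; bit < MAX_NODES; bit++)
		if ((mask & (1UL << bit)) && n-- == 0)
			return bit;
	return -1;	/* unreachable while n < weight(mask) */
}

/*
 * Fold a requested interleave mask onto the current cpuset: the n-th
 * set bit of 'request' maps to the (n mod weight(cpuset))-th set bit
 * of 'cpuset'.  Because 'request' is saved unmodified, the fold can be
 * redone from scratch on every cpuset change, so a cpuset that shrinks
 * and later grows back recovers the original interleave set.
 */
static unsigned long fold(unsigned long request, unsigned long cpuset)
{
	unsigned long folded = 0;
	int w = weight(cpuset);
	int ord = 0;

	if (w == 0)
		return 0;	/* no memory nodes: nothing to interleave over */
	for (int bit = 0; bit < MAX_NODES; bit++)
		if (request & (1UL << bit))
			folded |= 1UL << nth_set_bit(cpuset, ord++ % w);
	return folded;
}

int main(void)
{
	unsigned long cpuset = 0xF0;	/* current cpuset: nodes 4-7 */

	printf("%#lx\n", fold(~0UL, cpuset));	/* all ones -> 0xf0 */
	printf("%#lx\n", fold(0x3, cpuset));	/* nodes 0,1 -> 0x30 */
	return 0;
}

Against a cpuset of nodes 4-7, folding an all-ones request yields nodes 4-7,
and folding a request of nodes {0,1} yields nodes {4,5} -- matching the two
examples quoted above.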

Will it handle the case of an MPOL_INTERLEAVE policy on a shm segment that
is mapped by tasks in different, possibly disjoint, cpusets? Local
allocation does, and my patch does. That was one of the primary
goals--to address an issue that Christoph has with shared policies.
Cpusets really muck these up!

Lee
