Re: [patch 2/2] cpusets: add interleave_over_allowed option

From: David Rientjes
Date: Tue Oct 30 2007 - 19:26:20 EST


On Tue, 30 Oct 2007, Paul Jackson wrote:

> We've already got two Choices, one released and one in the oven. Is
> there an actual, real world situation, motivating this third Choice?
>

Let's put Choice C into the lower oven, then.

Of course there are actual, real world examples of this, because right
now we're not meeting the full intent of the application. Cpusets deal
with cpus and memory; they don't have anything to do with affinity to
particular I/O devices. That part is left up to the creator of the
cpuset to sort out correctly based on their system topology.

If my application does tons of I/O on one particular device to which my
memory has access, I can use MPOL_PREFERRED to prefer that the memory be
allocated on a node with the best affinity to my device. If cpusets
change my access to that node, I'm still using an MPOL_PREFERRED policy
with a remapped node that no longer has affinity to that device, because
nodes_remap() doesn't take that into account. My preference would be to
fall back to MPOL_DEFAULT behavior, since it's certainly plausible that
other cpusets share the same node, instead of unnecessarily filling up a
node that I don't even prefer anymore.
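For illustration, setting that up looks roughly like this (node 2 is a
made-up number here; a real application would derive it from the system
topology, and set_mempolicy() comes from <numaif.h>, linked with -lnuma):

#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	unsigned long nodemask = 1UL << 2;	/* assumed closest to the device */

	/* Prefer node 2 for new allocations, falling back to other
	 * nodes only when it is out of memory. */
	if (set_mempolicy(MPOL_PREFERRED, &nodemask,
			  sizeof(nodemask) * 8) < 0) {
		perror("set_mempolicy");
		exit(EXIT_FAILURE);
	}

	/* Pages first touched from here on are preferentially
	 * allocated on node 2; a cpuset remap can silently point
	 * this preference at a node with no affinity to the device. */
	return 0;
}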

The same situation exists with MPOL_INTERLEAVE policies, where my NUMA
optimization is no longer helpful because I'm interleaving over a set of
nodes that was simply remapped and whose affinity (which isn't guaranteed
to be uniform) wasn't even taken into account.
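The interleave case looks much the same; this sketch assumes nodes 0-3
were chosen for their roughly uniform affinity to the device:

#include <numaif.h>	/* set_mempolicy(), MPOL_INTERLEAVE; link with -lnuma */
#include <stdio.h>

int main(void)
{
	unsigned long nodemask = 0xfUL;	/* nodes 0-3 */

	/* Spread page allocations round-robin across nodes 0-3.  If
	 * cpusets remap this mask, the new interleave set may have no
	 * useful affinity to the device at all. */
	if (set_mempolicy(MPOL_INTERLEAVE, &nodemask,
			  sizeof(nodemask) * 8) < 0)
		perror("set_mempolicy");
	return 0;
}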

But, with Choice C, my intent is still preserved in the mempolicy even
though it's not in effect, because my access rights to the node have
changed. If I get access to that node back later, and I haven't issued
subsequent set_mempolicy() calls to change my policy, my MPOL_PREFERRED
or MPOL_INTERLEAVE policy takes effect again and I benefit from my NUMA
optimization once again.
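From the application's side, the whole sequence under Choice C would look
something like the sketch below; the kernel behavior in the comments is
the proposed semantics, not what current kernels do:

#include <numaif.h>	/* link with -lnuma */

int main(void)
{
	unsigned long nodemask = 1UL << 2;	/* hypothetical node 2 */

	/* Intent recorded once; never re-issued by the application. */
	set_mempolicy(MPOL_PREFERRED, &nodemask, sizeof(nodemask) * 8);

	/* ... cpuset drops node 2: under Choice C, allocations would
	 * behave as MPOL_DEFAULT, but the stored policy still says
	 * "prefer node 2" ...
	 *
	 * ... cpuset restores node 2: the preference would take effect
	 * again, with no further set_mempolicy() call here ... */

	return 0;
}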

David