Re: [2.6.24-rc8-mm1][regression?] numactl --interleave=all doesn't works on memoryless node.

From: Paul Jackson
Date: Tue Feb 05 2008 - 05:18:30 EST


Lee wrote:
> I don't know the current state of Paul's rework of cpusets and
> mems_allowed. That probably resolves this issue, if he still plans on
> allowing a fully populated mask to indicate interleaving over all
> allowed nodes.

It got a bit stalled out for the last month (my employer had other
designs on my time). But I'd really like to drive it home.

What happened so far, in December 2007 and earlier, is that a few of us:

David Rientjes <rientjes@xxxxxxxxxx>
Lee.Schermerhorn@xxxxxx
Christoph Lameter <clameter@xxxxxxx>
Andi Kleen <ak@xxxxxxx>

had a discussion, motivated in good part by the need to allow a
mempolicy of MPOL_INTERLEAVE over all nodes currently available in
the cpuset, where that interleave policy was robustly preserved if
the cpuset changed (without requiring the application to somehow
"know" its cpuset had changed and reissue the set_mempolicy call).

But that discussion touched on some other long-standing deficiencies
in the way that I had originally glued cpusets and memory policies
together. The current mechanism doesn't handle changing cpusets very
well, especially if the number of nodes in the cpuset increases.
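
To make the goal concrete, here is a minimal user-space sketch in C of
an interleave policy that keeps following the cpuset as it changes. It
is a model, not kernel code and not the patch itself. It assumes that
nodemasks fit in one 64-bit word and that the policy remembers the mask
the application originally requested; the names interleave_policy,
policy_init, policy_rebind and policy_next_node are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: one bit per node, at most 64 nodes. */
typedef uint64_t nodemask_t;

struct interleave_policy {
    nodemask_t requested;  /* mask the application asked for, never altered */
    nodemask_t effective;  /* requested & nodes the cpuset currently allows */
    int last;              /* last node handed out, -1 if none yet */
};

/* Recompute the effective mask. Called at setup and on every cpuset change. */
static void policy_rebind(struct interleave_policy *p, nodemask_t allowed)
{
    p->effective = p->requested & allowed;
}

static void policy_init(struct interleave_policy *p,
                        nodemask_t requested, nodemask_t allowed)
{
    p->requested = requested;
    p->last = -1;
    policy_rebind(p, allowed);
}

/* Return the next node in round-robin order, or -1 if no node is usable. */
static int policy_next_node(struct interleave_policy *p)
{
    if (p->effective == 0)
        return -1;
    for (int i = 1; i <= 64; i++) {
        int node = (p->last + i) % 64;
        if (p->effective & ((nodemask_t)1 << node)) {
            p->last = node;
            return node;
        }
    }
    return -1;
}
```

Because the requested mask is kept separately from the effective one, a
fully populated request picks up newly added nodes after a rebind,
without the application reissuing anything.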

Obviously, I can't change the current behaviour, especially of the
mempolicy system calls. I can only add new options that provide new
alternatives.

The patchset I'd like to drive home addresses these issues with a
couple of additional MPOL_* flags, upward compatible, that alter the
way that nodemasks are mapped into cpusets, and remapped if the cpuset
subsequently changes.
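
One possible reading of "mapping a nodemask into a cpuset" can be
sketched as follows. This is only an illustration under my own
assumptions, not the semantics of the proposed flags: the application
passes a cpuset-relative mask, bit k selects the k-th node the cpuset
currently allows, and bits beyond the cpuset's size fold back modulo the
number of allowed nodes. The 64-node limit and the names mask_weight,
nth_allowed_node and map_relative are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: one bit per node, at most 64 nodes. */
typedef uint64_t nodemask_t;

/* Count the set bits in a mask. */
static int mask_weight(nodemask_t m)
{
    int n = 0;
    for (; m; m &= m - 1)
        n++;
    return n;
}

/* Node number of the n-th (0-based) set bit in allowed, or -1 if none. */
static int nth_allowed_node(nodemask_t allowed, int n)
{
    for (int node = 0; node < 64; node++) {
        if (allowed & ((nodemask_t)1 << node)) {
            if (n-- == 0)
                return node;
        }
    }
    return -1;
}

/*
 * Map a cpuset-relative mask onto the allowed nodes: relative bit k
 * selects the (k mod weight)-th allowed node. The modulo folding is an
 * assumption of this sketch.
 */
static nodemask_t map_relative(nodemask_t relative, nodemask_t allowed)
{
    int w = mask_weight(allowed);
    nodemask_t out = 0;

    if (w == 0)
        return 0;
    for (int k = 0; k < 64; k++) {
        if (relative & ((nodemask_t)1 << k))
            out |= (nodemask_t)1 << nth_allowed_node(allowed, k % w);
    }
    return out;
}
```

Since the relative mask is never overwritten, calling map_relative again
after the cpuset changes yields the remapped result, including when the
cpuset has grown.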

The next two steps I need to take are:
 1) propose this patch, with careful explanation (it's easy to lose
    one's bearings in the mappings and remappings of node numberings)
    to a wider audience, such as linux-mm or linux-kernel, and
 2) carefully test this, especially on each code path I touched in
    mm/mempolicy.c, where the changes were delicate, to ensure I
    didn't break any existing code.

There were also some other, smaller patches proposed, by myself and
others. I preferred to address a wider set of the long-standing
issues in this area, but the others above mostly preferred the smaller
patches. This needs to be discussed in a wider forum, and a consensus
reached.

Hopefully this week or next, I will publish this patch proposal.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.940.382.4214