Re: Regression in 2.6.23-rc2-mm2, mounting cpusets causes a hang

From: Lee Schermerhorn
Date: Wed Aug 15 2007 - 09:43:59 EST


On Tue, 2007-08-14 at 14:56 -0700, Christoph Lameter wrote:
> On Tue, 14 Aug 2007, Lee Schermerhorn wrote:
>
> > > Ok then you did not have a NUMA system configured. So its okay for the
> > > dummies to ignore the stuff. CONFIG_NODES_SHIFT is a constant and does not
> > > change. The first bit is always set.
> >
> > The first bit [node 0] is only set for the N_ONLINE [and N_POSSIBLE]
> > mask. We could add the static init for the other masks, but since
> > non-numa platforms are going through the __build_all_zonelists, they
> > might as well set the MEMORY bits explicitly. Or, maybe you'll
> > disagree ;-).
>
> The bitmaps can be completely ignored if !NUMA.
>
> In the non NUMA case we define
>
> static inline int node_state(int node, enum node_states state)
> {
> return node == 0;
> }
>
> So its always true for node 0. The "bit" is set.

The issue is with the N_*_MEMORY masks. They don't get initialized
properly because node_set_state() is a no-op if !NUMA. So, where we
look for intersections with or where we AND with the N_*_MEMORY masks we
get the empty set.

>
> We are trying to get cpusets to work with !NUMA?


Well, yes. In Serge's case, he's trying to use cpusets with !NUMA.
He'll have to comment on the reasons for that. Looking at all of the
#ifdefs and init/Kconfig, CPUSET does not depend on NUMA--only SMP and
CONTAINERS [altho' methinks CPUSET should select CONTAINERS rather than
depend on it...]. So, you can use cpusets to partition of cpus in
non-NUMA configs.

In the more general case, tho', I'm looking at all uses of the
node_online_map and for_each_online_node, for instances where they
should be replaced with one of the *_MEMORY masks. IMO, generic code
that is compiled independent of any CONFIG option, like NUMA, should
just work, independent of the config. Currently, as Serge has shown,
this is not the case. So, I think we should fix the *_MEMORY maps to be
correctly populated in both the NUMA and !NUMA cases. A couple of
options:

1) just use node_set() when populating the masks,

2) initialize all masks to include at least cpu/node 0 in the !NUMA
case.

Serge chose #1 to fix his problem. I followed his lead to fix the other
2 places where node_set_state() was being used to initialize the NORMAL
memory node mask and the CPU node mask. This will add a few unnecessary
instructions to !NUMA configs, so we could change to #2.

Thoughts?

Lee

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/