Re: 2.6.5-rc3-mm4 x86_64 sched domains patch

From: Nick Piggin
Date: Thu Apr 08 2004 - 18:43:02 EST




Darren Hart wrote:

The current default implementations of arch_init_sched_domains
constructs either a flat or two level topolology. The two level
topology is built if CONFIG_NUMA is set. It seems that CONFIG_NUMA is
not the appropriate flag to use for constructing a two level topology
since some architectures which define CONFIG_NUMA would be better served
with a flat topology. x86_64 for example will construct a two level
topology with one CPU per node, causing performance problems because
balancing within nodes is pointless and balancing across nodes doesn't
occur as often.



This is correct, although I don't know why there would be
performance problems. The rebalance in the degenerate node-local
domain should be basically unmeasurable. It would be nice to
get rid of it at some time. I have code to prune off degenerate
domains, which I will submit soonish.

The NUMA rebalance should occur more often than the old numasched
did, but perhaps with some recent Altix-centric changes to the
generic setup, this is no longer the case.

The STREAM performance problem is due mainly to the more
conservative nature of balancing, which is otherwise a good thing.
I think we can fix this in the short term by having x86_64 balance
between nodes more often. In the long term, we can merge Ingo's
balance on clone stuff, and the interested people can play with
that.

This patch introduces a new CONFIG_SCHED_NUMA flag and uses it to decide
between a flat or two level topology of sched_domains. The patch is
minimally invasive as it primarily modifies Kconfig files and sets the
appropriate default (off for x86_64, on for everything that used to
export CONFIG_NUMA) and should only change the sched_domains topology
constructed on x86_64 systems. I have verified this on a 4 node x86
NUMAQ, but need someone to test x86_64.



I guess I can't see a big problem with this, other than more
complexity. In the long run, we should obviously have the arch
code set up optimal domains depending on the machine and config.

This patch is intended as a quick fix for the x86_64 problem, and
doesn't solve the problem of how to build generic sched domain
topologies. We can certainly conceive of various topologies for x86
systems, so even arch specific topologies may not be sufficient. Would
sub-arch (ie NUMAQ) be the right way to handle different topologies, or
will we be able to autodiscover the appropriate topology? I will be
looking into this more, but thought some might benefit from an immediate
x86_64 fix. I am very interested in hearing your ideas on this.



SGI want to do sub arch domains so they can do specific things
with their systems. I don't really care what the arch code does
with them, but it would be wise to only specialise it when there
is a genuine need. I'm glad you'll be looking into it, thanks.

Nick

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/