Re: [Lse-tech] [patch] sched-domain cleanups, sched-2.6.5-rc2-mm2-A3

From: Nick Piggin
Date: Tue Mar 30 2004 - 02:27:10 EST


Andi Kleen wrote:
On Tue, 30 Mar 2004 17:03:42 +1000
Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:


So it is very likely to be a case of the threads running too
long on one CPU before being balanced off, and faulting in
most of their working memory from one node, right?


Yes.
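
(Illustrative aside, not part of the original exchange.) The effect agreed on above can be sketched with a toy model. It assumes a first-touch policy, i.e. a page is allocated on the node of the CPU that first faults it in, and a thread that is moved to its target node only after some number of ticks. The function names, the tick granularity and the page counts are all hypothetical; this is not scheduler code and it measures nothing on a real machine.

```python
def simulate_first_touch(total_pages, pages_per_tick, migrate_at_tick,
                         start_node=0, target_node=1):
    """Toy model: count pages placed on each node under first-touch.

    The thread runs on start_node until migrate_at_tick, then on
    target_node. Every page it faults in is placed on the node it is
    running on at that moment. All parameters are illustrative.
    """
    placement = {start_node: 0, target_node: 0}
    touched = 0
    tick = 0
    while touched < total_pages:
        node = start_node if tick < migrate_at_tick else target_node
        batch = min(pages_per_tick, total_pages - touched)
        placement[node] += batch
        touched += batch
        tick += 1
    return placement


def remote_fraction(placement, target_node=1):
    """Fraction of pages that are remote once the thread sits on target_node."""
    total = sum(placement.values())
    remote = total - placement[target_node]
    return remote / total


if __name__ == "__main__":
    # Balanced immediately versus balanced after 8 ticks.
    early = simulate_first_touch(1000, 100, migrate_at_tick=0)
    late = simulate_first_touch(1000, 100, migrate_at_tick=8)
    print("early:", early, remote_fraction(early))
    print("late: ", late, remote_fraction(late))
```

In this model the later the thread is balanced off, the larger the share of its working set that stays on the original node, which is the behaviour the question above describes.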


I think it is impossible for the scheduler to correctly
identify this and implement the behaviour that OpenMP wants
without causing regressions on more general workloads
(Assuming this is the problem).


Regression on what workload? The 2.4 kernel, which did the
early balancing, didn't seem to have problems.


No, but hopefully sched domains balancing will do
better than the old numasched.


I have a NUMA API for an application to select memory placement
manually, but it's unrealistic to expect all applications to use it,
so the scheduler has to do at least a reasonable default.

In general on Opteron you want to go as quickly as possible
to your target node. Keeping things on the local node and hoping
that threads won't need to be balanced off is probably a loss.
It is quite possible that other systems have different requirements,
but I doubt there is a "one size fits all" requirement, and doing a
custom domain setup or similar would be fine for me.

It is the same situation with all NUMA; obviously Opteron's
1 CPU per node means it is sensitive to node imbalances.

(or at least if sched domain cannot be tuned for Opteron then
it would have failed its promise of being a configurable scheduler)

Well it seems like Ingo is on to something. Phew! :)


I suspect this would still be a regression for other tests
though where thread creation is more frequent, threads share
working set more often, or the number of threads > the number
of CPUs.


I can try such tests if they're not too time consuming to set up.
What did you have in mind?


Not really sure. I guess probably most things that use a
lot of threads, maybe java, a web server using per connection
threads (if there is such a thing).

On the other hand though, maybe it will be a good idea if it
is done carefully...