Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement
From: Hubertus Franke
Date: Thu Oct 14 2004 - 18:04:19 EST
Paul, there are also other means of gang scheduling than having
to architect a tightly synchronized global clock into the communication
device.
Particularly in a batch-oriented environment of compute-intensive
applications, one does not really need or want to switch frequently.
Often, the communication devices are memory mapped straight into the
application without OS involvement, with a limited number of available
channels.
However, as shown in previous work, gang scheduling and other
scheduling tricks (e.g. backfilling) can provide significantly higher
utilization. So, if a high context switching rate (read: interactivity)
is not required, then a network of user-space scheduling daemons can be
used.
We have a slew of pubs on this. An example write-up can be obtained here:
Y. Zhang, H. Franke, J. Moreira, A. Sivasubramaniam. Improving Parallel
Job Scheduling by Combining Gang Scheduling and Backfilling Techniques.
In Proceedings of the International Parallel and Distributed Processing
Symposium (IPDPS), pages 113-142, May 2000.
http://www.cse.psu.edu/~anand/csl/papers/ipdps00.pdf
Or, for a final summary of that research as a journal article:
Y. Zhang, H. Franke, J. Moreira, A. Sivasubramaniam. An Integrated
Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling and
Migration. IEEE Transactions on Parallel and Distributed Systems,
14(3):236-247, March 2003.
This was implemented for the IBM SP2 cluster and ASCI machine at
Livermore National Lab in the late 90's.
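
To make the backfilling idea mentioned above concrete, here is a minimal
sketch. It assumes the common EASY-style variant (only the job at the head
of the queue gets a reservation), which is not necessarily the exact policy
used in the cited papers. The names Job and pick_jobs_to_start are
hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # processors requested
    runtime: int  # user-estimated runtime, in arbitrary time units


def pick_jobs_to_start(now, free_nodes, running, queue):
    """Return the names of queued jobs to start at time `now`.

    running: list of (end_time, nodes) for jobs already executing.
    queue:   list of Job in arrival (FCFS) order.
    """
    started = []
    queue = list(queue)
    running = list(running)

    # Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # The head job is blocked. Reserve the earliest time ("shadow time")
    # at which enough nodes will have been released for it.
    head = queue[0]
    avail = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started  # head job is larger than the whole machine
    extra = avail - head.nodes  # nodes still free once the head job starts

    # Backfill: a later job may jump ahead only if it cannot delay the
    # head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_in_time = now + job.runtime <= shadow_time
        if not ends_in_time:
            if job.nodes > extra:
                continue
            extra -= job.nodes
        free_nodes -= job.nodes
        started.append(job.name)
    return started
```

The utilization gain comes from the last loop: nodes that would otherwise
idle while the head job waits are handed to smaller or shorter jobs. The
sketch trusts the user-supplied runtime estimates, which a real scheduler
would have to enforce or correct.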
If you are interested in short scheduling cycles, we also discovered that,
depending on how tightly synchronized the applications are, gang
scheduling is not necessarily the best choice:
Y. Zhang, A. Sivasubramaniam, J. Moreira, H. Franke. A Simulation-based
Study of Scheduling Mechanisms for a Dynamic Cluster Environment. In
Proceedings of the ACM International Conference on Supercomputing (ICS),
pages 100-109, May 2000. http://www.cse.psu.edu/~anand/csl/papers/ics00a.pdf
If I remember correctly, this tight gang scheduling based on slots was
already implemented on IRIX in 95/96 (I read a paper on that).
The moral of the story here is that it's unlikely that Linux will support
gang scheduling in its core anytime soon, or will allow network adapters
to drive scheduling strategies. So these are likely out.
Less frequent gang scheduling can be implemented with user-level
daemons, so an adequate solution is available for most instances.
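
As a rough illustration of coarse-grained gang scheduling from user space,
the sketch below rotates through gangs of processes on a single node,
stopping every gang except the active one with SIGSTOP and resuming the
active one with SIGCONT. This is an assumption about how such a daemon
could work, not the implementation from the cited papers. The function
names, the slice length and the `send`/`sleep` parameters (there so the
logic can be exercised without real processes) are all hypothetical.

```python
import os
import signal
import time


def run_slot(gangs, active, send=os.kill):
    """Stop every gang except `active`, then resume `active`.

    gangs: list of lists of PIDs; each inner list is one gang.
    """
    for i, pids in enumerate(gangs):
        if i != active:
            for pid in pids:
                send(pid, signal.SIGSTOP)
    for pid in gangs[active]:
        send(pid, signal.SIGCONT)


def gang_daemon(gangs, slice_seconds=60.0, rounds=1,
                send=os.kill, sleep=time.sleep):
    """Give each gang one time slice per round, in list order."""
    for _ in range(rounds):
        for active in range(len(gangs)):
            run_slot(gangs, active, send)
            sleep(slice_seconds)
```

On a cluster, one such daemon per node would additionally have to agree
with its peers on which gang owns the current slot. With slices on the
order of seconds or minutes, loosely synchronized clocks or a simple
broadcast are likely good enough, which is why no tightly synchronized
global clock is required.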
-- Hubertus
Paul Jackson wrote:
Kevin McMahon <n6965@xxxxxxx> pointed out to me a link to an interesting
article on gang scheduling:
http://www.linuxjournal.com/article.php?sid=7690
Issue 127: Improving Application Performance on HPC Systems with Process Synchronization
Posted on Monday, November 01, 2004 by Paul Terry, Amar Shan, Pentti Huttunen
It's amazingly current - won't even be posted for another couple of weeks ;).