Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

From: Hubertus Franke
Date: Thu Oct 14 2004 - 18:04:19 EST


Paul, there are also other means of gang scheduling than having
to build a tightly synchronized global clock into the communication device.

Particularly in a batch-oriented environment of compute-intensive applications, one does not really need or want to switch frequently.
Often, the communication devices are memory-mapped straight into the
application, bypassing the OS, and only a limited number of channels is available.

However, as shown in previous work, gang scheduling and other scheduling tricks (e.g. backfilling) can provide significantly higher utilization. So if a high context-switching rate (read: interactivity) is not required, a network of user-space scheduling daemons can be used.
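As a rough illustration of what such a user-level daemon might compute, here is a minimal sketch assuming the classic Ousterhout-matrix formulation of gang scheduling: rows are time slots, columns are processors, and every gang is placed entirely within one row so that all its tasks run together. The names `Job`, `GangMatrix`, `place` and `schedule` are hypothetical, and the placement is plain first-fit packing, not the backfilling scheme from the cited papers.

```python
"""Hypothetical sketch of user-level gang scheduling via an Ousterhout matrix.

Rows are time slots, columns are processors. Each job (a gang of tasks) is
placed entirely within one row so that all of its tasks run at the same time.
This is simple first-fit packing, not the backfilling algorithm from the papers.
"""
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    width: int  # number of processors the gang needs simultaneously


@dataclass
class GangMatrix:
    num_cpus: int
    rows: list = field(default_factory=list)  # each row is a list of Job

    def free_cpus(self, row):
        """Processors still unused in one time slot."""
        return self.num_cpus - sum(job.width for job in row)

    def place(self, job):
        """First-fit: put the whole gang into the first slot with room.

        Returns the index of the time slot (row) the gang was placed in.
        """
        if job.width > self.num_cpus:
            raise ValueError(f"{job.name} needs more CPUs than the machine has")
        for index, row in enumerate(self.rows):
            if self.free_cpus(row) >= job.width:
                row.append(job)
                return index
        # No existing slot has room: open a new time slot for this gang.
        self.rows.append([job])
        return len(self.rows) - 1

    def schedule(self, num_quanta):
        """List the gangs that run in each quantum, cycling through the slots."""
        if not self.rows:
            return []
        return [[job.name for job in self.rows[q % len(self.rows)]]
                for q in range(num_quanta)]
```

For example, on a 4-CPU machine, gangs of width 3, 2, 1 and 2 pack into two time slots, and the daemon alternates between them every quantum. At each quantum boundary a real daemon would resume the processes of the active row (e.g. with SIGCONT) and stop the rest (SIGSTOP); that signaling mechanism is an assumption of this sketch, not something stated in the post.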

We have a slew of publications on this. An example write-up is available here:

Y. Zhang, H. Franke, J. Moreira, A. Sivasubramaniam. Improving Parallel Job Scheduling by Combining Gang Scheduling and Backfilling Techniques. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), pages 113-142, May 2000.
http://www.cse.psu.edu/~anand/csl/papers/ipdps00.pdf

Or, for a final summary of that research, the journal version:

Y. Zhang, H. Franke, J. Moreira, A. Sivasubramaniam. An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling and Migration. IEEE Transactions on Parallel and Distributed Systems, 14(3):236-247, March 2003.

This was implemented for the IBM SP2 cluster and the ASCI machine at Lawrence Livermore National Lab in the late 1990s.

If you are interested in short scheduling cycles, we also found that,
depending on how tightly the applications synchronize, gang scheduling is not necessarily the best choice.

Y. Zhang, A. Sivasubramaniam, J. Moreira, H. Franke. A Simulation-based Study of Scheduling Mechanisms for a Dynamic Cluster Environment. In Proceedings of the ACM International Conference on Supercomputing (ICS), pages 100-109, May 2000.
http://www.cse.psu.edu/~anand/csl/papers/ics00a.pdf

If I remember correctly, this kind of tight, slot-based gang scheduling was already implemented on IRIX in 1995/96 (I read a paper on that).

The moral of the story is that it's unlikely that Linux will support gang scheduling in its core anytime soon, or will allow network adapters to drive scheduling decisions, so those approaches are likely out.
Less frequent gang scheduling, however, can be implemented with user-level daemons, so an adequate solution is available for most cases.

-- Hubertus

Paul Jackson wrote:

Kevin McMahon <n6965@xxxxxxx> pointed out to me a link to an interesting
article on gang scheduling:

http://www.linuxjournal.com/article.php?sid=7690
Issue 127: Improving Application Performance on HPC Systems with Process Synchronization
Posted on Monday, November 01, 2004 by Paul Terry, Amar Shan, Pentti Huttunen

It's amazingly current - won't even be posted for another couple of weeks ;).

