On Sat, Mar 17, 2007 at 10:41:01PM +0200, Avi Kivity wrote:
Well, the heuristic here is that process == job. I'm not sure heuristic is the right name for it, but it does point out a deficieny.
A cpu-bound process with many threads will overwhelm a cpu-bound single threaded threaded process.
A job with many processes will overwhelm a job with a single process.
A user with many jobs can starve a user with a single job.
I don't think the problem here is heuristics, rather that the scheduler's manages cpu quotas at the task level rather than at the user visible level. If scheduling were managed at all three hierarchies I mentioned ('job' is a bit artificial, but process and user are not) then:
- if N users are contending for the cpu on a multiuser machine, each should get just 1/N of available cpu power. As it is, a user can run a few of your #1 workloads (or a make -j 20) and slow every other user down
- your example would work perfectly (if we can communicate to the kernel what a job is)
- multi-threaded processes would not get an unfair advantage
I like this notion very much. I should probably mention pgrp's' typical
association with the notion of "job," at least as far as shells go.
One issue this raises is prioritizing users on a system, threads within
processes, jobs within users, etc. Maybe sessions would make sense, too,
and classes of users, and maybe whatever they call the affairs that pid
namespaces are a part of (someone will doubtless choke on the hierarchy
depth implied here but it doesn't bother me in the least). It's not a
deep or difficult issue. There just needs to be some user API to set the
relative scheduling priorities of all these affairs within the next higher
level of hierarchy, regardless of how many levels of hierarchy (aleph_0?).