Re: [RFC][PATCH] O(1) Entitlement Based Scheduler
From: Peter Williams
Date: Sun Feb 29 2004 - 23:19:39 EST
Andy Lutomirski wrote:
> Peter Williams wrote:
>> Andy Lutomirski wrote:
>>> How hard would it be to make shares hierarchical? For example (quoted
>>> names are just descriptive):
>> As Peter Chubb has stated, such control is possible and is available on
>> Tru64, Solaris and Windows with Aurema's (<http://www.aurema.com>)
>> ARMTech product. The CKRM project also addresses this issue.
> Cool. I hadn't realized ARMTech did that, and I haven't fully read up
> on CKRM.
>>> Also, could interactivity problems be solved something like this:
>>> prio = ( (old EBS usage ratio) - 0.5 ) * i + 0.5
>>> "i" would be a per-process interactivity factor (normally 1, but
>>> higher for interactive processes) which would only boost them when
>>> their CPU usage is low. This makes interactive processes get their
>>> timeslices early (very high priority at low CPU consumption) but
>>> prevents abuse by preventing excessive CPU consumption. This could
>>> even be set by the (untrusted) process itself.
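Just to make the arithmetic of that proposal concrete, here is a purely
illustrative rendering of the formula; nothing like this is in EBS, and
the helper name is made up:

/* Illustrative only: the proposed priority adjustment
 * prio = (usage_ratio - 0.5) * i + 0.5, where usage_ratio is the
 * task's EBS usage per entitlement and i >= 1 is the per-process
 * interactivity factor (lower prio means better priority here).
 * With i = 3, a task at 10% usage gets (0.1 - 0.5) * 3 + 0.5 = -0.7
 * (a strong boost), but at 70% usage it gets (0.7 - 0.5) * 3 + 0.5
 * = 1.1, worse than the neutral i = 1 value of 0.7, so a CPU hog
 * gains nothing by claiming a large i.
 */
static double interactive_prio(double usage_ratio, double i)
{
        return (usage_ratio - 0.5) * i + 0.5;
}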
>> Interactive processes do very well under EBS without any special
>> treatment.
>> Programs such as xmms aren't really interactive processes although
>> they usually have a very low CPU usage rate like interactive
>> processes. What distinguishes them is their need for REGULAR access
>> to the CPU. It's unlikely that such a modification would help with
>> the need for regularity.
> I'm guessing the reason make -j16 broke it is because it (i.e. make)
> spawns lots of processes frequently. Since make probably uses almost no
> CPU under this load, its usage per share stays near zero. As long as it
> is below that of xmms, then all of its children are too, at least until
> part-way into their first timeslices. The problem is that there are a
> bunch of these children, and, if enough start at once, they all run
> before xmms, and the audio buffer underruns. My approach would give
> xmms a better chance of running sooner (in fact, it would potentially
> have better priority than any non-interactive task until it started
> hogging CPU).
That's close to the reason. The real culprit is the dreaded "ramp up",
which is our tag for the fact that it takes a short while (dependent on
the half life) for the scheduler to estimate the usage rate of a new
process (or to notice a sudden change in the usage rate of an existing
process), so such tasks get treated as low usage tasks for the first
part of their life. In most cases this is a good thing as it causes
commands run from the command line to get a boost, and since a lot of
these are very short lived (e.g. ls) they run very quickly and have very
good response times.

In the case of a kernel build there are lots of C compiler tasks being
launched, and these run for several seconds, but for the first few
moments of their lives (during ramp up) they look like low CPU usage
tasks and get high priority. This effect is short lived and doesn't
affect normal interactive programs, because regularity of access isn't
important to them and the estimated usages of the C compiler tasks
quickly exceed the estimated usages of the interactive tasks. It only
has an effect on xmms because regularity of access is important to it
(i.e. xmms IS getting sufficient CPU but isn't always getting it as
regularly as it likes).
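For the curious, the usage estimate behaves roughly like an
exponentially decaying average. The sketch below is a simplification to
show why a freshly forked compiler looks like a low usage task for its
first few moments; it is NOT the actual EBS code, and the names and
arithmetic details are made up:

#include <math.h>

/* Simplified half-life based CPU usage estimator (illustrative only).
 * The old estimate decays by a factor of 0.5 for every half life that
 * elapses, and the task's recent usage rate is blended in.  A new task
 * starts with an estimate of 0, so it takes on the order of one half
 * life before a CPU-bound task's estimate catches up with reality
 * (the "ramp up" described above).
 */
struct usage_est {
        double estimate;        /* estimated usage rate (CPU seconds per second) */
        double last_update;     /* time of the previous update, in seconds */
};

static double half_life = 5.0;  /* default half life, in seconds */

static void update_usage(struct usage_est *u, double now, double cpu_used)
{
        double interval = now - u->last_update;
        double decay = pow(0.5, interval / half_life);
        double recent = interval > 0.0 ? cpu_used / interval : 0.0;

        u->estimate = u->estimate * decay + recent * (1.0 - decay);
        u->last_update = now;
}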
The X server is a different case. Although it isn't an interactive
program, it does have an influence on interactive responsiveness when X
windows is being used. Unfortunately, because it is generally serving a
number of clients, its CPU usage can be quite high and it takes a little
longer for the estimated usage rates of the C compiler tasks to exceed
its estimated usage. For this reason, we recommend that sysadmins
arrange for the X server to be run at somewhere between nice -9 and
nice -15.
It should be obvious from the above that another way interactive
response can be improved is by shortening the half life, but there is a
trade-off: overall system throughput suffers if the half life is too
short. In the current version of EBS, root can change the half life on
a running system; this functionality exists so that experiments into the
effect of the half life on system performance can be conducted without
needing to recompile the kernel and reboot. The default value of 5
seconds is one that we've found gives good overall performance and
interactive response, provided that the X server is run with an
appropriate level of niceness.
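To put rough numbers on that trade-off (using the same simplified model
as the sketch above, not the real code): after a sudden change in
behaviour, the estimate has covered a fraction 1 - 0.5^(t / half_life)
of the task's new usage rate after time t.

#include <math.h>

/* Fraction of a sudden change in usage rate that the simplified
 * estimator has picked up after t seconds, given half life hl.
 * With the default 5 s half life, a compiler that just went CPU bound
 * registers only ~13% of its true rate after 1 s and ~50% after 5 s;
 * with a 1 s half life it hits 50% within a second, at the price of
 * much twitchier estimates for bursty interactive tasks and hence
 * lower overall throughput.
 */
static double fraction_ramped(double t, double hl)
{
        return 1.0 - pow(0.5, t / hl);
}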
Once again I'll stress that, in order to cause xmms to skip, we had to
(on a single CPU machine) run a kernel build with -j 16, which causes a
system load well in excess of 10 and is NOT a normal load. Under
normal loads xmms performs OK.
>>> I imagine that these two together would nicely solve most
>>> interactivity and fairness issues -- the former prevents starvation
>>> by other users and the latter prevents latency caused by large
>>> numbers of CPU-light tasks.
>>> Is this sane?
>> Yes. Fairness between users rather than between tasks is a sane
>> desire but beyond the current scope of EBS.
> I have this strange masochistic desire to implement this. Don't expect
> patches any time soon -- it would be my first time playing with the
> scheduler ;)
Good luck
Peter
--
Dr Peter Williams, Chief Scientist peterw@xxxxxxxxxx
Aurema Pty Limited Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com