Re: 2.6.8-rc2-mm2 performance improvements (scheduler?)

From: Con Kolivas
Date: Tue Aug 10 2004 - 16:00:26 EST


Andrew Theurer wrote:
Also, one big change apparent to me, the elimination of
TIMESLICE_GRANULARITY.

Ah well I tuned the timeslice granularity and I can tell you it isn't quite
what most people think. The granularity when you get to greater than 4 cpus
is effectively _disabled_. So in fact, the timeslices are shorter in
staircase (in normal interactive=1, compute=0 mode which is how martin
would have tested it), not longer. But this is not the reason either since
in "compute" mode they are ten times longer and this also improves
throughput further.


Interesting, I forgot about the "* nr_cpus" that was in the granularity calculation. That does make me wonder, maybe the timeslices you are calculating could have something similar, but more appropriate.

Since the number of runnable tasks on a cpu should play a part in latency (the more tasks, potentially the longer the latency), I wonder if the timeslice would benefit from a modifier like " / task_cpu(p)->nr_running ". With this the base timeslice could be quite a bit larger to start for better cache warmth, and as we add more tasks to that cpu, the timeslices get smaller, so an acceptable latency is preserved.

I had a problem with fairness once I made the timeslices too long since that also determines priority demotion in the staircase design. That's why I have the "compute" mode as quite a separate entity because the longer timeslices on their own weren't of any special benefit (in my up to 8x testing but could be elsewhere) unless I added the delayed preemption which is probably where the main extra cache warmth comes from in "compute" design. Of course this comes at a cost which is higher latencies... because normal priority preemption is delayed.

Do you have cswitch data? I would not be surprised if it's a lot higher
on -no-staircase, and cache is thrashed a lot more. This may be
something you can pull out of the -no-staircase kernel quite easily.

Well from what I got on 8x the optimal load (-j x4cpus) and maximal load
(-j) on kernbench gives surprisingly similar context switch rates. It's
only when I enable compute mode that the context switches drop compared to
default staircase mode and mainline. You'd have to ask Martin and Rick
about what they got.


OK, thanks!

-Andrew Theurer

Cheers,
Con

Attachment: signature.asc
Description: OpenPGP digital signature