rsdl v46 report,numbers,comments

From: Mike Mattie
Date: Tue Apr 24 2007 - 14:26:39 EST



Hello,

0. intro

I am very happy to report that v46 of RSDL is subjectively much better than
v42. As you (Con Kolivas) might remember from a previous mail, I was
experimenting with using nice levels effectively. I have refined these
levels to this layout:

-2 : clock (ntpd)
-1 : syslog, sshd, X
 0 : command; default for shells
 1 : audacious (audio), xfce window manager (with compositor on)
 2 : emacs (SCHED_OTHER), desktop/window manager infrastructure (dbus), ssh-agent, bind (batch scheduled)
 3 : desktop applications (mail, xchat, openoffice)
 5 : spamd, batch scheduled compiles/test-suites
10 : cron jobs
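
A layout like this can be kept as a small lookup table. The sketch below is
only an illustration: the process names (xfwm4, dbus-daemon, named, soffice,
crond, syslogd) are my guesses at what the entries above would be called on
a typical system, and nice_for/renice are hypothetical helpers, not part of
any existing tool.

```python
import os

# Nice levels from the layout above, keyed by command name.
# The exact process names are assumptions made for illustration.
NICE_LEVELS = {
    "ntpd": -2,
    "syslogd": -1, "sshd": -1, "X": -1,
    "audacious": 1, "xfwm4": 1,
    "emacs": 2, "dbus-daemon": 2, "ssh-agent": 2, "named": 2,
    "xchat": 3, "soffice": 3,
    "spamd": 5,
    "crond": 10,
}
DEFAULT_NICE = 0  # shells and anything not listed


def nice_for(command: str) -> int:
    """Return the nice level for a command name or path, defaulting to 0."""
    return NICE_LEVELS.get(os.path.basename(command), DEFAULT_NICE)


def renice(pid: int, command: str) -> int:
    """Apply the table to a running process and return the level used.

    Negative levels normally require root (or CAP_SYS_NICE).
    """
    level = nice_for(command)
    os.setpriority(os.PRIO_PROCESS, pid, level)
    return level
```

For example, nice_for("/usr/bin/emacs") gives 2 and an unlisted shell falls
back to 0.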

1. Some numbers

My machine is a particularly tough case, I think: a uni-processor Athlon XP
3000+ (involuntary pre-empt) with a software RAID5 on PATA drives. I load it
heavily with compiles/test-suites, and I am very sensitive to audio glitches.

here are some stats for idle:

---load-avg--- ------memory-usage----- ----total-cpu-usage---- ----interrupts--- ---system--
_1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
 0.2  0.2  0.2| 170M   15M  309M 6560k|  2   1  94   4   0   0|    1     7   150|  238   208
 0.2  0.2  0.2| 170M   15M  309M 6568k|  1   0  99   0   0   0|    0     0     0|   76    55
 0.2  0.2  0.2| 170M   15M  309M 6568k|  0   1  99   0   0   0|    0     0     0|   75    47
 0.2  0.2  0.2| 170M   15M  309M 6624k|  4   0  96   0   0   0|    0     0     0|   75    37
 0.2  0.2  0.2| 170M   15M  309M 6624k|  1   0  99   0   0   0|    0     0     0|   75    36

here are some stats for music playing:

---load-avg--- ------memory-usage----- ----total-cpu-usage---- ----interrupts--- ---system--
_1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
 0.9  0.4  0.2| 175M   15M  305M 5652k|  2   1  94   4   0   0|    1     7   150|  238   210
 0.9  0.4  0.2| 175M   15M  305M 5652k| 10   1  89   0   0   0|    0     3   989| 1068  1510
 0.9  0.4  0.2| 175M   15M  305M 5592k| 13   0  87   0   0   0|    0     3  1013| 1093  1565
 0.9  0.4  0.2| 175M   15M  304M 6300k| 11   1  88   0   0   0|    0     3  1000| 1078  1496
 0.9  0.4  0.2| 175M   15M  305M 6300k| 13   0  87   0   0   0|    0     3  1006| 1084  1509
 0.8  0.4  0.2| 175M   15M  305M 6180k| 13   1  86   0   0   0|    0     3  1000| 1078  1524
 0.8  0.4  0.2| 175M   15M  305M 6060k| 12   1  87   0   0   0|    0     3  1000| 1078  1564

The context switches are high, but so are the interrupts (USB 2.0 Audigy NX)

To see how effective these nice levels were, I decided to play with
rr_interval, on the theory that with priorities strictly enforced and used
aggressively, a longer time-slice would not cause audio delay. So far that
theory is holding. All of these numbers are with rr_interval = 20, and I have
fewer audio problems than with any previous kernel/tuning setup.

That is very impressive.

As far as batch loading goes, I tried a kernel compile. These numbers look
nice for RSDL, but there are some caveats:

kernel compile, CFS v3                    : make 756.83s user 89.37s system 58% cpu 24:08.21 total
kernel compile, v46 rr_interval = default : make 754.66s user 89.74s system 59% cpu 23:35.38 total
kernel compile, v46 rr_interval = 20      : make 682.83s user 84.34s system 73% cpu 17:29.57 total

1. The system was noisy. I did this intentionally. My typical load is a
   mixture of desktop/compile. All three numbers were generated while
   listening to music, reading docs/web/news, using emacs etc. With each of
   the compiles I tried running a visualization plugin (ProjectM inside
   audacious) for a minute or so.

   This skews the numbers for comparison, but I was looking for an
   impression that was based on a *real* work-load.

   I would like to add as well that before RSDL the mainline scheduler
   failed completely at running ProjectM even when it was the only
   application on the desktop. (It stalled for seconds with a rock steady
   period.)

2. All of these ran at nice 5 under SCHED_BATCH.

3. I have the xfce compositor turned on, using the transparency.

4. Compiled on software RAID 5 (md) -> dev mapper -> lvm2 -> ext3, 4 drives,
   write-cache disabled, external 512 MB flash drive for an external journal,
   commit=15, journal=data

Given the caveats above, especially the deep stack for the block layer, plus
meeting audio deadlines while sharing an interrupt with the journal drive
(arghh), this is very impressive system behavior for me.
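
To put the three compile timings side by side, here is a small sketch that
redoes the arithmetic on the figures quoted above. The helper names are mine,
and given the noisy conditions described in the caveats the result should be
read as a rough indication only.

```python
def to_seconds(clock: str) -> float:
    """Convert an 'MM:SS.ss' wall-clock string into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


# (user seconds, system seconds, wall clock) copied from the three runs.
RUNS = {
    "CFS v3": (756.83, 89.37, "24:08.21"),
    "v46 rr_interval = default": (754.66, 89.74, "23:35.38"),
    "v46 rr_interval = 20": (682.83, 84.34, "17:29.57"),
}


def cpu_share(user: float, system: float, clock: str) -> float:
    """Fraction of the wall-clock time the compile spent on the CPU."""
    return (user + system) / to_seconds(clock)


def wall_saving(baseline: str, candidate: str) -> float:
    """Relative reduction in wall-clock time of candidate vs. baseline."""
    base = to_seconds(RUNS[baseline][2])
    cand = to_seconds(RUNS[candidate][2])
    return (base - cand) / base


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(f"{name}: cpu share {cpu_share(*run):.2f}")
    saving = wall_saving("v46 rr_interval = default", "v46 rr_interval = 20")
    print(f"wall-clock saving with rr_interval = 20: {saving:.1%}")
```

On these figures rr_interval = 20 finishes in roughly a quarter less
wall-clock time than the default, and the compile gets about 73% of the CPU
instead of just under 60%.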

Here are the stats for doing a kernel compile with audacious running, plus
mail, editor etc.

---load-avg--- ------memory-usage----- ----total-cpu-usage---- ----interrupts--- ---system--
_1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
 1.3    1  0.8| 198M   22M  269M   11M|  3   1  92   4   0   0|    1     7   199|  287   348
 1.3    1  0.8| 204M   22M  269M 6072k| 79  12   0   9   0   0|    0     7  1003| 1087  2160
 1.3    1  0.8| 195M   22M  268M   16M| 82  18   0   0   0   0|    0     8  1003| 1085  2703
 1.3    1  0.8| 200M   22M  268M   10M| 82  16   0   2   0   0|    0     8  1009| 1094  2204
 1.4    1  0.8| 195M   22M  269M   15M| 83  15   0   2   0   0|    0     8  1014| 1099  3007
 1.4    1  0.8| 200M   22M  269M 9488k| 82  14   0   4   0   0|    0     7  1000| 1082  2361
 1.4    1  0.8| 200M   22M  267M   12M| 83  15   0   2   0   0|    0     7  1000| 1085  2579


Now for some comments from the peanut gallery.

2. Window Manager scheduler hinting ?

On reflection my workload may be the easy case. As a developer I run a
somewhat small number of applications, typically the lightest I can find,
except emacs :)

A more typical desktop user might not be able to use my sort of setup, where
I can push a batchy job down in priority and wait for it. I also write shell
functions, aliases etc. to set this up, which is easy for a distro, but not
necessarily usable by the average user. For users who are running multiple
monolithic CPU hog programs, like openoffice, firefox etc., this sort of
approach won't work.

However, the strict enforcement of RSDL could be leveraged for the desktop
user as well. The Mac OS X scheduler has layered on top of the typical nice
priority levels the concept of foreground and background scheduling.
Basically the Mac window manager can tune the scheduling based on window
focus.

I think something like this combined with RSDL could be a worthy experiment.
If the window manager can calculate the "attention" a user gives a window,
then it could nice it up/down within a small range. Mac OS X has a nasty
behavior of being jerky when switching focus under load. I think this is due
to a simplistic knee-jerk response to window focus in scheduling (or my ibook
has too little RAM).

If a Linux window manager were to rank the attention of windows, and be smart
about cycling between groups of apps, I think three priority levels could be
used like this:

1 : foreground ( frequent attention )
2 : background ( infrequent attention )
3 : batchy ( downloaders, other long running infrequently monitored programs )

Think of how easy this is for a window manager to compute, compared to trying
to re-build the information in-kernel with heuristics.
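
As a rough sketch of how little the window manager would need to compute,
the function below maps per-application focus time onto the three levels
listed above. Everything in it is hypothetical: the function name, the
20% threshold and the idea of a user-supplied "batchy" set are assumptions
made for illustration, not an existing window-manager interface.

```python
FOREGROUND, BACKGROUND, BATCHY = 1, 2, 3


def rank_windows(focus_seconds, batchy=(), foreground_share=0.2):
    """Map each application to one of the three nice levels.

    focus_seconds: app name -> seconds of focus in some recent time window.
    batchy: apps marked as long running and rarely watched.
    foreground_share: an app holding at least this fraction of the total
        focus time counts as foreground (an arbitrary threshold).
    """
    total = sum(focus_seconds.values()) or 1.0
    levels = {}
    for app, seconds in focus_seconds.items():
        if app in batchy:
            levels[app] = BATCHY
        elif seconds / total >= foreground_share:
            levels[app] = FOREGROUND
        else:
            levels[app] = BACKGROUND
    return levels
```

With focus times of 600s for emacs, 350s for firefox, 50s for xchat and a
downloader marked as batchy, emacs and firefox land on level 1, xchat on
level 2 and the downloader on level 3.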

If this idea is actually pursued there may need to be a new feature in RSDL.
With this scheme it is very important to ensure that a particular nice level
does not become overloaded (think foreground). The current Linux schedulers
report a load value for the total system. This scheme needs to know the load
value for an individual nice level as well; that way the foreground nice
level could remain responsive by, worst case, kicking a program down a level
or two if it starts becoming unresponsive.
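
To make the per-level load idea concrete, here is a toy user-space sketch. It
is not how RSDL or any kernel reports load; the task tuples, the function
names and the "keep the first N, demote the rest" policy are all assumptions
chosen only to show the shape of the bookkeeping.

```python
from collections import Counter


def load_per_nice(tasks):
    """Count runnable tasks at each nice level.

    tasks: iterable of (name, nice, runnable) tuples.
    """
    return Counter(nice for _name, nice, runnable in tasks if runnable)


def demote_overloaded(tasks, level, max_runnable, step=1):
    """Push excess runnable tasks at `level` down by `step` nice levels.

    The first `max_runnable` runnable tasks keep their level and the rest
    are demoted. Returns a new task list; the input is not modified.
    """
    kept = 0
    result = []
    for name, nice, runnable in tasks:
        if nice == level and runnable:
            if kept < max_runnable:
                kept += 1
            else:
                nice += step
        result.append((name, nice, runnable))
    return result
```

A real policy would pick the task to demote by how much CPU it is hogging
rather than by list order; the ordering here is only a placeholder.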

3. Better throughput

I think that this mixed developer work-load is actually the worst case for a
scheduler. It has to meet deadlines and provide decent throughput. Beyond
pre-empt and clock precise scheduling I am not sure if there is much more
that can be done for interactive.

I do think that SCHED_BATCH provides a lot of room for interesting ideas
though, since the guarantees are so loose. As I understand it, SCHED_BATCH is
guaranteed to not starve and that is about it.

Since I am commenting freely, here is an idea to be taken with a huge grain
of salt. Is it possible that the scheduler could compute and combine the
deadlines for both audio/video? If the scheduler can compute the longest
interval between both video/audio refresh, then scheduling could be arranged
like so:

refresh -> interactive -> batch -> refresh

The interactive processes would run first; that way the risk of missing a
refresh would be minimized. Once the scheduler has run all the interactive
stuff, for the case of a small set of programs such as audio player and
editor, it would be very likely that a lot of time is left.

Next assume that the SCHED_BATCH tasks have been sorted into CPU intensive
and IO intensive. For the CPU intensive ones it would be nice if the
scheduler would give them a massive time-slice; why not all the time until
the next refresh point? Basically reduce the context-switching to mostly
interrupts/background noise. The SCHED_BATCH programs may take longer to run,
as they are being interleaved more than balanced, but I think it's possible
that overall throughput could be increased considerably. If something like
this could be done while still honoring the nice values (though not as
strictly as for interactive programs) it would be a big win. With huge
time-slices other parts of the system, such as VM management, might behave
more efficiently as well.
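
The refresh -> interactive -> batch ordering can be sketched as a toy
planner for a single refresh period. This is a user-space illustration under
my own assumptions (known per-period CPU demand for the interactive tasks,
one batch task per period, simple rotation); plan_period and its arguments
are made-up names, not scheduler code.

```python
def plan_period(refresh_ms, interactive_ms, batch_tasks):
    """Lay out one refresh period: interactive tasks, then one batch slice.

    refresh_ms: time between two refresh deadlines.
    interactive_ms: task name -> CPU time it needs in this period.
    batch_tasks: list of CPU-bound batch task names, in rotation order.
    Returns (schedule, rotated_batch_tasks), where schedule is a list of
    (task, start_ms, length_ms) tuples.
    """
    schedule = []
    now = 0.0
    for task, need in interactive_ms.items():
        length = min(need, refresh_ms - now)
        if length <= 0:
            break  # the period is already full
        schedule.append((task, now, length))
        now += length
    remaining = refresh_ms - now
    if batch_tasks and remaining > 0:
        # One batch task gets everything left until the next refresh.
        schedule.append((batch_tasks[0], now, remaining))
        batch_tasks = batch_tasks[1:] + batch_tasks[:1]
    return schedule, batch_tasks
```

With a 20 ms period and interactive tasks needing 2 ms and 3 ms, the first
batch task receives a single uninterrupted 15 ms slice and the batch list is
rotated for the next period.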

I think Linux would be quite special if it was the best in throughput
efficiency (ignoring completion time, just how much processor etc. is used to
run the same work-load) for SETI-like work-loads while still running a fully
responsive interactive desktop.

btw, the above concept is articulated from a distant background of
programming a VGA adapter on a 286. That was the last time I dealt with hard
deadlines hands-on. I haven't had a reason to code at bare metal since I
started using Linux, so please consider it a vehicle for articulating a
concept.

4. Outro

In summary, I like the RSDL scheduler quite a bit. It is consistent and
doesn't do magic, so I can build a priority scheme on top of it with a very
compact and reliable behavior model. Using the priority levels seems to allow
me to use larger time-slices without sacrificing interactivity. This is
unsurprising, as I am actually telling the scheduler what I want ......

I think that the window manager can use simple algorithms to calculate what
the kernel would have to guess at with hairy heuristics. Hacking nice
throttling into the window manager combined with a very simple but reliable
scheduler may work pretty well for desktop users. Maybe that will excite
someone enough to go try it, or dig up some existing implementation (other
than OS X).

I also think that SCHED_BATCH is where a lot of fun experiments can be run,
especially in regard to CPU intensive programs. This combination is actually
quite common, I would think, in audio/video production.

At this point, with how well my system works, the itch has been scratched as
far as the in-kernel part goes. I am interested, though, in playing around
with your idlerun program.

Later on, possibly much later, I will cook up some better
numbers/comparisons. I really don't trust subjective evaluations of
scheduling, my own included. I think people really want a new kernel patch to
work better, which is a horrible way to start an evaluation. I want to
measure both throughput and interactivity in a double-blind-like way.
(random option for grub?)

With most of my work-load IO bound, I expect the performance improvements to
come from places like CFQ, ext4, syslet etc.

Thank you to all for a good kernel. Linux user-space is quite comfortable
these days.

Cheers,
Mike Mattie - codermattie@xxxxxxxxx
