Response testing on 2.6.0-test11 variants

From: Bill Davidsen
Date: Wed Dec 17 2003 - 15:51:45 EST


These are some results from variants of 2.6.0-test11 with my
responsiveness test. The responsiveness test checks how well the
system runs a small process which uses minimal resources (think
"more" or "df") after the system has been doing other things for a
while, 20 sec by default. This gives a hint of how well the system
will respond if you let a shell window sit and then type a command,
or look at a web page for a bit and then page down.

Details are in the attached README; the values of interest are the
ratio of the response time under a given load to the response time
with no load. All data were taken on a freshly booted system in a
single xterm window, with results scrolling to the screen as well as
to a file.

The two ratios are the ratio of the raw averages and the ratio of the
averages of all points within one S.D. of the median. The second
reduces the effect of a few bad results and seems more typical. Feel
free to use either, or roll your own.

I included the values for a 2.4.20 kernel as well, for reference. I
did rerun the tests; the 2.6 kernels really do behave like that.

The test machine is a P-II 350 with 96MB RAM and two dog-slow drives;
the temp space is on one so old it won't do DMA.
--
bill davidsen







The resp1 response benchmark
Bill Davidsen
davidsen@tmr.com


Introduction


The resp1 benchmark is intended to measure system response to
trivial interactions under reproducible loads. The intent is to see
how a system responds to small requests such as ls or uncovering a
window in a window manager environment. This will hopefully give
some insight into how the system "feels" under load. See "How it
works" for details.


Using the benchmark


I use the benchmark to compare Linux kernels and tuning parameters.
I boot a system in single user mode with memory limited to 256MB,
and run the benchmark. I capture the output by running it under the
script command, and I usually run with the -v (verbose) option. The
output is formatted such that you can get the base results with
    grep '^ ' script.output
since all optional output is non-blank in column one.


The output data are the low and high response times (in ms), the
median and the average, the standard deviation, and the ratio of the
average to the average of the noload reference data.

After running the benchmark you might get a report like
this:

Starting 1 CPU run with 124 MB RAM, minimum 5 data points at 20 sec intervals

_____________ delay ms. ____________
Test low high median average S.D. ratio
noload 178.743 233.056 184.435 192.526 0.023 1.000
smallwrite 189.085 322.913 232.981 237.789 0.044 1.235
largewrite 187.308 1997.804 243.612 576.542 0.688 2.995
cpuload 178.544 259.127 179.017 196.635 0.035 1.021
spawnload 178.450 258.380 178.761 194.615 0.036 1.011
8ctx-mem 178.899 1571.854 182.922 463.700 0.620 2.409
2ctx-mem 178.917 5611.114 185.104 1540.410 2.351 8.001

Alternative format


Newer versions of resp1 produce slightly different output, dropping
the S.D. column (still shown with the -v option) and adding the
ratio of the 1sdavg of the test run to that of the run with no load.
The 1sdavg is defined as the average of all points within one S.D.
of the median, and tends to ignore very large or small values which
occur rarely. Some people feel this is more useful.


Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals

_____________ delay ms. ____________ ___ Ratio ___
Test low high median average raw S.D.
noload 242.375 252.947 245.237 246.150 1.000 1.000
smallwrite 331.007 1840.243 344.869 746.631 3.033 1.933
largewrite 313.113 8067.703 922.303 2504.440 10.174 4.549
cpuload 362.457 467.524 369.683 387.730 1.575 1.502
spawnload 312.127 420.631 322.017 337.934 1.373 1.296
8ctx-mem 4918.723 16471.092 12337.373 11201.385 45.506 57.219
2ctx-mem 11951.979 19299.983 15779.539 15656.751 63.607 64.023
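
The 1sdavg defined above can be computed with a few lines of C. The
function below is only a sketch of that calculation, not the actual
resp1 source; the names sdavg and cmp_dbl are illustrative.

    #include <math.h>
    #include <stdlib.h>

    static int cmp_dbl(const void *a, const void *b)
    {
            double x = *(const double *)a, y = *(const double *)b;
            return (x > y) - (x < y);
    }

    /* average of the samples lying within one S.D. of the median */
    static double sdavg(double *s, int n)
    {
            double sum = 0.0, sumsq = 0.0, mean, sd, median, fsum = 0.0;
            int i, fn = 0;

            for (i = 0; i < n; i++) {
                    sum += s[i];
                    sumsq += s[i] * s[i];
            }
            mean = sum / n;
            sd = sqrt(sumsq / n - mean * mean);

            qsort(s, n, sizeof(double), cmp_dbl);
            median = (n & 1) ? s[n / 2]
                             : (s[n / 2 - 1] + s[n / 2]) / 2.0;

            for (i = 0; i < n; i++)
                    if (fabs(s[i] - median) <= sd) {
                            fsum += s[i];
                            fn++;
                    }
            return fn ? fsum / fn : mean;
    }

The S.D. ratio column is then sdavg(load) / sdavg(noload), while the
raw column is simply the ratio of the plain averages.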



What does it mean?


Everyone can point to some value and say it is the "one real
value" which shows how well the configuration works. In
truth you can look at the ratio to see what the overall
effect is, or the high if you want to avoid "worst case"
issues, or the ratio of the median to the noload median, or
whatever else you think reflects how the configuration
really feels. Bear in mind this benchmark is trying to
identify just that, not best throughput or whatever else.


There are a few things you can always identify. First, if a
configuration has a large spread between the average and the median
it will feel uneven. Second, if the ratio is a very large value (and
I've seen values in the hundreds), then with that type of load your
computer will go forth and conjugate the verb "suck."

About the tests


These are the loads which are run to test response. Each
generates demands on one or two resources.


o noload
This just runs the response test a few times to get an
idea of how fast the system can respond when it has
nothing better to do. Hopefully the process will stay
in memory and only the CPU time for the system calls
will be evident. In the real world that's not always
the case, of course.


o smallwrite
This allocates a buffer of 1MB and writes a file in buffer-size
writes. The file size is five times the system memory, which puts
some pressure on the buffering logic as well as the storage
performance. With the verbose option the overall size of the data
written is reported, in case this is useful. A rough sketch of this
write loop appears after this list.


o largewrite
This is just like smallwrite, except it uses a buffer
which is (memsize-20)/2 MB in size. Does the o/s handle
large writes better than small? Worse? Does the o/s
actually swap pages of the buffer from which it's
writing in order to create disk cache? Some kernels are
hundreds of times worse than noload with this test.


o cpuload
This just generates a bunch of CPU-bound processes,
Ncpu+1 of them, and they beat the CPU. By having more
processes than CPUs and using floating point load, I
can damage CPU affinity and thrash the cache.


o spawnload
This repeatedly forks a process which fork/exec's a shell, which
runs a command. This generates tons of process creations and
cleanups. With the verbose flag the number of loops per second is
reported, each loop representing two or three forks and two execs.


o 8ctx-mem
This creates Ncpu+2 processes which each allocate memory such that
the total memory is about 120% of physical. Then they pass a token
in a circle using a SysV message queue, causing context switching.
There are eight trips around the loop for each access to a new 2k
page, and Ncpu tokens running around the circle. With the -v option
the child processes report the circles-per-second rate, which might
mean something about the IPC and context switching. A rough sketch
of this token-passing loop appears after this list.


o 2ctx-mem
This is just like the 8ctx-mem test, but runs through
memory with a 1k stride, giving a new page accessed
every other trip of the token around the process
circle. Very exciting on a big SMP machine if you like
big numbers coming out of vmstat.
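
As a rough illustration of the write loads above (a sketch under
assumptions, not the resp1 source; the temp file name and the helper
name write_load are invented), the core of smallwrite and largewrite
could look like this:

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* bufsize: 1MB for smallwrite, (memsize-20)/2 MB for largewrite */
    static void write_load(size_t bufsize, size_t membytes)
    {
            size_t total = 5 * membytes;  /* file is 5 x physical memory */
            size_t done;
            char *buf = malloc(bufsize);
            int fd = open("resp1-load.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0600);

            if (!buf || fd < 0)
                    exit(1);
            memset(buf, 'x', bufsize);    /* touch the whole buffer once */
            for (done = 0; done < total; done += bufsize)
                    if (write(fd, buf, bufsize) != (ssize_t)bufsize)
                            break;        /* disk full, stop quietly */
            close(fd);
            unlink("resp1-load.tmp");
            free(buf);
    }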



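The token-passing core of the two ctx-mem tests might look roughly
like the sketch below. This is an assumption, not the actual
implementation: each process in the ring waits for the token on its
own SysV message queue, touches its share of the over-committed
memory with a fixed stride (a 256 byte stride gives eight trips per
2k page, a 1k stride a new page every other trip), and forwards the
token to the next queue.

    #include <stdlib.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct token {
            long mtype;               /* must be > 0 for SysV queues */
            long pos;                 /* how far around memory we are */
    };

    /* one ring member: receive the token, touch memory, pass it on */
    static void ring_member(int my_q, int next_q,
                            char *mem, size_t memsize, size_t stride)
    {
            struct token t;

            for (;;) {
                    if (msgrcv(my_q, &t, sizeof(t.pos), 1, 0) < 0)
                            break;    /* queue gone, test is over */
                    mem[(t.pos * stride) % memsize] = 1;  /* may swap */
                    t.pos++;
                    t.mtype = 1;
                    if (msgsnd(next_q, &t, sizeof(t.pos), 0) < 0)
                            break;
            }
    }

Because the total allocation is about 120% of physical memory, the
memory touches force paging as well as context switching.

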
How it works


The main program starts a reference load as a child process,
then waits until the load "warms up," then enters a loop of
sleeping and doing a small interaction. This consists of
scanning an array which is allocated at startup and may be
swapped, and allocating and scanning an array which requires
free pages from the system. The scans are byte-by-byte, so
they use a small amount of CPU time, similar to small system
commands. The difference between the noload and load
response time is reported in ms.
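
A minimal sketch of that interaction (assumptions only; the name
probe_ms is invented and the real code surely differs in detail)
might be:

    #include <stdlib.h>
    #include <sys/time.h>

    /* touch every byte so each page really gets faulted (back) in;
       the real test scans the arrays byte-by-byte */
    static void touch_bytes(char *p, size_t len)
    {
            size_t i;
            for (i = 0; i < len; i++)
                    p[i] = (char)i;
    }

    /* time one small interaction, in milliseconds */
    static double probe_ms(char *resident, size_t rlen, size_t fresh_len)
    {
            struct timeval t0, t1;
            char *fresh;

            gettimeofday(&t0, NULL);
            touch_bytes(resident, rlen);  /* may have been swapped out */
            fresh = malloc(fresh_len);    /* needs free pages right now */
            if (fresh) {
                    touch_bytes(fresh, fresh_len);
                    free(fresh);
            }
            gettimeofday(&t1, NULL);
            return (t1.tv_sec - t0.tv_sec) * 1000.0 +
                   (t1.tv_usec - t0.tv_usec) / 1000.0;
    }

Such timings are collected for the noload case and for each load,
and the reported ratios compare the two sets.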

2.4.20.out
Starting 1 CPU run with 93 MB RAM, minimum 5 data points at 20 sec intervals

_____________ delay ms. ____________ ___ Ratio ___
Test low high median average raw S.D.
noload 229.901 282.459 233.232 243.299 1.000 1.000
smallwrite 404.531 5640.602 458.907 1935.987 7.957 1.871
largewrite 363.251 14025.380 397.479 4361.725 17.927 1.652
cpuload 559.621 698.061 597.186 615.075 2.528 2.545
spawnload 719.436 880.521 782.215 783.246 3.219 3.250
8ctx-mem 583.971 8523.321 726.046 3272.639 13.451 2.827
2ctx-mem 653.283 7927.200 1012.727 2675.856 10.998 5.837

2.6.0-test11.out
Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals

_____________ delay ms. ____________ ___ Ratio ___
Test low high median average raw S.D.
noload 247.634 287.550 249.879 257.679 1.000 1.000
smallwrite 1316.429 5721.928 2389.766 3220.307 12.497 8.452
largewrite 2113.308 8078.336 4381.649 4257.728 16.523 13.199
cpuload 274.599 482.308 279.741 332.361 1.290 1.178
spawnload 272.836 402.664 279.448 312.193 1.212 1.104
8ctx-mem 5308.450 18760.567 11352.899 11576.995 44.928 45.659
2ctx-mem 8290.388 17804.397 16221.722 14460.405 56.118 63.957

2.6.0-test11-bk12.out
Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals

_____________ delay ms. ____________ ___ Ratio ___
Test low high median average raw S.D.
noload 246.138 295.824 250.364 261.131 1.000 1.000
smallwrite 283.588 7725.815 2287.066 3286.297 12.585 5.110
largewrite 1026.081 5734.609 3395.009 3436.063 13.158 13.721
cpuload 271.338 589.871 286.339 342.194 1.310 1.110
spawnload 271.747 388.543 287.548 304.779 1.167 1.124
8ctx-mem 4082.105 15076.516 11357.832 10469.028 40.091 51.557
2ctx-mem 10019.499 15834.343 12848.258 12858.138 49.240 50.796

2.6.0-test11-wli-2.out
Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals

_____________ delay ms. ____________ ___ Ratio ___
Test low high median average raw S.D.
noload 241.308 507.529 251.228 314.148 1.000 1.000
smallwrite 819.019 4922.349 1273.123 2052.233 6.533 5.389
largewrite 1781.760 9437.622 3838.493 4939.853 15.725 13.092
cpuload 269.542 562.006 277.177 352.739 1.123 1.213
spawnload 265.993 462.221 274.887 309.579 0.985 1.096
8ctx-mem 7499.271 24126.753 16028.938 15420.571 49.087 68.186
2ctx-mem 12701.278 39539.505 26427.740 25699.607 81.807 102.920

2.6.0-test11-mm1.out
Starting 1 CPU run with 91 MB RAM, minimum 5 data points at 20 sec intervals

_____________ delay ms. ____________ ___ Ratio ___
Test low high median average raw S.D.
noload 248.326 407.161 253.513 287.390 1.000 1.000
smallwrite 325.208 10172.099 3572.895 4920.817 17.122 14.015
largewrite 241.369 4797.435 2816.604 2229.296 7.757 10.942
cpuload 265.870 423.674 274.597 308.643 1.074 1.087
spawnload 265.519 471.692 278.004 315.864 1.099 1.076
8ctx-mem 4934.075 7982.318 6947.914 6705.179 23.331 27.765
2ctx-mem 4532.178 18488.828 6557.832 8481.727 29.513 23.228