Re: Performance differences in recent kernels

From: rwhron@earthlink.net
Date: Wed Sep 11 2002 - 22:11:56 EST


>> AIM7 database workload
>> kernel            Tasks  Jobs/Min    Real     CPU
>> 2.5.33-mm5          256     763.0  1992.9  1020.8

> I assume that's seconds of CPU for the entire run?

Yes.

>> IRMAN - interactive response measurement.
>>
>>            FILE_IO Response time measurements (milliseconds)
>>                        Max     Min     Avg   StdDev
>> 2.4.20-pre4-ac1     40.603   0.008   0.009    0.043
>> 2.4.20-pre5         52.405   0.009   0.011    0.080
>> 2.5.33-mm5           2.955   0.008   0.010    0.004

> For many things, 1/latency == throughput. But the averages are
> the same here. Be interesting to run it on the 384 megabyte machine.

This is on the 384 MB machine. It doesn't show the really low
max response time for 2.5.33-mm5 that the table above does.

                  FILE_IO Response time measurements (milliseconds)
                        Max     Min     Avg   StdDev
2.4.19-rmap14       190.061   0.016   0.137    4.017
2.4.20-pre5         180.719   0.016   0.113    3.708
2.4.20-pre5-ac1     552.150   0.011   0.035    3.160
2.4.20-pre5aa1      450.191   0.012   0.030    2.772
2.5.33-mm1          456.736   0.012   0.033    2.979
2.5.33-mm5          456.733   0.016   0.047    3.633
2.5.33              456.061   0.012   0.034    3.077

> It would be interesting to run irman in conjunction with tiobench
> or dbench. On the same disk and on a different disk.

Hm, I'll think about ways to do that while keeping the test
repeatable; maybe something like the sketch below.
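
For what it's worth, here's a rough Python sketch of the kind of harness
I have in mind: start a background disk load in a given directory, let it
settle, run irman, then kill the load. The irman invocation, the 30 second
warm-up, and the load directory are placeholders, not the real test setup;
only the "dbench 64" load comes from this thread.

#!/usr/bin/env python3
# Rough sketch only: "dbench 64" is the background load mentioned in this
# thread; the irman invocation, load directory, and warm-up time are
# placeholders.
import subprocess
import time

LOAD_DIR = "/mnt/testdisk"        # point at the same disk or a different one per run
LOAD_CMD = ["dbench", "64"]       # background disk load
IRMAN_CMD = ["irman"]             # assumed irman invocation; substitute the real one

def run_once(load_dir):
    # Start the load, let it reach steady state, run irman, then stop the load.
    load = subprocess.Popen(LOAD_CMD, cwd=load_dir,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        time.sleep(30)
        subprocess.run(IRMAN_CMD, check=True)
    finally:
        load.terminate()
        load.wait()

if __name__ == "__main__":
    run_once(LOAD_DIR)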

>> Time to build the kernel 12 times. Not a lot of difference here.
>>
>> kernel        seconds
>> 2.5.33          728.2
>> 2.5.33-mm5      736.8

> Should have been better than that. Although the kmap work doesn't
> seem to affect these machines. Is that a `make -j1', `-j6'???

That's make -j1 executed simultaneously on 4 different kernel
trees.
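
Roughly what the timing does, if it helps to see it spelled out; the tree
paths and the bzImage target here are placeholders, not the actual setup:

#!/usr/bin/env python3
# Sketch of timing 4 simultaneous "make -j1" builds; tree paths and the
# bzImage target are placeholders.
import subprocess
import time

TREES = ["/usr/src/tree1", "/usr/src/tree2",
         "/usr/src/tree3", "/usr/src/tree4"]

start = time.time()
builds = [subprocess.Popen(["make", "-j1", "bzImage"], cwd=tree,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
          for tree in TREES]
for build in builds:
    build.wait()          # builds run concurrently; wait for the slowest one
print("wall clock for 4 concurrent -j1 builds: %.1f seconds" % (time.time() - start))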

> It is useful to watch the amount of CPU which is consumed with dbench,
> btw. That's one thing which tends to be fairly repeatable between runs.

Like "time dbench 64", or something else?

>> Sequential Writes ext2
>> There is a dramatic reduction in CPU utilization and an increase in
>> throughput on 2.5.33-mm5 compared to 2.5.33 when the thread count is high.
>>
>>                     Num                      Avg     Maximum     Lat%   CPU
>> Kernel              Thr   Rate  (CPU%)  Latency     Latency     >10s   Eff
>> ------------------- --- ------------------------------------------------------
>> 2.4.19-rc5-aa1      128  37.40  45.99%   32.405    46333.30  0.00105    81
>> 2.4.20-pre4-ac1     128  34.01  36.94%   40.121    47331.57  0.00058    92
>> 2.4.20-pre5         128  32.98  49.33%   39.692    52093.19  0.01446    67
>> 2.5.33              128  12.17  222.9%  108.966   910455.61  0.19503     5
>> 2.5.33-mm5          128  30.78  30.03%   32.973   909931.81  0.07858   102

> This test is highly dependent upon the size of the request queues. The
> queues have 128 slots and you're running 128 threads. One would expect
> to see a lot of variability with that combo. Would be interesting to also
> test 64 threads, and 256.

> -aa has the "merge even after latency has expired" logic in the elevator,
> which could make some difference at the 128 thread level.

On 2.5.33-mm1, throughput and CPU utilization are pretty comparable at
64, 128, and 256 threads. CPU utilization on -aa is very different at
64 and 256 threads compared to 128.

Sequential Writes ext2
                    Num                      Avg     Maximum     Lat%   CPU
Kernel              Thr   Rate  (CPU%)  Latency     Latency     >10s   Eff
------------------- --- ------------------------------------------------------
2.4.19-rc1-aa1       64  35.62  92.35%   19.063    23691.67  0.00000    39
2.4.20-pre4-ac1      64  34.17  37.34%   20.282    23123.27  0.00000    92
2.4.20-pre5          64  33.19  48.62%   20.235    28646.79  0.00003    68
2.5.33               64  14.06  191.6%   47.371   275718.71  0.09892     7
2.5.33-mm5           64  30.69  29.55%   19.323   277839.26  0.07670   104

2.4.19-rc1-aa1      256  35.29  91.34%   72.996    83127.21  0.16864    39
2.4.20-pre4-ac1     256  34.88  38.32%   76.128    91906.08  0.48809    91
2.4.20-pre5         256  33.20  49.30%   76.394    95761.85  0.50046    67
2.5.33              256  12.65  252.6%  197.722  1334605.09  0.31655     5
2.5.33-mm5          256  28.37  29.96%   67.484   699119.03  0.10001    95

>> Sequential Reads ext3
>> 2.5.33-mm5 has a more graceful degradation in throughput on ext3.
>> Fairness is better too.
>>
>>                     Num                      Avg     Maximum     Lat%   CPU
>> Kernel              Thr   Rate  (CPU%)  Latency     Latency     >10s   Eff
>> ------------------- --- ------------------------------------------------------
>> 2.4.19-rc5-aa1        1  51.13  29.59%    0.227      460.92  0.00000   173
>> 2.4.20-pre4-ac1       1  34.12  17.37%    0.341     1019.65  0.00000   196
>> 2.4.20-pre5           1  33.28  20.62%    0.350      137.44  0.00000   161
>> 2.5.33-mm5            1  31.70  14.75%    0.367      581.89  0.00000   215

> 20% drop in CPU load is typical, but the reduced bandwidth on such
> a simple test is unexpected. Presumably that's the driver thing.

>> 2.4.19-rc5-aa1       64   7.38   4.51%   98.947    20638.56  0.00000   164
>> 2.4.20-pre4-ac1      64   6.55   3.94%  110.432    14937.49  0.00000   166
>> 2.4.20-pre5          64   6.34   4.16%  111.299    14234.83  0.00000   152
>> 2.5.33-mm5           64  12.29   8.51%   55.372     8799.99  0.00000   144

> hm. Don't know. Something went right in the 2.5 elevator I guess.

It went right at 32 threads too, and 128 and 256 threads are also better
than average on 2.5.33-mm5.

Sequential Reads ext3
                    Num                      Avg     Maximum     Lat%   CPU
Kernel              Thr   Rate  (CPU%)  Latency     Latency     >10s   Eff
------------------- --- ------------------------------------------------------
2.4.19-rc1-aa1       32   7.19   5.13%   51.480    14981.42  0.00000   140
2.4.20-pre4-ac1      32   6.70   3.96%   54.502     6613.54  0.00000   169
2.4.20-pre5          32   7.25   4.72%   49.839     7344.23  0.00000   153
2.5.33-mm5           32  15.26   9.89%   23.999    10942.51  0.00000   154

2.4.19-rc1-aa1      128   6.70   4.86%  220.082    56700.25  0.91508   138
2.4.20-pre4-ac1     128   6.17   3.71%  227.375    26637.67  0.00000   166
2.4.20-pre5         128   6.41   4.21%  212.999    32363.01  0.00009   152
2.5.33-mm5          128   9.63   7.89%  128.288    16677.89  0.00000   122

2.4.19-rc1-aa1      256   6.51   4.70%  450.763   109737.74  2.23919   139
2.4.20-pre4-ac1     256   5.69   3.41%  475.816    54311.24  0.00703   167
2.4.20-pre5         256   5.74   3.90%  470.792    55091.26  0.05414   147
2.5.33-mm5          256   7.39   6.92%  305.286    46425.82  0.08529   107

-- 
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html
