Re: pluggable scheduler thread (was Re: Volanomark slows by 80% under CFS)

From: Chris Snook
Date: Mon Jul 30 2007 - 17:08:19 EST


Tim Chen wrote:
On Sat, 2007-07-28 at 02:51 -0400, Chris Snook wrote:

Tim --

Since you're already set up to do this benchmarking, would you mind varying the parameters a bit and collecting vmstat data? If you want to run oprofile too, that wouldn't hurt.


Here's the vmstat data. The number of runnable processes is lower and there are more context switches with CFS. The vmstat output for 2.6.22 looks like:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
391 0 0 1722564 14416 95472 0 0 169 25 76 6520 3 3 89 5 0
400 0 0 1722372 14416 95496 0 0 0 0 264 641685 47 53 0 0 0
368 0 0 1721504 14424 95496 0 0 0 7 261 648493 46 51 3 0 0
438 0 0 1721504 14432 95496 0 0 0 2 264 690834 46 54 0 0 0
400 0 0 1721380 14432 95496 0 0 0 0 260 657157 46 53 1 0 0
393 0 0 1719892 14440 95496 0 0 0 6 265 671599 45 53 2 0 0
423 0 0 1719892 14440 95496 0 0 0 15 264 701626 44 56 0 0 0
375 0 0 1720240 14472 95504 0 0 0 72 265 671795 43 53 3 0 0
393 0 0 1720140 14480 95504 0 0 0 7 265 733561 45 55 0 0 0
355 0 0 1716052 14480 95504 0 0 0 0 260 670676 43 54 3 0 0
419 0 0 1718900 14480 95504 0 0 0 4 265 680690 43 55 2 0 0
396 0 0 1719148 14488 95504 0 0 0 3 261 712307 43 56 0 0 0
395 0 0 1719148 14488 95504 0 0 0 2 264 692781 44 54 1 0 0
387 0 0 1719148 14492 95504 0 0 0 41 268 709579 43 57 0 0 0
420 0 0 1719148 14500 95504 0 0 0 3 265 690862 44 54 2 0 0
429 0 0 1719396 14500 95504 0 0 0 0 260 704872 46 54 0 0 0
460 0 0 1719396 14500 95504 0 0 0 0 264 716272 46 54 0 0 0
419 0 0 1719396 14508 95504 0 0 0 3 261 685864 43 55 2 0 0
455 0 0 1719396 14508 95504 0 0 0 0 264 703718 44 56 0 0 0
395 0 0 1719372 14540 95512 0 0 0 64 265 692785 45 54 1 0 0
424 0 0 1719396 14548 95512 0 0 0 10 265 732866 45 55 0 0 0

While 2.6.23-rc1 looks like:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
23 0 0 1705992 17020 95720 0 0 0 0 261 1010016 53 42 5 0 0
7 0 0 1706116 17020 95720 0 0 0 13 267 1060997 52 41 7 0 0
5 0 0 1706116 17020 95720 0 0 0 28 266 1313361 56 41 3 0 0
19 0 0 1706116 17028 95720 0 0 0 8 265 1273669 55 41 4 0 0
18 0 0 1706116 17032 95720 0 0 0 2 262 1403588 55 41 4 0 0
23 0 0 1706116 17032 95720 0 0 0 0 264 1272561 56 40 4 0 0
14 0 0 1706116 17032 95720 0 0 0 0 262 1046795 55 40 5 0 0
16 0 0 1706116 17032 95720 0 0 0 0 260 1361102 58 39 4 0 0
4 0 0 1706224 17120 95724 0 0 0 126 273 1488711 56 41 3 0 0
24 0 0 1706224 17128 95724 0 0 0 6 261 1408432 55 41 4 0 0
3 0 0 1706240 17128 95724 0 0 0 48 273 1299203 54 42 4 0 0
16 0 0 1706240 17132 95724 0 0 0 3 261 1356609 54 42 4 0 0
5 0 0 1706364 17132 95724 0 0 0 0 264 1293198 58 39 3 0 0
9 0 0 1706364 17132 95724 0 0 0 0 261 1555153 56 41 3 0 0
13 0 0 1706364 17132 95724 0 0 0 0 264 1160296 56 40 4 0 0
8 0 0 1706364 17132 95724 0 0 0 0 261 1388909 58 38 4 0 0
18 0 0 1706364 17132 95724 0 0 0 0 264 1236774 56 39 5 0 0
11 0 0 1706364 17136 95724 0 0 0 2 261 1360325 57 40 3 0 0
5 0 0 1706364 17136 95724 0 0 0 1 265 1201912 57 40 3 0 0
8 0 0 1706364 17136 95724 0 0 0 0 261 1104308 57 39 4 0 0
7 0 0 1705976 17232 95724 0 0 0 127 274 1205212 58 39 4 0 0

Tim


From a scheduler performance perspective, it looks like CFS is doing much better on this workload. It's spending a lot less time in %sys despite the higher context switch rate, and there are far fewer tasks waiting for CPU time. The real problem seems to be that volanomark is optimized for a particular scheduler behavior.
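
If you want to compare the two runs numerically rather than eyeballing the tables, something like the helper below would do it -- just a rough, untested sketch on my part, not anything from the kernel tree. It averages the runnable-task (r), context-switch (cs) and %sys (sy) columns of vmstat output fed on stdin:

#include <stdio.h>

/*
 * Pipe "vmstat 2" output (or the tables above) through this; it skips
 * the header lines and prints the average r, cs and sy columns.
 * Assumed column layout:
 *   r b swpd free buff cache si so bi bo in cs us sy id wa st
 */
int main(void)
{
        char line[512];
        double r, cs, sy, sum_r = 0, sum_cs = 0, sum_sy = 0;
        long n = 0;

        while (fgets(line, sizeof(line), stdin)) {
                if (sscanf(line,
                           "%lf %*f %*f %*f %*f %*f %*f %*f %*f %*f %*f %lf %*f %lf",
                           &r, &cs, &sy) == 3) {
                        sum_r += r;
                        sum_cs += cs;
                        sum_sy += sy;
                        n++;
                }
        }
        if (n)
                printf("avg r=%.0f cs=%.0f sy=%.0f%% over %ld samples\n",
                       sum_r / n, sum_cs / n, sum_sy / n, n);
        return 0;
}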

That's not to say that we can't improve volanomark performance under CFS, but simply that CFS isn't so fundamentally flawed that this is impossible.

When I initially agreed with zeroing out the wait time in sched_yield, I didn't realize that it could be negative, and that zeroing a negative value would actually promote the process in some cases. I still think it's reasonable to zero out positive wait times. Can you test whether zeroing only the positive wait times does better than zeroing them unconditionally?
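
To be concrete, something along these lines is what I have in mind -- just an illustrative sketch against the wait_runtime bookkeeping in 2.6.23-rc1's CFS, not a tested patch, and the helper name is made up:

static inline void yield_clamp_wait_runtime(struct sched_entity *se)
{
        /*
         * Discard any accumulated credit when the task yields, but leave
         * a negative wait_runtime alone -- zeroing it unconditionally
         * would promote a task that has already overrun its fair share.
         */
        if (se->wait_runtime > 0)
                se->wait_runtime = 0;
}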

-- Chris