Overscheduling DOES happen with high web server load.

Phillip Ezolt (ezolt@perf.zko.dec.com)
Wed, 5 May 1999 14:54:40 -0400 (EDT)


Hi all,

While doing some performance work with SPECWeb96 on Alpha/Linux with Apache,
I found that "schedule" looks like the main bottleneck.

(Kernel v2.2.5, Apache 1.3.4, egcs-1.1.1, iprobe-4.1)

When running a SPECWeb96 strobe run on Alpha/Linux, I found that when the
CPU is pegged, 18% of the time is spent in the scheduler.

Using Iprobe, I got the following function breakdown: (only functions >1%
are shown)

Begin            End                                              Sample  Image  Total
Address          Address          Name                             Count    Pct    Pct
-------          -------          ----                             -----    ---    ---
0000000000000000-00000000000029FC /usr/bin/httpd                  127463          18.5
00000001200419A0-000000012004339F ap_vformatter                    15061   11.8    2.2
FFFFFC0000300000-00000000FFFFFFFF vmlinux                         482385          70.1
FFFFFC00003103E0-FFFFFC000031045F entInt                            7848    1.6    1.1
FFFFFC0000315E40-FFFFFC0000315F7F do_entInt                        48487   10.1    7.0
FFFFFC0000327A40-FFFFFC0000327D7F schedule                        124815   25.9   18.1
FFFFFC000033FAA0-FFFFFC000033FCDF kfree                             7876    1.6    1.1
FFFFFC00003A9960-FFFFFC00003A9EBF ip_queue_xmit                     8616    1.8    1.3
FFFFFC00003B9440-FFFFFC00003B983F tcp_v4_rcv                       11131    2.3    1.6
FFFFFC0000441CA0-FFFFFC000044207F do_csum_partial_copy_from_user   43112    8.9    6.3

I can't pin it down to the exact source line, but the cycles are spent
close to one another.

FFFFFC0000327A40 schedule vmlinux
FFFFFC0000327C1C 01DC 2160 ( 1.7) *
FFFFFC0000327C34 01F4 28515 ( 22.8) **********************
FFFFFC0000327C60 0220 1547 ( 1.2) *
FFFFFC0000327C64 0224 26432 ( 21.2) *********************
FFFFFC0000327C74 0234 36470 ( 29.2) *****************************
FFFFFC0000327C9C 025C 24858 ( 19.9) *******************

(For those interested, I have the disassembled code.)

Apache has a fairly even cycle distribution, but in the kernel, 'schedule'
really sticks out as the CPU burner.

I think that the linear search for the next runnable process is where the time
is being spent.
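
To make the suspected cost concrete, here is a minimal sketch of a linear
run-queue scan. It is a simplified model, not the 2.2 kernel source: the
struct task layout, the goodness() weighting, and the pick_next() name are
all assumptions chosen for illustration. The point is only that every pick
visits every runnable task.

```c
#include <stddef.h>

/* Hypothetical, simplified task record; not the kernel's task_struct. */
struct task {
    int counter;        /* remaining time slice, in ticks */
    int priority;       /* static priority */
    struct task *next;  /* next task on the run queue */
};

/* Illustrative weight: higher is better, 0 means the slice is used up. */
static int goodness(const struct task *t)
{
    if (t->counter == 0)
        return 0;
    return t->counter + t->priority;
}

/*
 * Walk the whole run queue and return the task with the highest weight.
 * *visited (if non-NULL) receives the number of tasks examined, which
 * always equals the queue length: the cost is linear in runnable tasks.
 */
static struct task *pick_next(struct task *runqueue, size_t *visited)
{
    struct task *best = NULL;
    int best_weight = -1;
    size_t n = 0;

    for (struct task *t = runqueue; t != NULL; t = t->next) {
        int w = goodness(t);
        n++;
        if (w > best_weight) {
            best_weight = w;
            best = t;
        }
    }
    if (visited != NULL)
        *visited = n;
    return best;
}
```

With 94 runnable tasks, a scan of this shape evaluates goodness() 94 times
for every single scheduling decision.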

As an independent test, I ran vmstat while SPECWeb was running.

The leftmost column (r) is the number of processes waiting to run. Many of
these numbers are far above the 3 or 4 that are normally quoted.

 procs                 memory  swap       io     system      cpu
 r  b w swpd free buff  cache si so   bi  bo    in   cs us sy id
 0 21 0  208 5968 5240 165712  0  0 4001 303 10263 6519 31 66  4
26 27 1  208 6056 5240 165848  0  0 2984  96  5623 3440 29 60 11
 0 15 0  208 5096 5288 166384  0  0 4543 260 10850 7346 32 66  3
 0 17 0  208 6928 5248 164936  0  0 5741 309 13129 8052 32 65  3
37 19 1  208 5664 5248 166144  0  0 2502 142  6837 3896 33 63  5
 0 14 0  208 5984 5240 165656  0  0 3894 376 12432 7276 32 65  3
 0 19 1  208 4872 5272 166248  0  0 2247 124  7641 4514 32 64  4
 0 17 0  208 5248 5264 166336  0  0 4229 288  8786 5144 31 67  2
56 16 1  208 6512 5248 165592  0  0 2159 205  8098 4641 32 62  6
94 18 1  208 5920 5248 165896  0  0 1745 191  5288 2952 32 60  7
71 14 1  208 5920 5256 165872  0  0 2063 160  6493 3729 30 62  8
 0 25 1  208 5032 5256 166544  0  0 3008 112  5668 3612 31 60  9
62 22 1  208 5496 5256 165560  0  0 2512 109  5661 3392 28 62 11
43 22 1  208 4536 5272 166112  0  0 3003 202  7198 4813 30 63  7
 0 26 1  208 4800 5288 166256  0  0 2407  93  5666 3563 29 60 11
32 17 1  208 5984 5296 165632  0  0 2046 329  7296 4305 31 62  6
23  7 1  208 6744 5248 164904  0  0 1739 284  9496 5923 33 65  2
14 18 1  208 5128 5272 166416  0  0 3755 322  9663 6203 32 65  3
 0 22 1  208 4256 5304 167288  0  0 2593 156  5678 3219 31 60  9
44 20 1  208 3688 5264 167184  0  0 3010 149  7277 4398 31 62  7
29 24 1  208 5232 5264 166248  0  0 1954 104  5687 3496 31 61  9
26 23 1  208 5688 5256 165568  0  0 3029 169  7124 4473 30 60 10
 0 18 1  208 5576 5256 165656  0  0 3395 270  8464 5702 30 63  7

It looks like the run queue is much longer than expected.
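
As a quick check on that impression, the "r" column of the 23 samples above
can be summarized with a few lines of C. The rq_stats struct and summarize()
helper are names made up for this sketch; the numbers are copied from the
vmstat output.

```c
#include <stddef.h>

/* The "r" column from the 23 vmstat samples quoted above. */
static const int runnable[] = {
     0, 26,  0,  0, 37,  0,  0,  0, 56, 94, 71,  0,
    62, 43,  0, 32, 23, 14,  0, 44, 29, 26,  0
};

/* Hypothetical summary record for this sketch. */
struct rq_stats {
    size_t samples;    /* number of samples */
    int total;         /* sum of all r values */
    int max;           /* largest r value seen */
    size_t over_four;  /* samples with more than 4 runnable processes */
};

static struct rq_stats summarize(const int *r, size_t n)
{
    struct rq_stats s = {n, 0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        s.total += r[i];
        if (r[i] > s.max)
            s.max = r[i];
        if (r[i] > 4)
            s.over_four++;
    }
    return s;
}
```

For these samples the total is 557 over 23 samples (a mean of about 24),
the peak is 94, and 13 of the 23 samples exceed 4.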

I imagine this problem is compounded by the number of times "schedule" is
called.

On a webserver that does not have all of the web pages in memory, an httpd
process's life looks like this:

1. Wake up for a request from the network.
2. Figure out what web page to load.
3. Ask the disk for it.
4. Sleep (schedule()) until the page is ready.

This means that schedule will be called a lot. In addition, a process will wake
and sleep in a time much shorter than its allotted time slice.

Each time we schedule, we have to walk through the entire run queue. This will
cause fewer requests to be serviced, which will leave more processes stuck
on the run queue, which will make the walk down the run queue even longer...
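
A rough way to put a number on this is to multiply how often the scheduler
runs by how long the queue is. The helper below is a back-of-the-envelope
sketch, assuming one full run-queue scan per context switch; that is an
assumption, since schedule() can be called more often than the "cs" column
counts, so the result is at best a lower bound.

```c
/*
 * Rough lower bound on run-queue entries examined per second.
 * Assumes one full scan per context switch (an assumption for this
 * sketch; schedule() may run more often than it actually switches).
 */
static long scan_work_per_sec(long context_switches_per_sec, long runnable)
{
    return context_switches_per_sec * runnable;
}
```

Taking the vmstat sample with r = 94 and cs = 2952, this gives 277488
entries examined per second under that assumption.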

Bottom line: under a heavy web load, the Linux kernel seems to spend an
unnecessary amount of time scheduling processes.

Is it necessary to calculate the goodness of every process at every schedule?
Can't we make the goodnesses static? Monkeying with the scheduler is big
business, and I realize that this will not be a v2.2 issue, but what about
v2.3?
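
For illustration, here is one way static goodness could remove the scan:
if a task's weight is fixed when it is queued, tasks can be kept in
per-priority lists with a bitmap of non-empty levels. This is a hypothetical
sketch, not an existing kernel interface; NPRIO, struct static_task,
struct prio_runqueue, and both functions are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define NPRIO 32  /* hypothetical number of static priority levels */

/* Hypothetical task record; prio must be in 0..NPRIO-1, 0 is best. */
struct static_task {
    int prio;
    struct static_task *next;
};

struct prio_runqueue {
    uint32_t bitmap;                   /* bit p set => queue[p] non-empty */
    struct static_task *queue[NPRIO];  /* one LIFO list per level, for brevity */
};

/* Queue a task at its fixed priority level: constant time. */
static void rq_enqueue(struct prio_runqueue *rq, struct static_task *t)
{
    t->next = rq->queue[t->prio];
    rq->queue[t->prio] = t;
    rq->bitmap |= (uint32_t)1 << t->prio;
}

/*
 * Remove and return a task from the best non-empty level, or NULL.
 * The search is bounded by NPRIO, not by the number of runnable tasks.
 */
static struct static_task *rq_pick_next(struct prio_runqueue *rq)
{
    if (rq->bitmap == 0)
        return NULL;

    int p = 0;
    while (!(rq->bitmap & ((uint32_t)1 << p)))
        p++;

    struct static_task *t = rq->queue[p];
    rq->queue[p] = t->next;
    if (rq->queue[p] == NULL)
        rq->bitmap &= ~((uint32_t)1 << p);
    t->next = NULL;
    return t;
}
```

The trade-off is that a fixed weight cannot reflect how much of its time
slice a task has left, which is part of what a per-schedule goodness
calculation captures.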

--Phil

Digital/Compaq: HPSD/Benchmark Performance Engineering
Phillip.Ezolt@compaq.com ezolt@perf.zko.dec.com

ps. <shameless plug> For those interested in more detail there will be a
WIP paper describing this work presented at Linux Expo. </shameless plug>
