Re: Where is the performance bottleneck?

From: Al Boldi
Date: Fri Sep 02 2005 - 09:33:07 EST


Holger Kiehl wrote:
> top - 08:39:11 up 2:03, 2 users, load average: 23.01, 21.48, 15.64
> Tasks: 102 total, 2 running, 100 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0% us, 17.7% sy, 0.0% ni, 0.0% id, 78.9% wa, 0.2% hi, 3.1%
> si Mem: 8124184k total, 8093068k used, 31116k free, 7831348k
> buffers Swap: 15631160k total, 13352k used, 15617808k free, 5524k
> cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3423 root 18 0 55204 460 392 R 12.0 0.0 1:15.55 dd
> 3421 root 18 0 55204 464 392 D 11.3 0.0 1:17.36 dd
> 3418 root 18 0 55204 464 392 D 10.3 0.0 1:10.92 dd
> 3416 root 18 0 55200 464 392 D 10.0 0.0 1:09.20 dd
> 3420 root 18 0 55204 464 392 D 10.0 0.0 1:10.49 dd
> 3422 root 18 0 55200 460 392 D 9.3 0.0 1:13.58 dd
> 3417 root 18 0 55204 460 392 D 7.6 0.0 1:13.11 dd
> 158 root 15 0 0 0 0 D 1.3 0.0 1:12.61 kswapd3
> 159 root 15 0 0 0 0 D 1.3 0.0 1:08.75 kswapd2
> 160 root 15 0 0 0 0 D 1.0 0.0 1:07.11 kswapd1
> 3419 root 18 0 51096 552 476 D 1.0 0.0 1:17.15 dd
> 161 root 15 0 0 0 0 D 0.7 0.0 0:54.46 kswapd0
>
> A loadaverage of 23 for 8 dd's seems a bit high. Also why is kswapd
> working so hard? Is that correct.

Actually, kswapd is another problem. (see "Kswapd Flaw" thread)
Which has little impact on your problem but basically kswapd tries very hard
maybe even to hard to fullfil a request for memory, so when the buffer/cache
pages are full kswapd tries to find some more unused memory. When it finds
none it starts recycling the buffer/cache pages. Which is OK, but it only
does this after searching for swappable memory which wastes CPU cycles.

This can be tuned a little but not much by adjusting /sys(proc)/.../vm/...
Or renicing kswapd to the lowest priority, which may cause other problems.

Things get really bad when procs start asking for more memory than is
available, causing kswapd to take the liberty of paging out running procs in
the hope that these procs won't come back later. So when they do come back
something like a wild goose chase begins. This is also known as OverCommit.

This is closely related to the dreaded OOM-killer, which occurs when the
system cannot satisfy a memory request for a returning proc, causing the VM
to start killing in an unpredictable manner.

Turning OverCommit off should solve this problem but it doesn't.

This is why it is recommended to run the system always with swap enabled even
if you have tons of memory, which really only pushes the problem out of the
way until you hit the dead end and the wild goose chase begins again.

Sadly 2.6.13 did not fix this either.

Although this description only vaguely defines the problem from an end-user
pov, the actual semantics may be quite different.

--
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/