Re: High load average on disk I/O on 2.6.17-rc3
From: Russell King
Date: Mon May 08 2006 - 12:46:52 EST
On Mon, May 08, 2006 at 05:42:18PM +0200, Erik Mouw wrote:
> On Mon, May 08, 2006 at 05:31:29PM +0200, Arjan van de Ven wrote:
> > On Mon, 2006-05-08 at 17:22 +0200, Erik Mouw wrote:
> > > ... except that any kernel < 2.6 didn't account tasks waiting for disk
> > > IO.
> >
> > they did. It was "D" state, which counted into load average.
>
> They did not or at least to a much lesser extent. That's the reason why
> ZenIV.linux.org.uk had a mail DoS during the last FC release and why we
> see load average questions on lkml.
>
> I've seen it on our servers as well: when using 2.4 and doing 50 MB/s
> to disk (through NFS), the load just was slightly above 0. When we
> switched the servers to 2.6 it went to ~16 for the same disk usage.
It's actually rather interesting to look at 2.6 and load averages.
The load average appears to depend on the type of load, rather than
the real load. Let's look at three different cases:
1. while (1) { } loop in a C program.
Starting off with a load average of 0.00 with a "watch uptime" running
(and leaving that running for several minutes), then starting such a
program, and letting it run for one minute.
Result: load average at the end: 0.60
2. a program which runs continuously for 6 seconds and then sleeps for
54 seconds.
Result: load average peaks at 0.12, drops to 0.05 just before it
runs for a second 6 second.
ps initially reports 100% CPU usage, drops to 10% after one minute,
rises to 18% and drops back towards 10%, and gradually settles on
10%.
3. a program which runs for 1 second and then sleeps for 9 seconds.
Result: load average peaks at 0.22, drops to 0.15 just before it
runs for the next second.
ps reports 10% CPU usage.
Okay, so, two different CPU work loads without any other IO, using the
same total amount of CPU time every minute seem to produce two very
different load averages. In addition, using 100% CPU for one minute
does not produce a load average of 1.
Seems to me that somethings wrong somewhere. Either that or the first
load average number no longer represents the load over the past one
minute.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/