Re: High load average on disk I/O on 2.6.17-rc3
From: David Lang
Date: Tue May 09 2006 - 01:04:10 EST
On Tue, 9 May 2006, Arjan van de Ven wrote:

> On Tue, 2006-05-09 at 11:57 +1000, Nick Piggin wrote:
>> Arjan van de Ven wrote:
>>> ... except that any kernel < 2.6 didn't account tasks waiting for disk
>>> IO.
>>
>> they did. It was "D" state, which counted into load average.
>>
>> Perhaps kernel threads in D state should not contribute toward load avg
>
> that would be a change from, well... a LONG time
>
> The question is what "load" means; if you want to change that... then
> there are even better metrics possible. Like
> "number of processes wanting to run + number of busy spindles + number
> of busy nics + number of VM zones that are below the problem
> watermark" (where "busy" means "queue full")
>
> or 50 million other definitions. If we're going to change the meaning,
> we might as well give it a "real" meaning.
>
> (And even then it is NOT a good measure for determining if the machine
> can perform more work, the graph I put in a previous mail is very real,
> and in practice it seems the saturation line is easily 4x or 5x of the
> "linear" point)
While this is true, it's also true that up in this area it's very easy for
a spike of activity to cascade through the box and bring everything to its
knees. I've seen a production box go from 'acceptable' response time to
being effectively down for two hours, with a small 'tar' command (10's of K
of writes) being the trigger that pushed it over the edge.
In general, loadavg > 2x #procs has been a good indication that the box is
in danger and needs careful watching. I don't know when Linux changed its
loadavg calculation, but within the last several years there was a change
that caused loadavg to report higher for the same amount of activity on the
box. As a user it's hard to argue which is the more 'correct' value.
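
For reference, here's a rough user-space sketch of the fixed-point decay
that (as far as I understand the 2.6 sources) the kernel applies when it
updates the load averages. The constants are its 1/5/15-minute decay
factors, and the "active" count fed in is runnable tasks plus tasks in
uninterruptible (D) sleep, which is why disk waiters show up in the number.
Treat it as an illustration of the math, not a copy of the kernel code.

/* toy model of the kernel's load average update, sampled every 5s */
#include <stdio.h>

#define FSHIFT   11                  /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)     /* 1.0 in fixed point            */
#define EXP_1    1884                /* ~FIXED_1 / exp(5s / 1min)     */
#define EXP_5    2014                /* ~FIXED_1 / exp(5s / 5min)     */
#define EXP_15   2037                /* ~FIXED_1 / exp(5s / 15min)    */

static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
        load *= exp;
        load += active * (FIXED_1 - exp);
        return load >> FSHIFT;
}

int main(void)
{
        unsigned long avg1 = 0, avg5 = 0, avg15 = 0;
        int tick;

        /* pretend 4 tasks are runnable or in D state for 10 minutes */
        for (tick = 0; tick < 120; tick++) {
                unsigned long active = 4 * FIXED_1;

                avg1  = calc_load(avg1,  EXP_1,  active);
                avg5  = calc_load(avg5,  EXP_5,  active);
                avg15 = calc_load(avg15, EXP_15, active);
        }
        printf("%.2f %.2f %.2f\n",
               avg1 / (double)FIXED_1,
               avg5 / (double)FIXED_1,
               avg15 / (double)FIXED_1);
        return 0;
}

With D-state tasks included in "active" you can see how a pile of disk
waiters pushes the number up even when the cpu is idle.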
Of the various metrics that you mentioned above:
# processes wanting to run
gives a good indication of whether the cpu is the bottleneck. This is what
people think loadavg means (the textbooks may be wrong, but they're what
people learn from).
# spindles busy
gives a good indication of whether the disks are the bottleneck. This needs
to cover seek time as well as read/write time. My initial reaction is to
base this on the average number of outstanding requests to the drive, but
I'm not sure how this would interact with TCQ/NCQ (it may just be that
people need to know their drives, and know that a higher value for those
drives is acceptable). This is one that I don't know how to find today
(wait time won't show up if something else keeps the cpu busy); a rough
sketch of one possible approximation, based on /proc/diskstats, follows
after this list. In many ways this stat should be per-drive as well as any
summary value (you can't just start using another spindle the way you can
just use another cpu, even in a NUMA system :-)
# NICs busy
don't bother with this; the networking folks have been tracking this for
years, either locally on the box or through the networking infrastructure
(mrtg and friends were built for this).
# vm zones below the danger point
I'm not sure about this one either. In practice, watching for paging rates
to climb seems to work, but this area is where black-magic monitoring is in
full force (and the rate of change in the VM code doesn't help the
understanding). A sketch of pulling the raw paging counters out of
/proc/vmstat also follows after the list.
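
On the spindle idea: /proc/diskstats on 2.6 seems to expose enough to take
a stab at it. If I'm reading the format right, the 9th stat field after the
device name is requests currently in flight and the 11th is the weighted
milliseconds spent in the queue (which I believe is what iostat derives its
average queue size from). Something like this dumps them; sample it twice
and diff the queue_ms column to get an average queue depth per drive over
the interval:

/* dump per-disk "in flight" and time-in-queue counters; field
 * meanings assumed from the 2.6 /proc/diskstats layout */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/diskstats", "r");
        char line[512];

        if (!f) {
                perror("/proc/diskstats");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                unsigned int major, minor;
                char name[64];
                unsigned long long rd_ios, rd_merges, rd_sectors, rd_ticks;
                unsigned long long wr_ios, wr_merges, wr_sectors, wr_ticks;
                unsigned long long in_flight, io_ms, queue_ms;

                /* whole disks have 11 stat fields; partition lines are
                 * shorter and fall through the check below */
                if (sscanf(line, "%u %u %63s %llu %llu %llu %llu "
                                 "%llu %llu %llu %llu %llu %llu %llu",
                           &major, &minor, name,
                           &rd_ios, &rd_merges, &rd_sectors, &rd_ticks,
                           &wr_ios, &wr_merges, &wr_sectors, &wr_ticks,
                           &in_flight, &io_ms, &queue_ms) != 14)
                        continue;
                printf("%-10s in_flight=%llu busy_ms=%llu queue_ms=%llu\n",
                       name, in_flight, io_ms, queue_ms);
        }
        fclose(f);
        return 0;
}

How the in-flight number relates to what the drive is really doing with
TCQ/NCQ is exactly the open question above, so I'd treat it as a relative
per-drive indicator rather than an absolute one.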
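
And for the paging-rate watching, a similarly rough sketch that pulls the
raw counters out of /proc/vmstat (pgpgin/pgpgout are block-device page
traffic, pswpin/pswpout are swap traffic, assuming I have the names right);
sample it periodically and watch the deltas:

/* print the paging/swapping counters from /proc/vmstat */
#include <stdio.h>
#include <string.h>

int main(void)
{
        FILE *f = fopen("/proc/vmstat", "r");
        char key[64];
        unsigned long long val;

        if (!f) {
                perror("/proc/vmstat");
                return 1;
        }
        while (fscanf(f, "%63s %llu", key, &val) == 2) {
                if (!strcmp(key, "pgpgin") || !strcmp(key, "pgpgout") ||
                    !strcmp(key, "pswpin") || !strcmp(key, "pswpout"))
                        printf("%-8s %llu\n", key, val);
        }
        fclose(f);
        return 0;
}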
I can understand your reluctance to quickly tinker with the loadavg
calculation, but would it be possible to make the other values available by
themselves for a while? Then people could experiment in userspace to find
the best way to combine the values into a single, nicely graphable 'health
of the box' value.
David Lang
P.S. I would love to be told that I'm just ignorant of how to monitor these
things independently. It would make my life much easier to learn how.
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare