...
On Tue, 2 Oct 2007, Andrew Morton wrote:
>
> This is unexpected. High load average is due to either a task chewing a
> lot of CPU time or a task stuck in uninterruptible sleep.
Not necessarily.
We saw high load averages back when we had the timer bogosity where "gettimeofday()" and "select()" didn't agree, so programs would do things like

	date = time(..)
	select(.. , timeout = <time + 1> )
When "date" didn't take the jiffies offset into account (and thus mixed two different time sources), the select() ended up returning immediately because the two calls effectively used different clocks. Suddenly we had applications chewing up 30% CPU time, because they were in a loop that *tried* to sleep.
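A rough way to see why a loop that "tries" to sleep can burn CPU is to simulate it. The sketch below is a simplified model, not kernel or libc code: `clock_b` stands for the clock the kernel uses to decide when select() wakes up, `clock_a` stands for what time() reports, and `clock_a` is assumed to lag `clock_b` by a fixed `skew_us` (a stand-in for the missing jiffies offset). The function name, parameters and the fixed per-iteration cost are all hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical model, all times in microseconds.
 * clock_b: the clock used to decide when select() wakes up.
 * clock_a: what the application sees via time(); lags clock_b by skew_us.
 * Returns the time spent in loop iterations where select() did not sleep.
 */
long long simulate_busy_us(long long skew_us, long long start_us,
                           long long end_us, long long iter_cost_us)
{
    long long clock_b = start_us;
    long long busy_us = 0;

    assert(iter_cost_us > 0);    /* otherwise the busy branch never advances */
    assert(start_us >= skew_us); /* keep clock_a non-negative */

    while (clock_b < end_us) {
        long long clock_a = clock_b - skew_us;      /* what time() reports */
        long long date = clock_a / 1000000;         /* whole seconds */
        long long deadline = (date + 1) * 1000000;  /* "sleep until date + 1" */

        if (deadline > clock_b) {
            clock_b = deadline;        /* deadline is in the future: really sleep */
        } else {
            /* deadline already passed on clock_b: select() returns at once */
            clock_b += iter_cost_us;
            busy_us += iter_cost_us;
        }
    }
    return busy_us;
}
```

In this model the loop spins for `skew_us` out of every second, so `simulate_busy_us(300000, 10000000, 20000000, 100)` returns 3000000, i.e. 30% of the 10 simulated seconds. The 300 ms skew is an arbitrary value picked to line up with the 30% figure, not a measured one; with zero skew the same call returns 0.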
And I wonder if the same kind of thing is effectively happening here: the code is written so that it *tries* to sleep, but the rounding of the clock basically means that it's trying to sleep using a different clock than the one we're using to wake things up with, so some percentage of the time it doesn't sleep at all!
I wonder if the whole "round_jiffies()" thing should be written so that it never rounds down, or at least never rounds down to before the current second!
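To make the "never round down" idea concrete, here is a sketch comparing a round-to-the-second helper that may round down with one that only ever rounds up. It is loosely modelled on round_jiffies() but is not the kernel implementation: `SKETCH_HZ`, the HZ/4 round-down threshold and all function names are assumptions for illustration, and the real helper's per-CPU skew and other checks are left out.

```c
#include <assert.h>

#define SKETCH_HZ 1000UL  /* assumed tick rate for this sketch */

/* May round DOWN when close past a second boundary (threshold assumed). */
unsigned long round_nearest_sketch(unsigned long j)
{
    unsigned long rem = j % SKETCH_HZ;
    if (rem < SKETCH_HZ / 4)
        return j - rem;            /* round down: expiry moves earlier */
    return j - rem + SKETCH_HZ;    /* round up */
}

/* The suggested alternative: never round down, so the timer can only
 * fire at or after the time that was asked for. */
unsigned long round_up_only_sketch(unsigned long j)
{
    unsigned long rem = j % SKETCH_HZ;
    return rem ? j - rem + SKETCH_HZ : j;
}

/* Count expiries in [from, to) that round_fn moves earlier. */
unsigned long count_moved_earlier(unsigned long (*round_fn)(unsigned long),
                                  unsigned long from, unsigned long to)
{
    unsigned long n = 0;
    assert(from <= to);
    for (unsigned long j = from; j < to; j++)
        if (round_fn(j) < j)
            n++;
    return n;
}
```

Over one second of expiries (0 to 999 with the assumed HZ of 1000), the round-down variant moves 249 of them earlier, while the round-up-only variant moves none. A timer that fires earlier than requested is exactly the kind of early wakeup that can turn a "sleep" loop into extra wakeups.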