Re: [PATCH 1/2] nohz: use seqlock to avoid race on idle time stats v2

From: Denys Vlasenko
Date: Wed Apr 09 2014 - 08:50:37 EST


On Mon, Apr 7, 2014 at 8:17 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> The following example displays all the nonsense of that stat:
>
>     CPU 0                       CPU 1
>
>     task A block on IO          ...
>     task B runs for 1 min       ...
>     task A completes IO
>
> So in the above we've been waiting on IO for 1 minute. But none of that
> have been accounted.

If there is a task B which can put the CPU to use while task A
waits for IO, then *system performance is not IO bound*.


> OTOH if task B were to run on CPU 1 (it could have,
> really here this is about scheduler load balancing internals, hence pure
> randomness for the user), the iowait time would have been accounted.

Case A: overall system stats are: 50% busy, 50% idle
Case B: overall system stats are: 50% busy, 50% iowait

You are right, this does not look correct.
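
The two cases can be reproduced with a toy model of per-runqueue
accounting. This is only a sketch, not kernel code: the function
name, the one-tick-per-second timeline and the tuple layout are all
made up for illustration, and nr_iowait here just means "tasks
blocked on IO that are attributed to this CPU's runqueue".

```python
from collections import Counter

def account_per_rq(ticks):
    """Toy per-runqueue accounting (hypothetical model, not kernel code).

    ticks: list of (busy, nr_iowait) pairs, one per tick, where
    busy[cpu] is True if the CPU runs a task and nr_iowait[cpu] is the
    number of IO-blocked tasks attributed to that CPU's runqueue.
    """
    totals = Counter()
    for busy, nr_iowait in ticks:
        for cpu, running in enumerate(busy):
            if running:
                totals["busy"] += 1
            elif nr_iowait[cpu] > 0:
                # A non-busy CPU counts as iowait only for its own runqueue.
                totals["iowait"] += 1
            else:
                totals["idle"] += 1
    n = sum(totals.values())
    return {k: 100 * totals[k] / n for k in ("busy", "iowait", "idle")}

# Task A blocks on CPU 0 for 60 ticks in both cases.
# Case A: task B runs on CPU 0, CPU 1 has nothing to do.
case_a = [((True, False), (1, 0))] * 60
# Case B: task B happens to run on CPU 1 instead.
case_b = [((False, True), (1, 0))] * 60

print(account_per_rq(case_a))  # {'busy': 50.0, 'iowait': 0.0, 'idle': 50.0}
print(account_per_rq(case_b))  # {'busy': 50.0, 'iowait': 50.0, 'idle': 0.0}
```

Same workload, same IO wait, and the only difference is where the
scheduler placed task B.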

Let's step back and look at the situation from a high-level POV.
I believe I have a solution for this problem.

Let's say we have a heavily loaded file server machine where CPUs
are busy only 5% of the time. It makes sense to say that the machine
as a whole is "95% waiting for IO".

Our existing accounting did exactly that for single-CPU machines.

But for, say, a 2-CPU machine it can show 5% busy, 45% iowait, 50% idle
if there is only one task reading files, or 5% busy, 95% iowait
if there is more than one task.

But it's wrong! NONE of the CPUs are "idle" as long as there is even
one task blocked on IO. The machine is still IO-bound, not idling.
In my example, it should not matter whether the machine has one
or 64 CPUs, it should show 5% busy, 95% iowait overall state
in both cases.

Does the above make sense to you?

My proposal is to count each CPU's non-busy time towards iowait
if there are task(s) blocked on IO, *regardless of which
runqueue they are on*. Only if there are none is the time
counted towards idle.
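
To make the difference concrete, here is a toy comparison of the
per-runqueue rule and the global rule proposed above. It is a sketch
under stated assumptions, not kernel code: the function, the
global_rule flag and the timeline are hypothetical. The workload is
assumed to be two reader tasks on a 2-CPU machine, both attributed to
CPU 0's runqueue, arranged so that at least one of them is always
blocked on IO.

```python
from collections import Counter

def account(ticks, global_rule):
    """Toy idle/iowait accounting (hypothetical model, not kernel code).

    ticks: list of (busy, nr_iowait) pairs, one per tick, where
    busy[cpu] is True if the CPU runs a task and nr_iowait[cpu] is the
    number of IO-blocked tasks attributed to that CPU's runqueue.
    global_rule: if True, a non-busy CPU counts as iowait whenever any
    task in the system is blocked on IO; if False, only when a task on
    its own runqueue is.
    """
    totals = Counter()
    for busy, nr_iowait in ticks:
        any_blocked = sum(nr_iowait) > 0
        for cpu, running in enumerate(busy):
            if running:
                totals["busy"] += 1
            elif (any_blocked if global_rule else nr_iowait[cpu] > 0):
                totals["iowait"] += 1
            else:
                totals["idle"] += 1
    n = sum(totals.values())
    return {k: 100 * totals[k] / n for k in ("busy", "iowait", "idle")}

# 100 ticks on 2 CPUs: for 10 ticks CPU 0 runs one reader while the
# other is blocked; for 90 ticks both readers are blocked.
ticks = [((True, False), (1, 0))] * 10 + [((False, False), (2, 0))] * 90

print(account(ticks, global_rule=False))  # {'busy': 5.0, 'iowait': 45.0, 'idle': 50.0}
print(account(ticks, global_rule=True))   # {'busy': 5.0, 'iowait': 95.0, 'idle': 0.0}
```

With the global rule the result no longer depends on which runqueue
the blocked tasks sit on, which is the whole point.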


> I doubt that users are interested in such random accounting. They want
> to know either:
>
> 1) how much time was spent waiting on IO by the whole system

Hmm, I think I just said the same thing :)

> 2) how much time was spent waiting on IO per task
> 3) how much time was spent waiting on IO per CPU that initiated
> IOs, or per CPU which ran task completing IOs. In order to have
> an overview on where these mostly happened.

Some people may want to know these things, and I am not objecting
to adding whatever counters to help with that.

--
vda