Re: Weird behaviour on /proc/stat

From: Sven Wegener
Date: Wed Aug 06 2008 - 17:52:21 EST


On Wed, 6 Aug 2008, Rafael C. de Almeida wrote:

> I've executed the following code on an Intel Core 2 Quad (Linux 2.6.21.5):
>
> for (( x=0; x < 1800; x = x+1 )); do
>     head -n5 /proc/stat |
>     awk '{ print $2+$3+$4+$5+$6+$7+$8+$9 }' |
>     awk 'BEGIN { x=0 } { if (NR == 1) y=$0; else x=x+$1; } END {
>         print y, x }' |
>     awk '{ print $0, $1-$2 }' >> values
>     sleep 1;
> done
>
> My expectation was that the values file would have only 0s in the third
> field. That didn't happen: it was always a value greater than 0.
> So I went to the kernel code. The utilization is summed up here:
>
> http://lxr.linux.no/linux+v2.6.21.5/fs/proc/proc_misc.c#L463
>
> Reading that file, if anything, the sum of all the cpuX lines should be
> greater than the cpu line. After all, they are generated later, so if
> the utilization counters are updated while the output is being
> generated, the cpuX lines should have the greater values.
>
> I also noted that at
> http://lxr.linux.no/linux+v2.6.21.5/fs/proc/proc_misc.c#L463
> for_each_possible_cpu is used, while at
> http://lxr.linux.no/linux+v2.6.21.5/fs/proc/proc_misc.c#L487
> for_each_online_cpu is used. All the cores on the system are online, so
> where could the extra utilization that's added to the first line be
> coming from?
>
> I wish I had a machine with 4 cores on which I could test changes to
> that code, so I could investigate things a little further. But the only
> machine whose kernel I can change is my home computer, which has only
> one core :(.

It's expected behaviour, but it is indeed misleading. Here's why it
happens: in the kernel we account time in units of CONFIG_HZ (which I
suspect is 1000 in your case), but we export values to userspace in
units of USER_HZ (100, for historic reasons). So we're effectively
dividing the values by 10. That division leaves a remainder in most
cases, and the remainder is dropped. You can see in the code that for
the summary line we first add up all the in-kernel values and only then
do the conversion (cputime64_to_clock_t) to userspace values. So the
summary includes the sum of all the remainders, whereas each per-cpu
value drops its own remainder when it is converted separately. This
leads to a couple of additional jiffies being accounted in the summary.
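
As a rough illustration of the arithmetic (not the kernel code itself):
the tick counts below are made up, and I'm assuming CONFIG_HZ=1000 and
USER_HZ=100, so a plain integer divide by 10 stands in for
cputime64_to_clock_t. The function names are only for this sketch.

```shell
# Hypothetical per-CPU tick counts, in units of an assumed CONFIG_HZ=1000.
ticks="1005 1007 1009 1003"
ratio=10   # assumed CONFIG_HZ / USER_HZ = 1000 / 100

# Summary ("cpu") line: add the in-kernel values first, convert once.
sum_then_convert() {
    total=0
    for t in "$@"; do total=$(( total + t )); done
    echo $(( total / ratio ))
}

# Per-CPU ("cpuX") lines: convert each value, dropping its remainder,
# and only then add the converted values together.
convert_then_sum() {
    total=0
    for t in "$@"; do total=$(( total + t / ratio )); done
    echo $(( total ))
}

summary=$(sum_then_convert $ticks)
percpu=$(convert_then_sum $ticks)
echo "$summary $percpu $(( summary - percpu ))"   # prints "402 400 2"
```

The four dropped remainders (5, 7, 9 and 3) add up to 24 in-kernel
ticks, which is where the 2 extra userspace ticks in the summary come
from. On a running system, "getconf CLK_TCK" reports the USER_HZ value.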

Sven