Re: [PATCH] sched/cputime: make scale_stime() more precise

From: Peter Zijlstra
Date: Mon Jul 22 2019 - 15:56:15 EST

On Fri, Jul 19, 2019 at 04:37:42PM +0200, Oleg Nesterov wrote:
> On 07/19, Peter Zijlstra wrote:

> > But I'm still confused, since in the long run, it should still end up
> > with a proportionally divided user/system, irrespective of some short
> > term wobblies.
> Why?
> Yes, statistically the numbers are proportionally divided.

This; due to the loss in precision the distribution is like a step
function around the actual s:u ratio line, but on average it still is
s:u.

Even if it were a perfect function, we'd still see increments in stime even
if the current program state never does syscalls, simply because it
needs to stay on that s:u line.

> but you will (probably) never see the real stime == 1000 && utime == 10000
> numbers if you watch incrementally.

See, there are no 'real' stime and utime numbers. What we have are user
and system samples -- tick based.

If the tick lands in the kernel, we get a system sample; if the tick
lands in userspace, we get a user sample.

What we do have is accurate (ns-based) runtime accounting, and we
(re)construct stime and utime from this; we divide the total known
runtime into stime and utime pro-rata.
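
A minimal sketch of that pro-rata split, assuming the reported stime is
simply rtime scaled by the sampled s/(s+u) ratio. The helper name
split_runtime() and its parameters are hypothetical; this is not the
kernel's scale_stime(), and it assumes the intermediate product fits in
64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: divide an exact runtime pro-rata according to the
 * number of system and user tick samples. Assumes rtime * stime_ticks
 * does not overflow 64 bits.
 */
static void split_runtime(uint64_t rtime,
                          uint64_t stime_ticks, uint64_t utime_ticks,
                          uint64_t *stime, uint64_t *utime)
{
    uint64_t total = stime_ticks + utime_ticks;

    if (total == 0) {
        /* No samples yet: arbitrary choice for this sketch, call it all user. */
        *stime = 0;
        *utime = rtime;
        return;
    }

    /* System share is rounded down; the remainder goes to user time. */
    *stime = rtime * stime_ticks / total;
    /* Only the sum is exact: stime + utime == rtime by construction. */
    *utime = rtime - *stime;
}
```

With 1000 system and 10000 user samples and rtime = 11000 (arbitrary
units), this gives stime = 1000 and utime = 10000. Growing rtime to 12100
with unchanged sample counts raises stime to 1100, even if none of the
extra runtime was spent in a syscall.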

Sure, we take a shortcut, it wobbles a bit, but seriously, the samples
are inaccurate anyway, so who bloody cares :-)

You can construct a program that runs 99% in userspace but has all
system samples. All you need to do is make sure you're in a system call
when the tick lands.
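
A toy simulation of that case, as a sketch only. The tick period, the
syscall length and all names here are made up for illustration; the task
is modelled as entering the kernel for a short window that starts exactly
when each tick fires.

```c
#include <stdint.h>

#define PERIOD_US 1000u /* hypothetical tick period, in microseconds */
#define KERNEL_US 10u   /* hypothetical time per period spent in a syscall */

struct sim_result {
    uint64_t user_us, sys_us;           /* true time spent in each mode */
    uint64_t user_samples, sys_samples; /* what the tick observed */
};

/* Walk the timeline one microsecond at a time; the tick fires at t % PERIOD_US == 0. */
static struct sim_result simulate_aligned_ticks(uint64_t periods)
{
    struct sim_result r = {0, 0, 0, 0};

    for (uint64_t t = 0; t < periods * PERIOD_US; t++) {
        /* The task sits in a syscall for the first KERNEL_US of each period. */
        int in_kernel = (t % PERIOD_US) < KERNEL_US;

        if (in_kernel)
            r.sys_us++;
        else
            r.user_us++;

        if (t % PERIOD_US == 0) {
            if (in_kernel)
                r.sys_samples++;
            else
                r.user_samples++;
        }
    }
    return r;
}
```

Over 100 periods the task spends 99000us in userspace and 1000us in the
kernel, yet every one of the 100 samples is a system sample.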

> Just in case... yes I know that these numbers can only "converge" to the
> reality, only their sum is correct. But people complain.

People always complain, just tell em to go pound sand :-)