Re: [PATCH] sched/cputime: make scale_stime() more precise

From: Oleg Nesterov
Date: Tue Jul 23 2019 - 10:00:48 EST

On 07/22, Peter Zijlstra wrote:
> On Fri, Jul 19, 2019 at 04:37:42PM +0200, Oleg Nesterov wrote:
> > On 07/19, Peter Zijlstra wrote:
> > > But I'm still confused, since in the long run, it should still end up
> > > with a proportionally divided user/system, irrespective of some short
> > > term wobblies.
> >
> > Why?
> >
> > Yes, statistically the numbers are proportionally divided.
>
> This; due to the loss in precision the distribution is like a step
> function around the actual s:u ratio line, but on average it still is
> s:u.

You know, I am no longer sure... perhaps it is even worse, I need to recheck.

> Even if it were a perfect function, we'd still see increments in stime even
> if the current program state never does syscalls, simply because it
> needs to stay on that s:u line.
>
> > but you will (probably) never see the real stime == 1000 && utime == 10000
> > numbers if you watch incrementally.
>
> See, there are no 'real' stime and utime numbers. What we have are user
> and system samples -- tick based.

Yes, yes, I know.
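
FWIW, to spell out what "proportionally divided" means above, here is a
minimal sketch of the split. It is a simplified model and not the kernel's
scale_stime(): the name split_rtime(), the handling of the no-samples case,
and the use of unsigned __int128 (a GCC/Clang extension) are assumptions
made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model, NOT the kernel's scale_stime(): split the precisely
 * measured runtime "rtime" in the ratio of the tick-based samples
 * stime:utime. The name split_rtime() is made up for this sketch.
 */
void split_rtime(uint64_t stime, uint64_t utime, uint64_t rtime,
		 uint64_t *out_stime, uint64_t *out_utime)
{
	uint64_t total;

	assert(stime <= UINT64_MAX - utime);	/* stime + utime must not wrap */
	total = stime + utime;

	if (total == 0) {
		/* no samples at all; this sketch accounts everything as user time */
		*out_stime = 0;
		*out_utime = rtime;
		return;
	}

	/* the 128-bit intermediate keeps stime * rtime exact, no bits are dropped */
	*out_stime = (uint64_t)(((unsigned __int128)stime * rtime) / total);
	*out_utime = rtime - *out_stime;
}
```

With the samples stime == 1000 and utime == 10000, an rtime of 22000 is
reported by this model as 2000 system and 20000 user, i.e. still 1:10.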

> Sure, we take a shortcut, it wobbles a bit, but seriously, the samples
> are inaccurate anyway, so who bloody cares :-)
>
> People always complain, just tell em to go pound sand :-)

I tried ;) this was my initial reaction to this bug report.


> You can construct a program that runs 99% in userspace but has all
> system samples.

Yes, but with the current implementation you do not need to construct
such a program, this is what you can easily get "in practice". And this
confuses people.

They can watch /proc/pid/stat incrementally and (when the numbers are big)
find that a program that runs 100% in userspace somehow spends 10 minutes
almost entirely in kernel. Or at least more in kernel than in userspace.
Even if task->stime doesn't grow at all.
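
For anyone who wants to watch the numbers the same way, below is a rough
sketch of reading utime and stime from /proc/<pid>/stat. Per proc(5) these
are fields 14 and 15, in clock ticks. The helper names parse_stat_line() and
read_cputime() and the 1024-byte buffer are made up for this example, it is
not taken from any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract utime (field 14) and stime (field 15) from one line of
 * /proc/<pid>/stat. Field 2 (comm) may itself contain spaces and ')',
 * so parsing starts after the last ')'.
 * Returns 0 on success, -1 on a malformed line.
 */
int parse_stat_line(const char *line,
		    unsigned long long *utime, unsigned long long *stime)
{
	const char *p;

	assert(line && utime && stime);
	p = strrchr(line, ')');
	if (!p)
		return -1;

	/*
	 * Skipped after ')': state (3), ppid pgrp session tty_nr tpgid (4..8),
	 * flags minflt cminflt majflt cmajflt (9..13); then utime, stime.
	 */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %llu %llu",
		   utime, stime) != 2)
		return -1;
	return 0;
}

/* Read the first line of "path" (e.g. "/proc/self/stat") and parse it. */
int read_cputime(const char *path,
		 unsigned long long *utime, unsigned long long *stime)
{
	char buf[1024];		/* assumed large enough for one stat line */
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_stat_line(buf, utime, stime);
	fclose(f);
	return ret;
}
```

Calling read_cputime() on the same path at intervals and printing the
difference between consecutive readings gives the incremental view
described above; divide by sysconf(_SC_CLK_TCK) to convert ticks to seconds.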