Re: Utime and stime are less when getrusage (RUSAGE_THREAD) is executed on a tickless CPU.

From: Mel Gorman
Date: Fri May 21 2021 - 04:41:57 EST


On Fri, May 21, 2021 at 06:40:53AM +0000, hasegawa-hitomi@xxxxxxxxxxx wrote:
> Hi Peter and Frederic
>
>
> > > Would be superfluous for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
> > > architectures at the very least.
> > >
> > > It also doesn't help any of the other callers, like for example procfs.
> > >
> > > Something like the below ought to work and fix all variants I think. But
> > > it does make the call significantly more expensive.
> > >
> > > Looking at thread_group_cputime() that already does something like this,
> > > but that's also susceptible to a variant of this very same issue; since
> > > it doesn't call it unconditionally, nor on all tasks, so if current
> > > isn't part of the threadgroup and/or another task is on a nohz_full cpu,
> > > things will go wobbly again.
> > >
> > > There's a note about syscall performance there, so clearly someone seems
> > > to care about that aspect of things, but it does suck for nohz_full.
> > >
> > > Frederic, didn't we have remote ticks that should help with this stuff?
> > >
> > > And mostly I think the trade-off here is that if you run on nohz_full,
> > > you're not expected to go do syscalls anyway (because they're sodding
> > > expensive) and hence the accuracy of these sort of things is mostly
> > > irrelevant.
> > >
> > > So it might be the use-case is just fundamentally bonkers and we
> > > shouldn't really bother fixing this.
> > >
> > > Anyway?
> >
> > Typing be hard... that should 'obviously' be reading: Anyone?
>
>
> I understand that there is a trade-off between performance and accuracy
> and that this issue may have already been discussed.
> However, as Peter mentions, updating sum_exec_runtime just before
> retrieving the information is already implemented in
> thread_group_cputime(), on the RUSAGE_SELF path etc. So, I think
> RUSAGE_THREAD should follow suit and implement the same process.
>

I don't think it's a straightforward issue. We've had to deal with bugs
in the past where the overhead of getting CPU usage statistics was high
enough to dominate workloads with self-monitoring capabilities, to the
extent that the self-monitoring was counter-productive. It was
particularly problematic when self-monitoring was activated to find the
source of a slowdown. I tend to agree with Peter that the fix may
ultimately be worse than the problem: workloads are not necessarily
willing to pay the cost of accuracy and, as he already pointed out,
nohz_full tasks are expected to avoid syscalls as much as possible.
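
To get a rough feel for the self-monitoring cost being discussed, a
userspace sketch along the following lines could time repeated
getrusage(RUSAGE_THREAD) calls. It is only an illustration: the helper
names (now_ns, thread_cpu_us, rusage_call_cost_ns) are made up for this
example and are not from the kernel or from any patch in this thread, and
the number it reports depends heavily on the machine, the kernel
configuration and whether the CPU is nohz_full.

```c
#define _GNU_SOURCE            /* RUSAGE_THREAD is a Linux-specific extension */
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: wall-clock time in ns from CLOCK_MONOTONIC. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: utime + stime of the calling thread in
 * microseconds, as reported by getrusage(RUSAGE_THREAD); -1 on error.
 */
static long long thread_cpu_us(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_THREAD, &ru) != 0)
        return -1;

    return ((long long)ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
           + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}

/*
 * Hypothetical helper: average wall-clock cost in ns of one
 * getrusage(RUSAGE_THREAD) call, measured over `calls` back-to-back
 * calls; -1 if any call fails.
 */
static long long rusage_call_cost_ns(long calls)
{
    long long start;
    long i;

    assert(calls > 0);

    start = now_ns();
    for (i = 0; i < calls; i++) {
        if (thread_cpu_us() < 0)
            return -1;
    }
    return (now_ns() - start) / calls;
}
```

Calling rusage_call_cost_ns(1000000) from main() and printing the result
gives a per-call cost that can be compared against the time a workload
spends between samples. If getrusage() were made to update the runtime
on every call, that per-call figure is what would grow.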

--
Mel Gorman
SUSE Labs