Re: [BUG nohz]: wrong user and system time accounting

From: Mike Galbraith
Date: Thu Mar 30 2017 - 00:28:12 EST


On Wed, 2017-03-29 at 16:08 -0400, Rik van Riel wrote:

> In other words, the tick on cpu0 is aligned
> with the tick on the nohz_full cpus, and
> jiffies is advanced while the nohz_full cpus
> with an active tick happen to be in kernel
> mode?

You really want skew_tick=1, especially on big boxen.

> Frederic, can you think of any reason why
> the tick on nohz_full CPUs would end up aligned
> with the tick on cpu0, instead of running at some
> random offset?

(I OR in the low rq->clock bits as crude NOHZ collision avoidance.)

> A random offset, or better yet a somewhat randomized
> tick length to make sure that simultaneous ticks are
> fairly rare and the vtime sampling does not end up
> "in phase" with the jiffies incrementing, could make
> the accounting work right again.

That improves jitter, especially on big boxen. I have an 8-socket box
that thinks it's an extra-large PC; there, collision avoidance matters
hugely. I couldn't reproduce the bean-counting woes, so I have no idea
whether collision avoidance will help with that.

-Mike