Re: [BUG nohz]: wrong user and system time accounting
From: Frederic Weisbecker
Date: Wed Mar 29 2017 - 18:54:37 EST
(Adding Thomas in Cc)
On Wed, Mar 29, 2017 at 04:08:45PM -0400, Rik van Riel wrote:
> On Wed, 2017-03-29 at 13:16 -0400, Luiz Capitulino wrote:
> > On Tue, 28 Mar 2017 13:24:06 -0400
> > Luiz Capitulino <lcapitulino@xxxxxxxxxx> wrote:
> >
> > > 1. In my tracing I'm seeing that sometimes (always?) the
> > > time interval between two timer interrupts is less than 1ms
> >
> > I think that's the root cause.
> >
> > In this trace, we see the following:
> >
> > 1. On CPU15, we transition from user-space to kernel-space because
> > of a timer interrupt (it's the tick)
> >
> > 2. vtime_delta() returns 0, because jiffies didn't change since the
> > last accounting
> >
> > 3. While CPU15 is executing in kernel-space, jiffies is updated
> > by CPU0
> >
> > 4. When going back to user-space, vtime_delta() returns non-zero
> > and the whole time is accounted for system time (observe how
> > the cputime parameter in account_system_time() is less than 1ms)
>
> In other words, the tick on cpu0 is aligned
> with the tick on the nohz_full cpus, and
> jiffies is advanced while the nohz_full cpus
> with an active tick happen to be in kernel
> mode?

Ah you found out faster than me :-)

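To make the hypothesis concrete, here is a toy model of jiffies-granularity
accounting. It is only a sketch and not kernel code: simulate(), jiffies_at(),
the 5us update delay on CPU 0 and the 50us time spent in the tick handler are
all assumptions picked for illustration. Time elapsed at the user -> kernel
transition is charged to user, time elapsed at the kernel -> user transition
is charged to system, both measured in whole jiffies.

```python
TICK_US = 1000  # HZ=1000, so one jiffy is 1000us


def jiffies_at(t_us, update_delay_us):
    """jiffies as seen at time t_us, assuming CPU 0 bumps it
    update_delay_us after each tick boundary (hypothetical delay)."""
    return max(0, (t_us - update_delay_us) // TICK_US)


def simulate(tick_offset_us, irq_len_us=50, update_delay_us=5, n_ticks=1000):
    """Return (user, system) in jiffies for a CPU that runs user code
    except for irq_len_us spent in the tick handler every TICK_US."""
    user = system = 0
    snapshot = 0  # jiffies value recorded at the last accounting
    for k in range(1, n_ticks + 1):
        enter = k * TICK_US + tick_offset_us  # tick fires, user -> kernel
        leave = enter + irq_len_us            # handler done, kernel -> user

        now = jiffies_at(enter, update_delay_us)
        user += now - snapshot                # delta charged to user time
        snapshot = now

        now = jiffies_at(leave, update_delay_us)
        system += now - snapshot              # delta charged to system time
        snapshot = now
    return user, system


if __name__ == "__main__":
    print("aligned tick  :", simulate(tick_offset_us=0))
    print("tick at +500us:", simulate(tick_offset_us=500))
```

In this model the aligned case enters the kernel just before jiffies is
bumped and leaves just after, so every jiffy lands in system time even
though the CPU spends 95% of its time in user mode. With the tick moved
500us away, the jiffy is already visible at kernel entry and is charged to
user time instead.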
> Frederic, can you think of any reason why
> the tick on nohz_full CPUs would end up aligned
> with the tick on cpu0, instead of running at some
> random offset?

tick_init_jiffy_update() takes that decision to align all ticks.
I'm not sure why. I don't see anything that could depend on that
system-wide tick synchronization. The jiffies update itself relies on
ktime to check when to update it. So even if the tick fires a bit later
on CPU 1 than on CPU 0, the jiffies updates should stay coherent and
should never be delayed by more than 999us in the worst case (for
HZ=1000).

Now I might be overlooking something.

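As a rough illustration of what "relies on ktime" means here, below is a
simplified model of a ktime-driven jiffies update. It is a sketch under
assumptions, not the kernel implementation: the JiffiesClock class and its
field names are made up for the example. The point is that jiffies only
advances when a full tick period has elapsed since the last update,
whichever tick happens to trigger the check.

```python
TICK_NS = 1_000_000  # one tick period in nanoseconds for HZ=1000


class JiffiesClock:
    """Hypothetical model: jiffies advances based on elapsed ktime,
    not on how many tick interrupts fired."""

    def __init__(self):
        self.jiffies = 0
        self.last_update_ns = 0  # ktime of the last jiffies boundary

    def update(self, now_ns):
        """Called from a tick at ktime now_ns; returns jiffies added."""
        delta = now_ns - self.last_update_ns
        if delta < TICK_NS:
            return 0  # less than one period elapsed, nothing to do
        ticks = delta // TICK_NS
        # Advance by whole periods only, so the remainder is not lost.
        self.last_update_ns += ticks * TICK_NS
        self.jiffies += ticks
        return ticks


if __name__ == "__main__":
    clock = JiffiesClock()
    print(clock.update(1_000_000))  # first tick, on time
    print(clock.update(1_300_000))  # a tick firing 300us later
    print(clock.update(3_200_000))  # a late tick catches up
    print(clock.jiffies)
```

A tick that fires a bit late does not make jiffies drift in this model: the
update is keyed on the ktime boundary, so a skewed tick only delays the
moment the new value becomes visible, by less than one period.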
>
> A random offset, or better yet a somewhat randomized
> tick length to make sure that simultaneous ticks are
> fairly rare and the vtime sampling does not end up
> "in phase" with the jiffies incrementing, could make
> the accounting work right again.
>
> Of course, that assumes the above hypothesis is correct :)

I'm not sure that randomizing the tick start per CPU would be the
right solution. Somewhere in the world you can be sure the randomized
tick of some nohz_full CPU will coincide with the tick of CPU 0 :o)

Or we could force the tick on nohz_full CPUs to be far from
CPU 0's tick... I'm not sure such a solution would be accepted though.