Re: [BUG nohz]: wrong user and system time accounting

From: Luiz Capitulino
Date: Mon Apr 03 2017 - 15:06:32 EST


On Mon, 3 Apr 2017 17:23:17 +0200
Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:

> Do you observe aligned ticks with trace events (hrtimer_expire_entry)?
>
> You might want to enforce the global clock to trace that:
>
> echo "global" > /sys/kernel/debug/tracing/trace_clock

I've used the same trace points & debugging code I've been using to debug
this issue, and this what I'm seeing:

stress-25757 [002] 2742.717507: function: enter_from_user_mode <-- apic_timer_interrupt
stress-25757 [002] 2742.717508: function: __context_tracking_exit <-- enter_from_user_mode
stress-25757 [002] 2742.717508: bprint: vtime_delta: diff=0 (now=4297409970 vtime_snap=4297409970)
stress-25757 [002] 2742.717509: function: smp_apic_timer_interrupt <-- apic_timer_interrupt
stress-25757 [002] 2742.717509: function: irq_enter <-- smp_apic_timer_interrupt
stress-25757 [002] 2742.717510: hrtimer_expire_entry: hrtimer=0xffffc900039fbe58 function=hrtimer_wakeup now=2742674000776
stress-25757 [002] 2742.717514: function: irq_exit <-- smp_apic_timer_interrupt
cyclictest-25760 [002] 2742.717518: function: vtime_account_system <-- vtime_common_task_switch
cyclictest-25760 [002] 2742.717518: bprint: vtime_delta: diff=1000000 (now=4297409971 vtime_snap=4297409970)
cyclictest-25760 [002] 2742.717519: function: __vtime_account_system <-- vtime_account_system
cyclictest-25760 [002] 2742.717519: bprint: get_vtime_delta: vtime_snap=4297409970 now=4297409971
cyclictest-25760 [002] 2742.717520: function: account_system_time <-- __vtime_account_system
cyclictest-25760 [002] 2742.717520: bprint: account_system_time: cputime=961981
cyclictest-25760 [002] 2742.717521: function: __context_tracking_enter <-- do_syscall_64
cyclictest-25760 [002] 2742.717522: function: vtime_user_enter <-- __context_tracking_enter
cyclictest-25760 [002] 2742.717522: bprint: vtime_delta: diff=0 (now=4297409971 vtime_snap=4297409971)

CPU2 shows 98% system time while the other CPUs (from CPU3 to CPU7)
show 98% user time (they're all running the same workload).

What's happening here is:

1. Timer interrupt
2. Transition from user-space to kernel-space, vtimer_delta()
returns zero
3. Context switch from hog application to cyclictest
4. This time vtime_delta() returns != zero, which implies
jiffies was updated between steps 2 and 3

This seems to be the pattern that accounts incorrectly,
and seem to suggest that the ticks are aligned because
this repeats over and over.

Please, let me know if you want me to run a different
trace-cmd command-line.