Re: [PATCH 5/5] sched: Accumulate vtime on top of nsec clocksource

From: Wanpeng Li
Date: Sat Jul 15 2017 - 01:27:25 EST


2017-07-15 11:37 GMT+08:00 Levin, Alexander (Sasha Levin)
<alexander.levin@xxxxxxxxxxx>:
> On Thu, Jun 29, 2017 at 07:15:11PM +0200, Frederic Weisbecker wrote:
>>From: Wanpeng Li <kernellwp@xxxxxxxxx>
>>
>>Currently the cputime source used by vtime is jiffies. When we cross
>>a context boundary and jiffies have changed since the last snapshot, the
>>pending cputime is accounted to the context being switched out.
>>
>>This scheme works fine as long as the ticks are not aligned across CPUs.
>>If they are aligned instead (i.e. they all fire at the same time) and the
>>CPUs are running in userspace, the jiffies change is only observed on
>>tick exit, and therefore the user cputime is accounted as system cputime.
>>This happens because the CPU that maintains timekeeping fires its tick at
>>the same time as the others: it updates jiffies in the middle of its tick,
>>and the other CPUs only see that update on IRQ exit:
>>
>>   CPU 0 (timekeeper)                 CPU 1
>>   -------------------                -------------
>>                        jiffies = N
>>   ...                                run in userspace for a jiffy
>>   tick entry                         tick entry (sees jiffies = N)
>>   set jiffies = N + 1
>>   tick exit                          tick exit (sees jiffies = N + 1)
>>                                      account 1 jiffy as stime
>>
>>Fix this by using a nanosecond clock source instead of jiffies. The
>>cputime is then accumulated and flushed every time the pending delta
>>reaches a jiffy, in order to mitigate the accounting overhead.
>>
>>[fweisbec: changelog, rebase on struct vtime, field renames, add delta
>>on cputime readers, keep idle vtime as-is (low overhead accounting),
>>harmonize clock sources]
>>
>>Reported-by: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
>>Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>Not-Yet-Signed-off-by: Wanpeng Li <kernellwp@xxxxxxxxx>
>>Cc: Rik van Riel <riel@xxxxxxxxxx>
>>Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>Cc: Wanpeng Li <kernellwp@xxxxxxxxx>
>>Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>Cc: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
>>Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
>
> Hi all,
>
> This patch seems to be causing this:

Yeah, there is a patch to fix it.
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=0e4097c3354e2f5a5ad8affd9dc7f7f7d00bb6b9

Regards,
Wanpeng Li

>
> BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u9:0/6
> caller is debug_smp_processor_id+0x1c/0x20 lib/smp_processor_id.c:56
> CPU: 1 PID: 6 Comm: kworker/u9:0 Not tainted 4.12.0-next-20170714 #187
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
> Workqueue: events_unbound call_usermodehelper_exec_work
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x11d/0x1ef lib/dump_stack.c:52
> check_preemption_disabled+0x1f4/0x200 lib/smp_processor_id.c:46
> debug_smp_processor_id+0x1c/0x20 lib/smp_processor_id.c:56
> vtime_delta.isra.6+0x11/0x60 kernel/sched/cputime.c:686
> task_cputime+0x3ca/0x790 kernel/sched/cputime.c:882
> thread_group_cputime+0x51a/0xaa0 kernel/sched/cputime.c:327
> thread_group_cputime_adjusted+0x73/0xf0 kernel/sched/cputime.c:676
> wait_task_zombie kernel/exit.c:1114 [inline]
> wait_consider_task+0x1c82/0x37f0 kernel/exit.c:1389
> do_wait_thread kernel/exit.c:1452 [inline]
> do_wait+0x457/0xb00 kernel/exit.c:1523
> kernel_wait4+0x1fd/0x380 kernel/exit.c:1665
> SYSC_wait4+0x145/0x160 kernel/exit.c:1677
> SyS_wait4+0x2c/0x40 kernel/exit.c:1673
> call_usermodehelper_exec_sync kernel/kmod.c:286 [inline]
> call_usermodehelper_exec_work+0x1fc/0x2c0 kernel/kmod.c:323
> process_one_work+0xae7/0x1a00 kernel/workqueue.c:2097
> worker_thread+0x221/0x1860 kernel/workqueue.c:2231
> kthread+0x35f/0x430 kernel/kthread.c:231
> ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:425
> capability: warning: `syz-executor5' uses 32-bit capabilities (legacy support in use)
> BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor6/7013
> caller is debug_smp_processor_id+0x1c/0x20 lib/smp_processor_id.c:56
> CPU: 3 PID: 7013 Comm: syz-executor6 Not tainted 4.12.0-next-20170714 #187
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x11d/0x1ef lib/dump_stack.c:52
> check_preemption_disabled+0x1f4/0x200 lib/smp_processor_id.c:46
> debug_smp_processor_id+0x1c/0x20 lib/smp_processor_id.c:56
> vtime_delta.isra.6+0x11/0x60 kernel/sched/cputime.c:686
> task_cputime+0x3ca/0x790 kernel/sched/cputime.c:882
> thread_group_cputime+0x51a/0xaa0 kernel/sched/cputime.c:327
> thread_group_cputime_adjusted+0x73/0xf0 kernel/sched/cputime.c:676
> wait_task_zombie kernel/exit.c:1114 [inline]
> wait_consider_task+0x1c82/0x37f0 kernel/exit.c:1389
> do_wait_thread kernel/exit.c:1452 [inline]
> do_wait+0x457/0xb00 kernel/exit.c:1523
> kernel_wait4+0x1fd/0x380 kernel/exit.c:1665
> SYSC_wait4+0x145/0x160 kernel/exit.c:1677
> SyS_wait4+0x2c/0x40 kernel/exit.c:1673
> do_syscall_64+0x267/0x740 arch/x86/entry/common.c:284
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x40bd8a
> RSP: 002b:00007ffdbdf67b08 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
> RAX: ffffffffffffffda RBX: 0000000000b22914 RCX: 000000000040bd8a
> RDX: 0000000040000001 RSI: 00007ffdbdf67b4c RDI: ffffffffffffffff
> RBP: 0000000000002243 R08: 0000000000001b65 R09: 0000000000b22940
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffdbdf67b4c R14: 0000000000016ee4 R15: 0000000000000016
> BUG: using smp_processor_id() in preemptible [00000000] code: init/1
> caller is debug_smp_processor_id+0x1c/0x20 lib/smp_processor_id.c:56
> CPU: 3 PID: 1 Comm: init Not tainted 4.12.0-next-20170714 #187
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x11d/0x1ef lib/dump_stack.c:52
> check_preemption_disabled+0x1f4/0x200 lib/smp_processor_id.c:46
> debug_smp_processor_id+0x1c/0x20 lib/smp_processor_id.c:56
> vtime_delta.isra.6+0x11/0x60 kernel/sched/cputime.c:686
> task_cputime+0x3ca/0x790 kernel/sched/cputime.c:882
> thread_group_cputime+0x51a/0xaa0 kernel/sched/cputime.c:327
> thread_group_cputime_adjusted+0x73/0xf0 kernel/sched/cputime.c:676
> wait_task_zombie kernel/exit.c:1114 [inline]
> wait_consider_task+0x1c82/0x37f0 kernel/exit.c:1389
> do_wait_thread kernel/exit.c:1452 [inline]
> do_wait+0x457/0xb00 kernel/exit.c:1523
> kernel_wait4+0x1fd/0x380 kernel/exit.c:1665
> SYSC_wait4+0x145/0x160 kernel/exit.c:1677
> SyS_wait4+0x2c/0x40 kernel/exit.c:1673
> do_syscall_64+0x267/0x740 arch/x86/entry/common.c:284
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f61952dca3e
> RSP: 002b:00007fff93bafea0 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
> RAX: ffffffffffffffda RBX: 00007f6195c326a0 RCX: 00007f61952dca3e
> RDX: 0000000000000001 RSI: 00007fff93bafedc RDI: ffffffffffffffff
> RBP: 00007fff93bafedc R08: 00007fff93bb0870 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
> R13: 00007fff93bb0bd0 R14: 0000000000000000 R15: 0000000000000000
>
> --
>
> Thanks,
> Sasha