Re: [lkp-robot] [sched/cputime] 2a42eb9594: BUG:using_smp_processor_id()in_preemptible

From: Wanpeng Li
Date: Thu Jul 06 2017 - 22:59:21 EST


2017-07-07 10:09 GMT+08:00 kernel test robot <xiaolong.ye@xxxxxxxxx>:
>
> FYI, we noticed the following commit:
>
> commit: 2a42eb9594a1480b4ead9e036e06ee1290e5fa6d ("sched/cputime: Accumulate vtime on top of nsec clocksource")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/urgent
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap -smp 2 -m 512M
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +--------------------------------------------+------------+------------+
> | | bac5b6b6b1 | 2a42eb9594 |
> +--------------------------------------------+------------+------------+
> | boot_successes | 8 | 2 |
> | boot_failures | 0 | 6 |
> | BUG:kernel_hang_in_test_stage | 0 | 2 |
> | BUG:using_smp_processor_id()in_preemptible | 0 | 4 |
> +--------------------------------------------+------------+------------+
>
>
>
> [ 55.289944] BUG: using smp_processor_id() in preemptible [00000000] code: 99-trinity/181
> [ 55.302094] caller is debug_smp_processor_id+0x17/0x19
> [ 55.313831] CPU: 0 PID: 181 Comm: 99-trinity Not tainted 4.12.0-01059-g2a42eb9 #1
> [ 55.324812] Call Trace:
> [ 55.328601] dump_stack+0x82/0xb8
> [ 55.333467] check_preemption_disabled+0xd1/0xe3
> [ 55.340370] debug_smp_processor_id+0x17/0x19
> [ 55.346815] vtime_delta+0xd/0x2c
> [ 55.352843] task_cputime+0x89/0xdb
> [ 55.358026] thread_group_cputime+0x11b/0x1ed
> [ 55.364517] thread_group_cputime_adjusted+0x1f/0x47
> [ 55.371895] wait_consider_task+0x2a9/0xaf9
> [ 55.378025] ? lock_acquire+0x97/0xa4
> [ 55.383463] do_wait+0xdf/0x1f4
> [ 55.388199] SYSC_wait4+0x8e/0xb5
> [ 55.393219] ? list_add+0x34/0x34
> [ 55.398317] SyS_wait4+0x9/0xb
> [ 55.403401] do_syscall_64+0x70/0x82
> [ 55.408977] entry_SYSCALL64_slow_path+0x25/0x25
> [ 55.409006] RIP: 0033:0x7ff2f15c3c3e
> [ 55.409032] RSP: 002b:00007ffd3e91eab0 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
> [ 55.409064] RAX: ffffffffffffffda RBX: 00007ff2f1f0b6a0 RCX: 00007ff2f15c3c3e
> [ 55.409070] RDX: 0000000000000001 RSI: 00007ffd3e91eb18 RDI: ffffffffffffffff
> [ 55.409075] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> [ 55.409101] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 55.409106] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> [ 55.550819] random: trinity: uninitialized urandom read (4 bytes read)
> mountall: Event failed
> [ 55.947396] random: mountall: uninitialized urandom read (12 bytes read)
> [ 56.044974] BUG: using smp_processor_id() in preemptible [00000000] code: mountall/194
> [ 56.057287] caller is debug_smp_processor_id+0x17/0x19
> [ 56.066939] CPU: 1 PID: 194 Comm: mountall Not tainted 4.12.0-01059-g2a42eb9 #1
> [ 56.078248] Call Trace:
> [ 56.082297] dump_stack+0x82/0xb8
> [ 56.087256] check_preemption_disabled+0xd1/0xe3
> [ 56.093512] debug_smp_processor_id+0x17/0x19
> [ 56.099800] vtime_delta+0xd/0x2c
> [ 56.105862] task_cputime+0x89/0xdb
> [ 56.111237] thread_group_cputime+0x11b/0x1ed
> [ 56.117693] thread_group_cputime_adjusted+0x1f/0x47
> [ 56.125298] wait_consider_task+0x2a9/0xaf9
> [ 56.131983] ? do_raw_read_lock+0x39/0x3c
> [ 56.138293] do_wait+0xdf/0x1f4
> [ 56.143000] SyS_waitid+0xac/0x1e3
> [ 56.148566] ? list_add+0x34/0x34
> [ 56.153694] do_syscall_64+0x70/0x82
> [ 56.159267] entry_SYSCALL64_slow_path+0x25/0x25
> [ 56.166324] RIP: 0033:0x7fdc9c642d2f
> [ 56.171770] RSP: 002b:00007ffe6f4dda30 EFLAGS: 00000246 ORIG_RAX: 00000000000000f7
> [ 56.181888] RAX: ffffffffffffffda RBX: 0000560f25433d70 RCX: 00007fdc9c642d2f
> [ 56.192353] RDX: 00007ffe6f4dda78 RSI: 00000000000000c7 RDI: 0000000000000001
> [ 56.203076] RBP: 0000560f2543bf20 R08: 0000000000000000 R09: 0000000000000020
> [ 56.212903] R10: 0000000000000004 R11: 0000000000000246 R12: 0000560f25439be0
> [ 56.223108] R13: 00000000000000c7 R14: 00000000000000c7 R15: 0000560f25222c1f
> [ 56.522906] init: Failed to create pty - disabling logging for job
> [ 56.535738] init: Temporary process spawn error: No such file or directory
> [ 56.769705] init: Failed to create pty - disabling logging for job
> [ 56.781761] init: Temporary process spawn error: No such file or directory
> [ 57.258638] init: Failed to create pty - disabling logging for job
> [ 57.271867] init: Temporary process spawn error: No such file or directory
> [ 57.318928] init: Failed to create pty - disabling logging for job
> [ 57.340639] init: Temporary process spawn error: No such file or directory
> [ 57.499535] BUG: using smp_processor_id() in preemptible [00000000] code: sh/217
> [ 57.529507] caller is debug_smp_processor_id+0x17/0x19
> [ 57.537158] CPU: 1 PID: 217 Comm: sh Not tainted 4.12.0-01059-g2a42eb9 #1
> [ 57.547065] Call Trace:
> [ 57.550929] dump_stack+0x82/0xb8
> [ 57.555948] check_preemption_disabled+0xd1/0xe3
> [ 57.562849] debug_smp_processor_id+0x17/0x19
> [ 57.569263] vtime_delta+0xd/0x2c
> [ 57.574940] task_cputime+0x89/0xdb
> [ 57.580121] thread_group_cputime+0x11b/0x1ed
> [ 57.586382] thread_group_cputime_adjusted+0x1f/0x47
> [ 57.593636] wait_consider_task+0x2a9/0xaf9
> [ 57.599761] ? lock_acquire+0x97/0xa4
> [ 57.605076] do_wait+0xdf/0x1f4
> [ 57.609778] SYSC_wait4+0x8e/0xb5
> [ 57.614704] ? list_add+0x34/0x34
> [ 57.619688] SyS_wait4+0x9/0xb
> [ 57.624194] do_syscall_64+0x70/0x82
> [ 57.629635] entry_SYSCALL64_slow_path+0x25/0x25
> [ 57.636279] RIP: 0033:0x7fe6bd468c3e
> [ 57.641598] RSP: 002b:00007ffff94e6070 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
> [ 57.652428] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe6bd468c3e
> [ 57.662665] RDX: 0000000000000000 RSI: 00007ffff94e60d8 RDI: ffffffffffffffff
> [ 57.672870] RBP: 00000000006fb900 R08: 00000000006fb9c8 R09: 0000000000000000
> [ 57.683111] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> [ 57.693517] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffffff

Thanks for the report. It seems that something is changed in the
rebase, sched_clock() (in my last post
https://lkml.org/lkml/2017/5/2/191) in vtime_delta() is replaced by
sched_clock_cpu(smp_processor_id()), however, Frederic
mentioned(https://lkml.org/lkml/2017/3/30/552) that sched_clock()
alone would be enough since tsc stability is a condition for nohz_full
to work anyway. I will post a patch to fix it.

Regards,
Wanpeng Li