Re: [migration_cpu_stop] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1187 set_task_cpu+0x257/0x6b6

From: Fengguang Wu
Date: Tue Nov 21 2017 - 08:51:15 EST


On Tue, Nov 21, 2017 at 02:34:03PM +0100, Peter Zijlstra wrote:
On Tue, Nov 21, 2017 at 08:04:20PM +0800, Fengguang Wu wrote:
Hello,

FYI this happens in mainline kernel 4.14.0-00902-g43ff2f4.
It happens since 4.13.

It occurs in 7 out of 36 boots.

[ 58.171538] no ifx modem active;
[ 58.211860] ACPI: Preparing to enter system sleep state S5
[ 58.216823] reboot: Power down
[ 58.244526] acpi_power_off called
[ 61.460467] ------------[ cut here ]------------
[ 61.465481] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1187 set_task_cpu+0x257/0x6b6:
set_task_cpu at kernel/sched/core.c:1187

Never seen that before.. but I can't even figure out how CPUs end up
offline in that reboot path :/

So far I just get lost in the callback maze... Rafael any clues?

I noticed that it's doing rcu-perf testing before the warning.
And it's immediately followed by another warning:

[ 61.653178] sched: Unexpected reschedule of offline CPU#1!
[ 61.658501] ------------[ cut here ]------------
[ 61.663130] WARNING: CPU: 0 PID: 11 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x59/0xbf
[ 61.673690] Modules linked in:
[ 61.676737] CPU: 0 PID: 11 Comm: migration/0 Tainted: G W 4.14.0-00902-g43ff2f4 #1
[ 61.684975] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 61.692977] task: ffff880016c00000 task.stack: ffff880016c08000
[ 61.698610] RIP: 0010:native_smp_send_reschedule+0x59/0xbf
[ 61.703900] RSP: 0018:ffff880016c0fc68 EFLAGS: 00010086
[ 61.710817] RAX: 000000000000002e RBX: 0000000000000001 RCX: 0000000000000000
[ 61.721649] RDX: 000000000000002e RSI: 0000000000000003 RDI: ffffed0002d81f83
[ 61.732727] RBP: ffff88000ac94640 R08: 0000000000000001 R09: 0000000000000001
[ 61.743330] R10: ffff880016c0fbc8 R11: 0000000000000001 R12: ffff88000e54c800
[ 61.753861] R13: dffffc0000000000 R14: ffff8800177e0700 R15: 0000000000000000
[ 61.764513] FS: 0000000000000000(0000) GS:ffff880017400000(0000) knlGS:0000000000000000
[ 61.776370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 61.785022] CR2: 00007f3eca6152c0 CR3: 0000000007a15003 CR4: 00000000000206b0
[ 61.797309] Call Trace:
[ 61.801165] check_preempt_wakeup+0x386/0x5aa
[ 61.807806] check_preempt_curr+0xe3/0x22e
[ 61.814220] move_queued_task+0x39b/0x3ab
[ 61.820415] migration_cpu_stop+0x216/0x2f2
[ 61.826797] ? sched_ttwu_pending+0x1fd/0x1fd
[ 61.833542] ? lock_acquire+0xed/0x14b
[ 61.838654] ? sched_ttwu_pending+0x1fd/0x1fd
[ 61.844336] cpu_stopper_thread+0x1c9/0x25e
[ 61.849812] ? cpu_stop_create+0x6a/0x6a
[ 61.855049] smpboot_thread_fn+0x615/0x64b
[ 61.860459] ? sort_range+0x17/0x17
[ 61.863922] ? schedule+0x16e/0x190
[ 61.867626] kthread+0x365/0x374
[ 61.871029] ? sort_range+0x17/0x17
[ 61.874491] ? kthread_stop+0xc8/0xc8
[ 61.878189] ret_from_fork+0x1f/0x30
[ 61.881912] Code: 74 05 e8 db c5 29 00 3b 1d 87 54 08 02 72 02 0f ff 89 d8 48 0f a3 05 15 53 08 02 72 12 89 de 48 c7 c7 00 ac 42 a6 e8 07 00 11 00 <0f> ff eb 5f 48 c7 c7 c0 56 a2 a6 48 b8 00 00 00 00 00 fc ff df
[ 61.900994] ---[ end trace 764cf6ead543b299 ]---

which we have bisect record for:

commit 492b95e59735998312f678d77a2d5fe20af6b0b9
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
AuthorDate: Fri Apr 21 16:09:15 2017 -0700
Commit: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
CommitDate: Thu Jun 8 08:25:31 2017 -0700

rcuperf: Set more user-friendly defaults

Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built
as a module, rcuperf.shutdown. This commit therefore sets the default
for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
to zero if rcuperf is built as a module and to one otherwise.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

3ddf20c953 srcu: Shrink Tiny SRCU a bit more
492b95e597 rcuperf: Set more user-friendly defaults
bb176f6709 Linux 4.14-rc6
36ef71cae3 Add linux-next specific files for 20171018
+--------------------------------------------------------------+------------+------------+-----------+---------------+
| | 3ddf20c953 | 492b95e597 | v4.14-rc6 | next-20171018 |
+--------------------------------------------------------------+------------+------------+-----------+---------------+
| boot_successes | 910 | 21 | 199 | 60 |
| boot_failures | 0 | 9 | 111 | 21 |
| WARNING:at_arch/x86/kernel/smp.c:#native_smp_send_reschedule | 0 | 9 | 111 | 21 |
+--------------------------------------------------------------+------------+------------+-----------+---------------+

[ 29.173654] augmented rbtree testing
[ 32.263469] -> 83223 cycles
[ 32.540332] gpio_it87: no device
[ 32.541422] sched: Unexpected reschedule of offline CPU#1!
[ 32.542355] ------------[ cut here ]------------
[ 32.543057] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x83/0xbb
[ 32.544646] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc4-00028-g492b95e #1
[ 32.545397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 32.546381] task: ffff88001e2e0000 task.stack: ffff88001e2e8000
[ 32.547031] RIP: 0010:native_smp_send_reschedule+0x83/0xbb
[ 32.547642] RSP: 0000:ffff88001e2ebca0 EFLAGS: 00010086
[ 32.548225] RAX: 000000000000002e RBX: 0000000000000003 RCX: 0000000000000000
[ 32.548989] RDX: 0000002e00000001 RSI: 0000000000000000 RDI: 0000000000000096
[ 32.550034] RBP: ffff88001e2ebcb8 R08: 0000000000000003 R09: 0000000000000000
[ 32.551039] R10: ffff88001e2e0758 R11: 0000000000021001 R12: 0000000000000001
[ 32.551996] R13: 0000000000000000 R14: ffff88001e380000 R15: ffff88001e500000
[ 32.552727] FS: 0000000000000000(0000) GS:ffff88001e400000(0000) knlGS:0000000000000000
[ 32.553585] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 32.554227] CR2: 0000000000000000 CR3: 000000000260f000 CR4: 00000000000006b0
[ 32.555288] Call Trace:
[ 32.555687] smp_send_reschedule+0xa/0xc
[ 32.556313] try_to_wake_up+0x27e/0x2e4
[ 32.556870] wake_up_process+0x15/0x17
[ 32.557374] wake_up_worker+0x5f/0x6a
[ 32.557982] pwq_adjust_max_active+0x154/0x191
[ 32.558444] link_pwq+0xc3/0x10a
[ 32.558828] __alloc_workqueue_key+0x325/0x695
[ 32.559288] ? ftrace_likely_update+0x78/0x81
[ 32.559810] ? pci_epf_init+0x52/0x52
[ 32.560222] ? do_early_param+0xfd/0xfd
[ 32.560643] pci_epf_test_init+0x30/0x7c
[ 32.561063] ? pci_epf_init+0x52/0x52
[ 32.561431] do_one_initcall+0xbc/0x1f7
[ 32.562099] ? do_early_param+0xfd/0xfd
[ 32.562760] kernel_init_freeable+0x13f/0x233
[ 32.563497] ? rest_init+0xd6/0xd6
[ 32.564046] kernel_init+0x14/0x1c5
[ 32.564565] ? rest_init+0xd6/0xd6
[ 32.564987] ret_from_fork+0x2a/0x40
[ 32.565345] Code: 00 00 31 d2 be 01 00 00 00 48 c7 c7 88 40 7d 82 e8 8c 48 18 00 44 89 e6 48 c7 c7 5a 68 3d 82 48 ff 05 70 78 87 01 e8 45 a5 1b 00 <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 58 40 7d 82
[ 32.568020] ---[ end trace 6590e27884081b3a ]---
[ 32.570056] VIA Graphics Integration Chipset framebuffer 2.4 initializing

Thanks,
Fengguang