Re: [migration_cpu_stop] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1187 set_task_cpu+0x257/0x6b6

From: Fengguang Wu
Date: Wed Nov 22 2017 - 07:19:03 EST


Hi Paul,

On Tue, Nov 21, 2017 at 08:13:35AM -0800, Paul E. McKenney wrote:
On Tue, Nov 21, 2017 at 09:51:01PM +0800, Fengguang Wu wrote:
On Tue, Nov 21, 2017 at 02:34:03PM +0100, Peter Zijlstra wrote:
>On Tue, Nov 21, 2017 at 08:04:20PM +0800, Fengguang Wu wrote:
>>Hello,
>>
>>FYI this happens in mainline kernel 4.14.0-00902-g43ff2f4.
>>It happens since 4.13.
>>
>>It occurs in 7 out of 36 boots.
>>
>>[ 58.171538] no ifx modem active;
>>[ 58.211860] ACPI: Preparing to enter system sleep state S5
>>[ 58.216823] reboot: Power down
>>[ 58.244526] acpi_power_off called
>>[ 61.460467] ------------[ cut here ]------------
>>[ 61.465481] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1187 set_task_cpu+0x257/0x6b6:
>> set_task_cpu at kernel/sched/core.c:1187
>
>Never seen that before.. but I can't even figure out how CPUs end up
>offline in that reboot path :/
>
>So far I just get lost in the callback maze... Rafael any clues?

I noticed that it's doing rcu-perf testing before the warning.
And it's immediately followed by another warning:

[ 61.653178] sched: Unexpected reschedule of offline CPU#1!
[ 61.658501] ------------[ cut here ]------------
[ 61.663130] WARNING: CPU: 0 PID: 11 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x59/0xbf
[ 61.673690] Modules linked in:
[ 61.676737] CPU: 0 PID: 11 Comm: migration/0 Tainted: G W 4.14.0-00902-g43ff2f4 #1
[ 61.684975] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 61.692977] task: ffff880016c00000 task.stack: ffff880016c08000
[ 61.698610] RIP: 0010:native_smp_send_reschedule+0x59/0xbf
[ 61.703900] RSP: 0018:ffff880016c0fc68 EFLAGS: 00010086
[ 61.710817] RAX: 000000000000002e RBX: 0000000000000001 RCX: 0000000000000000
[ 61.721649] RDX: 000000000000002e RSI: 0000000000000003 RDI: ffffed0002d81f83
[ 61.732727] RBP: ffff88000ac94640 R08: 0000000000000001 R09: 0000000000000001
[ 61.743330] R10: ffff880016c0fbc8 R11: 0000000000000001 R12: ffff88000e54c800
[ 61.753861] R13: dffffc0000000000 R14: ffff8800177e0700 R15: 0000000000000000
[ 61.764513] FS: 0000000000000000(0000) GS:ffff880017400000(0000) knlGS:0000000000000000
[ 61.776370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 61.785022] CR2: 00007f3eca6152c0 CR3: 0000000007a15003 CR4: 00000000000206b0
[ 61.797309] Call Trace:
[ 61.801165] check_preempt_wakeup+0x386/0x5aa
[ 61.807806] check_preempt_curr+0xe3/0x22e
[ 61.814220] move_queued_task+0x39b/0x3ab
[ 61.820415] migration_cpu_stop+0x216/0x2f2
[ 61.826797] ? sched_ttwu_pending+0x1fd/0x1fd
[ 61.833542] ? lock_acquire+0xed/0x14b
[ 61.838654] ? sched_ttwu_pending+0x1fd/0x1fd
[ 61.844336] cpu_stopper_thread+0x1c9/0x25e
[ 61.849812] ? cpu_stop_create+0x6a/0x6a
[ 61.855049] smpboot_thread_fn+0x615/0x64b
[ 61.860459] ? sort_range+0x17/0x17
[ 61.863922] ? schedule+0x16e/0x190
[ 61.867626] kthread+0x365/0x374
[ 61.871029] ? sort_range+0x17/0x17
[ 61.874491] ? kthread_stop+0xc8/0xc8
[ 61.878189] ret_from_fork+0x1f/0x30
[ 61.881912] Code: 74 05 e8 db c5 29 00 3b 1d 87 54 08 02 72 02 0f ff 89 d8 48 0f a3 05 15 53 08 02 72 12 89 de 48 c7 c7 00 ac 42 a6 e8 07 00 11 00 <0f> ff eb 5f 48 c7 c7 c0 56 a2 a6 48 b8 00 00 00 00 00 fc ff df
[ 61.900994] ---[ end trace 764cf6ead543b299 ]---

which we have bisect record for:

Which, by design, shuts down the system at the end of the test.

Don't get me wrong, I am quite happy for you guys to test rcuperf, but if
you do test it, the scaffolding either needs to suppress the end-of-test
shutdown or must recognize the end-of-test shutdown as normal behavior.
The people using rcuperf for its intended purpose are well-served by the
end-of-test shutdown, so I am not going to revert the commit shown below.

Do you mean to explicitly disable auto shutdown with this parameter
for CONFIG_RCU_PERF_TEST=y boots? Yes I can do this.

torture_param(bool, shutdown, !IS_ENABLED(MODULE),
"Shutdown at end of performance tests.");

I need other testing to either stop testing rcuperf on the one hand or to
have their rcuperf tests account for expected rcuperf behavior on the other.

So how can we best resolve this?

To me the problem is not about auto shutdown, but how come the
shutdown will trigger the warnings here. I noticed you recommend
trying these 2 commits in the other thread:

a54daabfde1a ("sched: Stop resched_cpu() from sending IPIs to offline CPUs")
e600277461ed ("sched: Stop switched_to_rt() from sending IPIs to offline CPUs")

I'll try them out. Sorry for the delay!

Regards,
Fengguang

commit 492b95e59735998312f678d77a2d5fe20af6b0b9
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
AuthorDate: Fri Apr 21 16:09:15 2017 -0700
Commit: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
CommitDate: Thu Jun 8 08:25:31 2017 -0700

rcuperf: Set more user-friendly defaults

Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built
as a module, rcuperf.shutdown. This commit therefore sets the default
for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
to zero if rcuperf is built as a module and to one otherwise.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

3ddf20c953 srcu: Shrink Tiny SRCU a bit more
492b95e597 rcuperf: Set more user-friendly defaults
bb176f6709 Linux 4.14-rc6
36ef71cae3 Add linux-next specific files for 20171018
+--------------------------------------------------------------+------------+------------+-----------+---------------+
| | 3ddf20c953 | 492b95e597 | v4.14-rc6 | next-20171018 |
+--------------------------------------------------------------+------------+------------+-----------+---------------+
| boot_successes | 910 | 21 | 199 | 60 |
| boot_failures | 0 | 9 | 111 | 21 |
| WARNING:at_arch/x86/kernel/smp.c:#native_smp_send_reschedule | 0 | 9 | 111 | 21 |
+--------------------------------------------------------------+------------+------------+-----------+---------------+

[ 29.173654] augmented rbtree testing
[ 32.263469] -> 83223 cycles
[ 32.540332] gpio_it87: no device
[ 32.541422] sched: Unexpected reschedule of offline CPU#1!
[ 32.542355] ------------[ cut here ]------------
[ 32.543057] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x83/0xbb
[ 32.544646] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc4-00028-g492b95e #1
[ 32.545397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 32.546381] task: ffff88001e2e0000 task.stack: ffff88001e2e8000
[ 32.547031] RIP: 0010:native_smp_send_reschedule+0x83/0xbb
[ 32.547642] RSP: 0000:ffff88001e2ebca0 EFLAGS: 00010086
[ 32.548225] RAX: 000000000000002e RBX: 0000000000000003 RCX: 0000000000000000
[ 32.548989] RDX: 0000002e00000001 RSI: 0000000000000000 RDI: 0000000000000096
[ 32.550034] RBP: ffff88001e2ebcb8 R08: 0000000000000003 R09: 0000000000000000
[ 32.551039] R10: ffff88001e2e0758 R11: 0000000000021001 R12: 0000000000000001
[ 32.551996] R13: 0000000000000000 R14: ffff88001e380000 R15: ffff88001e500000
[ 32.552727] FS: 0000000000000000(0000) GS:ffff88001e400000(0000) knlGS:0000000000000000
[ 32.553585] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 32.554227] CR2: 0000000000000000 CR3: 000000000260f000 CR4: 00000000000006b0
[ 32.555288] Call Trace:
[ 32.555687] smp_send_reschedule+0xa/0xc
[ 32.556313] try_to_wake_up+0x27e/0x2e4
[ 32.556870] wake_up_process+0x15/0x17
[ 32.557374] wake_up_worker+0x5f/0x6a
[ 32.557982] pwq_adjust_max_active+0x154/0x191
[ 32.558444] link_pwq+0xc3/0x10a
[ 32.558828] __alloc_workqueue_key+0x325/0x695
[ 32.559288] ? ftrace_likely_update+0x78/0x81
[ 32.559810] ? pci_epf_init+0x52/0x52
[ 32.560222] ? do_early_param+0xfd/0xfd
[ 32.560643] pci_epf_test_init+0x30/0x7c
[ 32.561063] ? pci_epf_init+0x52/0x52
[ 32.561431] do_one_initcall+0xbc/0x1f7
[ 32.562099] ? do_early_param+0xfd/0xfd
[ 32.562760] kernel_init_freeable+0x13f/0x233
[ 32.563497] ? rest_init+0xd6/0xd6
[ 32.564046] kernel_init+0x14/0x1c5
[ 32.564565] ? rest_init+0xd6/0xd6
[ 32.564987] ret_from_fork+0x2a/0x40
[ 32.565345] Code: 00 00 31 d2 be 01 00 00 00 48 c7 c7 88 40 7d 82 e8 8c 48 18 00 44 89 e6 48 c7 c7 5a 68 3d 82 48 ff 05 70 78 87 01 e8 45 a5 1b 00 <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 58 40 7d 82
[ 32.568020] ---[ end trace 6590e27884081b3a ]---
[ 32.570056] VIA Graphics Integration Chipset framebuffer 2.4 initializing

Thanks,
Fengguang