Re: Workqueues splat due to ending up on wrong CPU

From: Paul E. McKenney
Date: Thu Dec 05 2019 - 22:19:26 EST


On Thu, Dec 05, 2019 at 06:48:05AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 05, 2019 at 11:32:13AM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 05, 2019 at 11:29:28AM +0100, Peter Zijlstra wrote:
> > > On Wed, Dec 04, 2019 at 12:11:50PM -0800, Paul E. McKenney wrote:
> > >
> > > > And the good news is that I didn't see the workqueue splat, though my
> > > > best guess is that I had about a 13% chance of not seeing it due to
> > > > random chance (and I am currently trying an idea that I hope will make
> > > > it more probable). But I did get a couple of new complaints about RCU
> > > > being used illegally from an offline CPU. Splats below.
> > >
> > > Shiny!
>
> And my attempt to speed things up did succeed, but the success was limited
> to finding more places where rcutorture chokes on CPUs being slow to boot.
> Fixing those and trying again...

And I finally got a clean run, so I am doing an overnight high-stress
baseline without your patch. The hope is that this ups the probability
of occurrence such that I can get decent stats on your patch much more
quickly.

I got several more lockdep-RCU splats similar to the one I posted earlier.

I also got the following NULL-pointer dereference out of ACPI. This had
surprisingly little effect on rcutorture, other than preventing it from
going down cleanly once the test had completed. No idea whether it is
related, though the acpi_soft_cpu_online() in the stack trace makes me
suspect that it might be.

Thanx, Paul

------------------------------------------------------------------------

[ 97.122142] BUG: kernel NULL pointer dereference, address: 0000000000000028
[ 97.224145] #PF: supervisor read access in kernel mode
[ 97.301663] #PF: error_code(0x0000) - not-present page
[ 97.376716] PGD 0 P4D 0
[ 97.413183] Oops: 0000 [#1] PREEMPT SMP PTI
[ 97.472911] CPU: 1 PID: 13 Comm: cpuhp/1 Not tainted 5.4.0-rc1+ #124
[ 97.562978] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[ 97.684860] RIP: 0010:acpi_processor_get_platform_limit+0x5b/0xf0
[ 97.772315] Code: 48 c7 c6 5b d6 71 a7 e8 d3 4c fc ff 83 f8 05 74 0b 85 c0 c6 05 21 7c 34 02 01 75 78 48 8b 04 24 48 8b 93 78 02 00 00 89 43 18 <8b> 4a 28 48 39 c8 73 2e 48 83 bb a0 03 00 00 00 74 24 48 8d 04 40
[ 98.042953] RSP: 0000:ffffb4490007bdf8 EFLAGS: 00010246
[ 98.118508] RAX: 0000000000000000 RBX: ffff8cf91ef09800 RCX: 0000000000000000
[ 98.220612] RDX: 0000000000000000 RSI: ffff8cf91f32c910 RDI: 000000000002c910
[ 98.322913] RBP: 00000000000000b3 R08: 0000000000000000 R09: 0000000000000000
[ 98.423893] R10: 0000000000000001 R11: ffffb4490007bbe0 R12: ffff8cf91f316500
[ 98.525024] R13: 0000000000000001 R14: ffffffffa68c1630 R15: 00000000000000cd
[ 98.627690] FS: 0000000000000000(0000) GS:ffff8cf91f300000(0000) knlGS:0000000000000000
[ 98.740706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 98.819911] CR2: 0000000000000028 CR3: 0000000010e1e000 CR4: 00000000000006e0
[ 98.918566] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 99.017110] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 99.115445] Call Trace:
[ 99.148986] acpi_processor_ppc_has_changed+0x25/0x80
[ 99.218892] acpi_soft_cpu_online+0x4e/0xc0
[ 99.276406] cpuhp_invoke_callback+0xb5/0xa30
[ 99.335921] cpuhp_thread_fun+0x179/0x210
[ 99.390697] ? cpuhp_thread_fun+0x34/0x210
[ 99.445818] smpboot_thread_fn+0x169/0x240
[ 99.501944] kthread+0xf3/0x130
[ 99.545137] ? sort_range+0x20/0x20
[ 99.592636] ? kthread_cancel_delayed_work_sync+0x10/0x10
[ 99.665536] ret_from_fork+0x3a/0x50
[ 99.715131] Modules linked in:
[ 99.755972] CR2: 0000000000000028
[ 99.800216] ---[ end trace 4f464e92083c0f67 ]---
[ 99.862848] RIP: 0010:acpi_processor_get_platform_limit+0x5b/0xf0
[ 99.945285] Code: 48 c7 c6 5b d6 71 a7 e8 d3 4c fc ff 83 f8 05 74 0b 85 c0 c6 05 21 7c 34 02 01 75 78 48 8b 04 24 48 8b 93 78 02 00 00 89 43 18 <8b> 4a 28 48 39 c8 73 2e 48 83 bb a0 03 00 00 00 74 24 48 8d 04 40
100.200302] RSP: 0000:ffffb4490007bdf8 EFLAGS: 00010246
[ 100.268022] RAX: 0000000000000000 RBX: ffff8cf91ef09800 RCX: 0000000000000000
[ 100.361897] RDX: 0000000000000000 RSI: ffff8cf91f32c910 RDI: 000000000002c910
[ 100.454643] RBP: 00000000000000b3 R08: 0000000000000000 R09: 0000000000000000
[ 100.548611] R10: 0000000000000001 R11: ffffb4490007bbe0 R12: ffff8cf91f316500
[ 100.642858] R13: 0000000000000001 R14: ffffffffa68c1630 R15: 00000000000000cd
[ 100.739082] FS: 0000000000000000(0000) GS:ffff8cf91f300000(0000) knlGS:0000000000000000
[ 100.844924] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 100.919117] CR2: 0000000000000028 CR3: 0000000010e1e000 CR4: 00000000000006e0
[ 101.011402] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 101.105372] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400