Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot

From: Michael Bringmann
Date: Wed May 24 2017 - 12:30:36 EST

On 05/23/2017 03:10 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, May 23, 2017 at 03:09:07PM -0500, Michael Bringmann wrote:
>> To confirm, you want the WARN_ON(cpumask_any(pool->attrs->cpumask) >= NR_CPUS)
>> at the point where I place my current patch?
>
> Yeah, cpumask_weight() probably is a bit more intuitive but I'm
> curious why we're creating workqueues for a node before cpus come
> online.
>
> Thanks.
>
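
For reference, the two checks discussed above should flag the same condition, an empty pool cpumask. Below is a minimal user-space sketch of that equivalence, not kernel code: the model_* names and the fixed MODEL_NR_CPUS size are made up for illustration, and the model only assumes that cpumask_any() returns a value past the valid CPU range when no bit is set, while cpumask_weight() returns the number of set bits.

```c
/* User-space model only; the model_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2048 /* same value as NR_CPUS in the oops below */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((MODEL_NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct model_cpumask {
	unsigned long bits[MASK_WORDS];
};

/* Mark one CPU as present in the mask. */
static void model_set_cpu(struct model_cpumask *m, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

/* Models cpumask_weight(): count of set bits. */
static unsigned int model_weight(const struct model_cpumask *m)
{
	unsigned int n = 0;

	for (size_t i = 0; i < MASK_WORDS; i++)
		for (unsigned long w = m->bits[i]; w; w &= w - 1)
			n++;
	return n;
}

/* Models cpumask_any(): first set bit, or MODEL_NR_CPUS if the mask is empty. */
static unsigned int model_any(const struct model_cpumask *m)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (m->bits[cpu / BITS_PER_WORD] & (1UL << (cpu % BITS_PER_WORD)))
			return cpu;
	return MODEL_NR_CPUS;
}

/* The condition from the proposed WARN_ON(). */
static bool empty_by_any(const struct model_cpumask *m)
{
	return model_any(m) >= MODEL_NR_CPUS;
}

/* The cpumask_weight() variant of the same condition. */
static bool empty_by_weight(const struct model_cpumask *m)
{
	return model_weight(m) == 0;
}
```

In this model both helpers return true for a zeroed mask and false once any CPU is set, so the choice between them is about readability rather than behavior.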

I am in the middle of another test, but I did find this crash log from one
of my earlier tests. The system was configured for Shared Processors and
booted with 16 or so VPs; I was then adding and removing VPs when it hit
this crash. I will send the other log later.

[ 8.599437] Unable to handle kernel paging request for unaligned access at address 0xc0000003c52231cf
[ 8.599443] Faulting instruction address: 0xc00000000049c54c
[ 8.599450] Oops: Kernel access of bad area, sig: 7 [#1]
[ 8.599454] SMP NR_CPUS=2048
[ 8.599455] NUMA
[ 8.599458] pSeries
[ 8.599463] Modules linked in:
[ 8.599470] CPU: 35 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc6_VPHNt010+ #19
[ 8.599475] task: c0000005f93c0001 task.stack: c000000bf8100000
[ 8.599480] NIP: c00000000049c54c LR: c000000000101814 CTR: c0000000001190d0
[ 8.599485] REGS: c000000bf8103520 TRAP: 0600 Not tainted (4.10.0-rc6_VPHNt010+)
[ 8.599490] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[ 8.599495] CR: 28108e44 XER: 0000000b
[ 8.599501] CFAR: c000000000101810 DAR: c0000003c52231cf DSISR: 00000000 SOFTE: 0
[ 8.599501] GPR00: c0000000001017dc c000000bf81037a0 c000000000fd4c00 c0000005ef7c15a0
[ 8.599501] GPR04: c0000005ef7c15a0 c0000003c52231cf 0000000000000000 0000000000000000
[ 8.599501] GPR08: c000000001014c00 69665f716573006e fa00000000000000 0000000000000000
[ 8.599501] GPR12: c0000000001190d0 c00000000e5e3b00 c00000000000d718 0000000000000000
[ 8.599501] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 8.599501] GPR20: 0000000000000000 0000000000000000 c000000000c62b00 0000000000004000
[ 8.599501] GPR24: 00000003c45bfc57 c00000000100dbe0 0000000000000000 c000000000c62b00
[ 8.599501] GPR28: 00000003c45bfc57 c0000005ef7c1d4c 0000000000000800 c0000005ef7c1580
[ 8.634917] NIP [c00000000049c54c] llist_add_batch+0xc/0x40
[ 8.634923] LR [c000000000101814] try_to_wake_up+0x3e4/0x500
[ 8.634928] Call Trace:
[ 8.634931] [c000000bf81037a0] [c0000000001017dc] try_to_wake_up+0x3ac/0x500 (unreliable)
[ 8.634939] [c000000bf8103820] [c0000000000e5508] create_worker+0x148/0x250
[ 8.679666] [c000000bf81038c0] [c0000000000e986c] alloc_unbound_pwq+0x3cc/0x4d0
[ 8.679673] [c000000bf8103960] [c0000000000e9e9c] apply_wqattrs_prepare+0x2bc/0x330
[ 8.679679] [c000000bf8103a10] [c0000000000e9f74] apply_workqueue_attrs_locked+0x64/0xd0
[ 8.679685] [c000000bf8103a80] [c0000000000ea4f4] apply_workqueue_attrs+0x64/0xa0
[ 8.679692] [c000000bf8103b00] [c0000000000ec19c] __alloc_workqueue_key+0x1cc/0x680
[ 8.679700] [c000000bf8103be0] [c000000000bcb2a4] __machine_initcall_pseries_pseries_dlpar_init+0x50/0x8c
[ 8.679707] [c000000bf8103c40] [c00000000000cef0] do_one_initcall+0x60/0x1c0
[ 8.679714] [c000000bf8103d00] [c000000000bb423c] kernel_init_freeable+0x2a8/0x390
[ 8.679719] [c000000bf8103dc0] [c00000000000d734] kernel_init+0x24/0x150
[ 8.679726] [c000000bf8103e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 8.679731] Instruction dump:
[ 8.706288] 60420000 7c832378 4e800020 60000000 60000000 60000000 60000000 60000000
[ 8.706296] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad
[ 8.706307] ---[ end trace b6256c8c7d38d99b ]---
[ 8.706311]
[ 10.706438] Kernel panic - not syncing: Fatal exception
[ 10.706704] Rebooting in 10 seconds..

--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb@xxxxxxxxxxxxxxxxxx