Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Tetsuo Handa
Date: Tue Feb 07 2017 - 06:24:29 EST


On 2017/02/07 7:05, Mel Gorman wrote:
> On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
>> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>>> On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>>> On 29.1.2017 13:44, Dmitry Vyukov wrote:
>>>>> Hello,
>>>>>
>>>>> I've got the following deadlock report while running syzkaller fuzzer
>>>>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
>>>>>
>>>>> [ INFO: possible circular locking dependency detected ]
>>>>> 4.10.0-rc5-next-20170125 #1 Not tainted
>>>>> -------------------------------------------------------
>>>>> syz-executor3/14255 is trying to acquire lock:
>>>>> (cpu_hotplug.dep_map){++++++}, at: [<ffffffff814271c7>]
>>>>> get_online_cpus+0x37/0x90 kernel/cpu.c:239
>>>>>
>>>>> but task is already holding lock:
>>>>> (pcpu_alloc_mutex){+.+.+.}, at: [<ffffffff81937fee>]
>>>>> pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
>>>>>
>>>>> which lock already depends on the new lock.
>>>>
>>>> I suspect the dependency comes from recent changes in drain_all_pages(). They
>>>> were later redone (for other reasons, but nice to have another validation) in
>>>> the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next. Could
>>>> you try if it helps?
>>>
>>> It happened only once on linux-next, so I can't verify the fix. But I
>>> will watch out for other occurrences.
>>
>> Unfortunately it does not seem to help.
>
> I'm a little stuck on how to best handle this. get_online_cpus() can
> halt forever if the hotplug operation is holding the mutex when calling
> pcpu_alloc. One option would be to add a try_get_online_cpus() helper which
> trylocks the mutex. However, given that drain is so unlikely to actually
> make that make a difference when racing against parallel allocations,
> I think this should be acceptable.
>
> Any objections?
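
For reference, the trylock idea quoted above could look roughly like the userspace sketch below. It is only an illustration: a pthread mutex stands in for the hotplug lock (the real get_online_cpus() is not a plain mutex), and every name ending in _sketch is hypothetical, not an existing kernel helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the CPU hotplug lock. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many times the (simulated) drain actually ran. */
static int drains_done;

/* Take the lock only if it is free; never blocks. */
static bool try_get_online_cpus_sketch(void)
{
	return pthread_mutex_trylock(&hotplug_lock) == 0;
}

static void put_online_cpus_sketch(void)
{
	int ret = pthread_mutex_unlock(&hotplug_lock);

	assert(ret == 0);
	(void)ret;	/* silence unused warning when NDEBUG is set */
}

/*
 * Drain only when the hotplug lock is free. If a hotplug operation
 * holds it, skip the drain instead of waiting, so the caller cannot
 * block behind the hotplug path. Returns true if the drain ran.
 */
static bool drain_all_pages_sketch(void)
{
	if (!try_get_online_cpus_sketch())
		return false;

	drains_done++;	/* stand-in for the real per-cpu drain */
	put_online_cpus_sketch();
	return true;
}
```

The cost of this approach is that a drain is silently skipped whenever it races with hotplug, which is the trade-off discussed in the quoted text.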

Why is the change below, on top of current linux.git (8b1b41ee74f9), insufficient?
I think it can eliminate a lot of IPIs without introducing lockdep warnings.

----------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f3e0c69..ae6e7aa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2354,6 +2354,7 @@ void drain_local_pages(struct zone *zone)
  */
 void drain_all_pages(struct zone *zone)
 {
+	static DEFINE_MUTEX(lock);
 	int cpu;

 	/*
@@ -2362,6 +2363,7 @@ void drain_all_pages(struct zone *zone)
 	 */
 	static cpumask_t cpus_with_pcps;

+	mutex_lock(&lock);
 	/*
 	 * We don't care about racing with CPU hotplug event
 	 * as offline notification will cause the notified
@@ -2394,6 +2396,7 @@ void drain_all_pages(struct zone *zone)
 	}
 	on_each_cpu_mask(&cpus_with_pcps, (smp_call_func_t) drain_local_pages,
 			 zone, 1);
+	mutex_unlock(&lock);
 }

 #ifdef CONFIG_HIBERNATION
----------
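
To illustrate why serializing the callers can cut down on IPIs, here is a minimal userspace sketch, not kernel code. It assumes four simulated CPUs, uses a pthread mutex in place of the kernel mutex, counts an "IPI" per CPU that has pages to drain, and assumes nobody refills the per-cpu lists in the meantime. All names (pcp_count, ipis_sent, drain_all_pages_serialized, run_concurrent_drains) are made up for this example.

```c
#include <pthread.h>

#define NR_CPUS_SKETCH 4
#define MAX_THREADS_SKETCH 16

/* Simulated per-cpu page counts; CPUs 0, 2 and 3 have pages to drain. */
static int pcp_count[NR_CPUS_SKETCH] = { 3, 0, 5, 1 };

/* Number of simulated IPIs sent; only modified under the mutex. */
static int ipis_sent;

static void drain_all_pages_serialized(void)
{
	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	/* Shared between callers, like the static cpumask_t in the diff. */
	static unsigned int cpus_with_pcps;
	int cpu;

	pthread_mutex_lock(&lock);

	/* Build the mask of CPUs that still have pages. */
	cpus_with_pcps = 0;
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (pcp_count[cpu])
			cpus_with_pcps |= 1u << cpu;

	/* "Send an IPI" only to CPUs in the mask and drain them. */
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (!(cpus_with_pcps & (1u << cpu)))
			continue;
		ipis_sent++;
		pcp_count[cpu] = 0;
	}

	pthread_mutex_unlock(&lock);
}

static void *drain_thread(void *arg)
{
	(void)arg;
	drain_all_pages_serialized();
	return NULL;
}

/* Run nthreads concurrent drains and return the total IPI count. */
static int run_concurrent_drains(int nthreads)
{
	pthread_t tid[MAX_THREADS_SKETCH];
	int created = 0;
	int i;

	if (nthreads > MAX_THREADS_SKETCH)
		nthreads = MAX_THREADS_SKETCH;

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[created], NULL, drain_thread, NULL))
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);

	return ipis_sent;
}
```

In this sketch, whichever caller takes the mutex first drains everything, and the callers that waited find an empty mask and send nothing. In the kernel the lists can be refilled while callers wait, so the saving would be smaller than this idealized case.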

By the way, drain_all_pages() is a sleepable context, isn't it?
I don't get a soft lockup using current linux.git (8b1b41ee74f9).
But I trivially get a soft lockup if I try the above change after reverting
"mm, page_alloc: use static global work_struct for draining per-cpu pages" and
"mm, page_alloc: drain per-cpu pages from workqueue context" on linux-next-20170207.
List corruption also came in with linux-next-20170207.

----------
[ 32.672890] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 33.109860] Ebtables v2.0 registered
[ 33.410293] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 33.935512] IPv6: ADDRCONF(NETDEV_UP): eno16777728: link is not ready
[ 33.937478] e1000: eno16777728 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 33.939777] IPv6: ADDRCONF(NETDEV_CHANGE): eno16777728: link becomes ready
[ 34.194000] Netfilter messages via NETLINK v0.30.
[ 34.258828] ip_set: protocol 6
[ 38.518375] nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based firewall rule not found. Use the iptables CT target to attach helpers instead.
[ 43.911609] cp (5167) used greatest stack depth: 10488 bytes left
[ 48.174125] cp (5860) used greatest stack depth: 10224 bytes left
[ 101.075280] ------------[ cut here ]------------
[ 101.076743] WARNING: CPU: 0 PID: 9766 at lib/list_debug.c:25 __list_add_valid+0x46/0xa0
[ 101.079095] list_add corruption. next->prev should be prev (ffffea00013dfce0), but was ffff88006d3e1d40. (next=ffff88006d3e1d40).
[ 101.081863] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_helper vmw_balloon pcspkr sg parport_pc parport i2c_piix4 shpchp vmw_vsock_vmci_transport vsock vmw_vmci ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi crc32c_intel vmwgfx serio_raw drm_kms_helper syscopyarea sysfillrect
[ 101.098346] sysimgblt fb_sys_fops ttm mptspi scsi_transport_spi ata_piix mptscsih ahci drm libahci mptbase libata e1000 i2c_core
[ 101.101279] CPU: 0 PID: 9766 Comm: oom-write Tainted: G W 4.10.0-rc7-next-20170207+ #500
[ 101.103519] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 101.106031] Call Trace:
[ 101.107069] dump_stack+0x85/0xc9
[ 101.108239] __warn+0xd1/0xf0
[ 101.109566] warn_slowpath_fmt+0x5f/0x80
[ 101.110843] __list_add_valid+0x46/0xa0
[ 101.112187] free_hot_cold_page+0x205/0x460
[ 101.113593] free_hot_cold_page_list+0x3c/0x1c0
[ 101.115046] shrink_page_list+0x4dd/0xd10
[ 101.116388] shrink_inactive_list+0x1c5/0x660
[ 101.117796] shrink_node_memcg+0x535/0x7f0
[ 101.119158] ? mem_cgroup_iter+0x1e0/0x720
[ 101.120873] shrink_node+0xe1/0x310
[ 101.122250] do_try_to_free_pages+0xe1/0x300
[ 101.123611] try_to_free_pages+0x131/0x3f0
[ 101.124998] __alloc_pages_slowpath+0x479/0xe32
[ 101.126421] __alloc_pages_nodemask+0x382/0x3d0
[ 101.128053] alloc_pages_vma+0xae/0x2f0
[ 101.129353] do_anonymous_page+0x111/0x5d0
[ 101.130725] __handle_mm_fault+0xbc9/0xeb0
[ 101.132127] ? sched_clock+0x9/0x10
[ 101.133389] ? sched_clock_cpu+0x11/0xc0
[ 101.134764] handle_mm_fault+0x16b/0x390
[ 101.136147] ? handle_mm_fault+0x49/0x390
[ 101.137545] __do_page_fault+0x24a/0x530
[ 101.138999] do_page_fault+0x30/0x80
[ 101.140375] page_fault+0x28/0x30
[ 101.141685] RIP: 0033:0x4006a0
[ 101.142880] RSP: 002b:00007ffe3a4acc10 EFLAGS: 00010206
[ 101.144638] RAX: 0000000037f8e000 RBX: 0000000080000000 RCX: 00007fc5a1fda650
[ 101.146653] RDX: 0000000000000000 RSI: 00007ffe3a4aca30 RDI: 00007ffe3a4aca30
[ 101.148768] RBP: 00007fc4a2111010 R08: 00007ffe3a4acb40 R09: 00007ffe3a4ac980
[ 101.150893] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000007
[ 101.152937] R13: 00007fc4a2111010 R14: 0000000000000000 R15: 0000000000000000
[ 101.155240] ---[ end trace 5d8b63572ab78be3 ]---
[ 128.945470] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [oom-write:9766]
[ 128.947878] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_helper vmw_balloon pcspkr sg parport_pc parport i2c_piix4 shpchp vmw_vsock_vmci_transport vsock vmw_vmci ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi crc32c_intel vmwgfx serio_raw drm_kms_helper syscopyarea sysfillrect
[ 128.966961] sysimgblt fb_sys_fops ttm mptspi scsi_transport_spi ata_piix mptscsih ahci drm libahci mptbase libata e1000 i2c_core
[ 128.970135] irq event stamp: 2491700
[ 128.971666] hardirqs last enabled at (2491699): [<ffffffff817e5d70>] restore_regs_and_iret+0x0/0x1d
[ 128.974560] hardirqs last disabled at (2491700): [<ffffffff817e7198>] apic_timer_interrupt+0x98/0xb0
[ 128.977148] softirqs last enabled at (2445236): [<ffffffff817eab39>] __do_softirq+0x349/0x52d
[ 128.979639] softirqs last disabled at (2445215): [<ffffffff810a99e5>] irq_exit+0xf5/0x110
[ 128.982019] CPU: 2 PID: 9766 Comm: oom-write Tainted: G W 4.10.0-rc7-next-20170207+ #500
[ 128.984598] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 128.987490] task: ffff880067ed8040 task.stack: ffffc9001117c000
[ 128.989464] RIP: 0010:smp_call_function_many+0x25c/0x320
[ 128.991305] RSP: 0000:ffffc9001117fad0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[ 128.993609] RAX: 0000000000000000 RBX: ffff88006d7dd640 RCX: 0000000000000001
[ 128.995823] RDX: 0000000000000001 RSI: ffff88006d3e3398 RDI: ffff88006c528dc8
[ 128.998058] RBP: ffffc9001117fb18 R08: 0000000000000009 R09: 0000000000000000
[ 129.000284] R10: 0000000000000001 R11: ffff88005486d4a8 R12: 0000000000000000
[ 129.002521] R13: ffffffff811fe6e0 R14: 0000000000000000 R15: 0000000000000080
[ 129.004788] FS: 00007fc5a24ec740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[ 129.007235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 129.009211] CR2: 00007fc4da0c5010 CR3: 0000000044314000 CR4: 00000000001406e0
[ 129.011641] Call Trace:
[ 129.012993] ? page_alloc_cpu_dead+0x30/0x30
[ 129.014701] on_each_cpu_mask+0x30/0xb0
[ 129.016318] drain_all_pages+0x113/0x170
[ 129.017949] __alloc_pages_slowpath+0x520/0xe32
[ 129.019710] __alloc_pages_nodemask+0x382/0x3d0
[ 129.021472] alloc_pages_vma+0xae/0x2f0
[ 129.023080] do_anonymous_page+0x111/0x5d0
[ 129.024760] __handle_mm_fault+0xbc9/0xeb0
[ 129.026449] ? sched_clock+0x9/0x10
[ 129.028042] ? sched_clock_cpu+0x11/0xc0
[ 129.029729] handle_mm_fault+0x16b/0x390
[ 129.031418] ? handle_mm_fault+0x49/0x390
[ 129.033077] __do_page_fault+0x24a/0x530
[ 129.034721] do_page_fault+0x30/0x80
[ 129.036279] page_fault+0x28/0x30
[ 129.038160] RIP: 0033:0x4006a0
[ 129.039600] RSP: 002b:00007ffe3a4acc10 EFLAGS: 00010206
[ 129.041488] RAX: 0000000037fb4000 RBX: 0000000080000000 RCX: 00007fc5a1fda650
[ 129.043758] RDX: 0000000000000000 RSI: 00007ffe3a4aca30 RDI: 00007ffe3a4aca30
[ 129.046037] RBP: 00007fc4a2111010 R08: 00007ffe3a4acb40 R09: 00007ffe3a4ac980
[ 129.048324] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000007
[ 129.050587] R13: 00007fc4a2111010 R14: 0000000000000000 R15: 0000000000000000
[ 129.052829] Code: 7b 3e 2b 00 3b 05 69 b6 d5 00 41 89 c4 0f 8d 3f fe ff ff 48 63 d0 48 8b 33 48 03 34 d5 60 c4 ab 81 8b 56 18 83 e2 01 74 0a f3 90 <8b> 4e 18 83 e1 01 75 f6 83 f8 ff 48 8b 7b 08 74 2a 48 63 35 30
[ 156.945366] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [oom-write:9766]
[ 156.947797] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_helper vmw_balloon pcspkr sg parport_pc parport i2c_piix4 shpchp vmw_vsock_vmci_transport vsock vmw_vmci ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi crc32c_intel vmwgfx serio_raw drm_kms_helper syscopyarea sysfillrect
[ 156.968021] sysimgblt fb_sys_fops ttm mptspi scsi_transport_spi ata_piix mptscsih ahci drm libahci mptbase libata e1000 i2c_core
[ 156.971269] irq event stamp: 2547400
[ 156.972865] hardirqs last enabled at (2547399): [<ffffffff817e5d70>] restore_regs_and_iret+0x0/0x1d
[ 156.975579] hardirqs last disabled at (2547400): [<ffffffff817e7198>] apic_timer_interrupt+0x98/0xb0
[ 156.978287] softirqs last enabled at (2445236): [<ffffffff817eab39>] __do_softirq+0x349/0x52d
[ 156.980863] softirqs last disabled at (2445215): [<ffffffff810a99e5>] irq_exit+0xf5/0x110
[ 156.983393] CPU: 2 PID: 9766 Comm: oom-write Tainted: G W L 4.10.0-rc7-next-20170207+ #500
[ 156.986111] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 156.989117] task: ffff880067ed8040 task.stack: ffffc9001117c000
[ 156.991179] RIP: 0010:smp_call_function_many+0x25c/0x320
[ 156.993113] RSP: 0000:ffffc9001117fad0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[ 156.995501] RAX: 0000000000000000 RBX: ffff88006d7dd640 RCX: 0000000000000001
[ 156.997803] RDX: 0000000000000001 RSI: ffff88006d3e3398 RDI: ffff88006c528dc8
[ 157.000111] RBP: ffffc9001117fb18 R08: 0000000000000009 R09: 0000000000000000
[ 157.002402] R10: 0000000000000001 R11: ffff88005486d4a8 R12: 0000000000000000
[ 157.004700] R13: ffffffff811fe6e0 R14: 0000000000000000 R15: 0000000000000080
[ 157.006980] FS: 00007fc5a24ec740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[ 157.009477] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 157.011494] CR2: 00007fc4da0c5010 CR3: 0000000044314000 CR4: 00000000001406e0
[ 157.013844] Call Trace:
[ 157.015231] ? page_alloc_cpu_dead+0x30/0x30
[ 157.016975] on_each_cpu_mask+0x30/0xb0
[ 157.018723] drain_all_pages+0x113/0x170
[ 157.020410] __alloc_pages_slowpath+0x520/0xe32
[ 157.022219] __alloc_pages_nodemask+0x382/0x3d0
[ 157.024036] alloc_pages_vma+0xae/0x2f0
[ 157.025704] do_anonymous_page+0x111/0x5d0
[ 157.027430] __handle_mm_fault+0xbc9/0xeb0
[ 157.029143] ? sched_clock+0x9/0x10
[ 157.030739] ? sched_clock_cpu+0x11/0xc0
[ 157.032416] handle_mm_fault+0x16b/0x390
[ 157.034100] ? handle_mm_fault+0x49/0x390
[ 157.035843] __do_page_fault+0x24a/0x530
[ 157.037537] do_page_fault+0x30/0x80
[ 157.039160] page_fault+0x28/0x30
[ 157.040792] RIP: 0033:0x4006a0
[ 157.042290] RSP: 002b:00007ffe3a4acc10 EFLAGS: 00010206
[ 157.044231] RAX: 0000000037fb4000 RBX: 0000000080000000 RCX: 00007fc5a1fda650
[ 157.046635] RDX: 0000000000000000 RSI: 00007ffe3a4aca30 RDI: 00007ffe3a4aca30
[ 157.048941] RBP: 00007fc4a2111010 R08: 00007ffe3a4acb40 R09: 00007ffe3a4ac980
[ 157.051324] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000007
[ 157.053577] R13: 00007fc4a2111010 R14: 0000000000000000 R15: 0000000000000000
[ 157.055822] Code: 7b 3e 2b 00 3b 05 69 b6 d5 00 41 89 c4 0f 8d 3f fe ff ff 48 63 d0 48 8b 33 48 03 34 d5 60 c4 ab 81 8b 56 18 83 e2 01 74 0a f3 90 <8b> 4e 18 83 e1 01 75 f6 83 f8 ff 48 8b 7b 08 74 2a 48 63 35 30
[ 171.241423] BUG: spinlock lockup suspected on CPU#3, swapper/3/0
[ 171.243575] lock: 0xffff88007ffddd00, .magic: dead4ead, .owner: kworker/0:3/9777, .owner_cpu: 0
[ 171.246136] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W L 4.10.0-rc7-next-20170207+ #500
[ 171.248706] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 171.251617] Call Trace:
[ 171.252854] <IRQ>
[ 171.253989] dump_stack+0x85/0xc9
[ 171.255376] spin_dump+0x90/0x95
[ 171.256741] do_raw_spin_lock+0x9a/0x130
[ 171.258246] _raw_spin_lock_irqsave+0x75/0x90
[ 171.259822] ? free_pcppages_bulk+0x37/0x910
[ 171.261359] free_pcppages_bulk+0x37/0x910
[ 171.262843] ? sched_clock_cpu+0x11/0xc0
[ 171.264289] ? sched_clock_tick+0x2d/0x80
[ 171.265747] drain_pages_zone+0x82/0x90
[ 171.267154] ? page_alloc_cpu_dead+0x30/0x30
[ 171.268644] drain_pages+0x3f/0x60
[ 171.269955] drain_local_pages+0x25/0x30
[ 171.271366] flush_smp_call_function_queue+0x7b/0x170
[ 171.273014] generic_smp_call_function_single_interrupt+0x13/0x30
[ 171.274878] smp_call_function_interrupt+0x27/0x40
[ 171.276460] call_function_interrupt+0x9d/0xb0
[ 171.277963] RIP: 0010:native_safe_halt+0x6/0x10
[ 171.279467] RSP: 0018:ffffc900003a3e70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff03
[ 171.281610] RAX: ffff88006c5c0040 RBX: 0000000000000000 RCX: 0000000000000000
[ 171.283662] RDX: ffff88006c5c0040 RSI: 0000000000000001 RDI: ffff88006c5c0040
[ 171.285727] RBP: ffffc900003a3e70 R08: 0000000000000000 R09: 0000000000000000
[ 171.287778] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
[ 171.289831] R13: ffff88006c5c0040 R14: ffff88006c5c0040 R15: 0000000000000000
[ 171.291882] </IRQ>
[ 171.292934] default_idle+0x23/0x1d0
[ 171.294287] arch_cpu_idle+0xf/0x20
[ 171.295618] default_idle_call+0x23/0x40
[ 171.297040] do_idle+0x162/0x230
[ 171.298308] cpu_startup_entry+0x71/0x80
[ 171.299732] start_secondary+0x17f/0x1f0
[ 171.301146] start_cpu+0x14/0x14
[ 171.302432] NMI backtrace for cpu 3
[ 171.303764] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W L 4.10.0-rc7-next-20170207+ #500
[ 171.306218] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 171.309065] Call Trace:
[ 171.310500] <IRQ>
[ 171.311569] dump_stack+0x85/0xc9
[ 171.312946] nmi_cpu_backtrace+0xc0/0xe0
[ 171.314396] ? irq_force_complete_move+0x170/0x170
[ 171.316012] nmi_trigger_cpumask_backtrace+0x12a/0x188
[ 171.317679] arch_trigger_cpumask_backtrace+0x19/0x20
[ 171.319294] do_raw_spin_lock+0xa8/0x130
[ 171.320686] _raw_spin_lock_irqsave+0x75/0x90
[ 171.322148] ? free_pcppages_bulk+0x37/0x910
[ 171.323585] free_pcppages_bulk+0x37/0x910
[ 171.325037] ? sched_clock_cpu+0x11/0xc0
[ 171.326615] ? sched_clock_tick+0x2d/0x80
[ 171.327974] drain_pages_zone+0x82/0x90
[ 171.329294] ? page_alloc_cpu_dead+0x30/0x30
[ 171.330707] drain_pages+0x3f/0x60
[ 171.331938] drain_local_pages+0x25/0x30
[ 171.333283] flush_smp_call_function_queue+0x7b/0x170
[ 171.334855] generic_smp_call_function_single_interrupt+0x13/0x30
[ 171.336643] smp_call_function_interrupt+0x27/0x40
[ 171.338456] call_function_interrupt+0x9d/0xb0
[ 171.339911] RIP: 0010:native_safe_halt+0x6/0x10
[ 171.341522] RSP: 0018:ffffc900003a3e70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff03
[ 171.343860] RAX: ffff88006c5c0040 RBX: 0000000000000000 RCX: 0000000000000000
[ 171.346001] RDX: ffff88006c5c0040 RSI: 0000000000000001 RDI: ffff88006c5c0040
[ 171.348134] RBP: ffffc900003a3e70 R08: 0000000000000000 R09: 0000000000000000
[ 171.350270] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
[ 171.352384] R13: ffff88006c5c0040 R14: ffff88006c5c0040 R15: 0000000000000000
[ 171.354503] </IRQ>
[ 171.355616] default_idle+0x23/0x1d0
[ 171.357055] arch_cpu_idle+0xf/0x20
[ 171.361577] default_idle_call+0x23/0x40
[ 171.364329] do_idle+0x162/0x230
[ 171.365799] cpu_startup_entry+0x71/0x80
[ 171.367172] start_secondary+0x17f/0x1f0
[ 171.368530] start_cpu+0x14/0x14
[ 171.369748] Sending NMI from CPU 3 to CPUs 0-2:
[ 171.371367] NMI backtrace for cpu 2
[ 171.371368] CPU: 2 PID: 9766 Comm: oom-write Tainted: G W L 4.10.0-rc7-next-20170207+ #500
[ 171.371368] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 171.371369] task: ffff880067ed8040 task.stack: ffffc9001117c000
[ 171.371369] RIP: 0010:smp_call_function_many+0x25c/0x320
[ 171.371370] RSP: 0000:ffffc9001117fad0 EFLAGS: 00000202
[ 171.371371] RAX: 0000000000000000 RBX: ffff88006d7dd640 RCX: 0000000000000001
[ 171.371372] RDX: 0000000000000001 RSI: ffff88006d3e3398 RDI: ffff88006c528dc8
[ 171.371372] RBP: ffffc9001117fb18 R08: 0000000000000009 R09: 0000000000000000
[ 171.371372] R10: 0000000000000001 R11: ffff88005486d4a8 R12: 0000000000000000
[ 171.371373] R13: ffffffff811fe6e0 R14: 0000000000000000 R15: 0000000000000080
[ 171.371373] FS: 00007fc5a24ec740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[ 171.371374] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 171.371374] CR2: 00007fc4da0c5010 CR3: 0000000044314000 CR4: 00000000001406e0
[ 171.371375] Call Trace:
[ 171.371375] ? page_alloc_cpu_dead+0x30/0x30
[ 171.371375] on_each_cpu_mask+0x30/0xb0
[ 171.371376] drain_all_pages+0x113/0x170
[ 171.371376] __alloc_pages_slowpath+0x520/0xe32
[ 171.371377] __alloc_pages_nodemask+0x382/0x3d0
[ 171.371377] alloc_pages_vma+0xae/0x2f0
[ 171.371377] do_anonymous_page+0x111/0x5d0
[ 171.371378] __handle_mm_fault+0xbc9/0xeb0
[ 171.371378] ? sched_clock+0x9/0x10
[ 171.371379] ? sched_clock_cpu+0x11/0xc0
[ 171.371379] handle_mm_fault+0x16b/0x390
[ 171.371379] ? handle_mm_fault+0x49/0x390
[ 171.371380] __do_page_fault+0x24a/0x530
[ 171.371380] do_page_fault+0x30/0x80
[ 171.371381] page_fault+0x28/0x30
[ 171.371381] RIP: 0033:0x4006a0
[ 171.371381] RSP: 002b:00007ffe3a4acc10 EFLAGS: 00010206
[ 171.371382] RAX: 0000000037fb4000 RBX: 0000000080000000 RCX: 00007fc5a1fda650
[ 171.371383] RDX: 0000000000000000 RSI: 00007ffe3a4aca30 RDI: 00007ffe3a4aca30
[ 171.371383] RBP: 00007fc4a2111010 R08: 00007ffe3a4acb40 R09: 00007ffe3a4ac980
[ 171.371384] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000007
[ 171.371384] R13: 00007fc4a2111010 R14: 0000000000000000 R15: 0000000000000000
[ 171.371385] Code: 7b 3e 2b 00 3b 05 69 b6 d5 00 41 89 c4 0f 8d 3f fe ff ff 48 63 d0 48 8b 33 48 03 34 d5 60 c4 ab 81 8b 56 18 83 e2 01 74 0a f3 90 <8b> 4e 18 83 e1 01 75 f6 83 f8 ff 48 8b 7b 08 74 2a 48 63 35 30
[ 171.372317] NMI backtrace for cpu 0
[ 171.372317] CPU: 0 PID: 9777 Comm: kworker/0:3 Tainted: G W L 4.10.0-rc7-next-20170207+ #500
[ 171.372318] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 171.372318] Workqueue: events vmw_fb_dirty_flush [vmwgfx]
[ 171.372319] task: ffff880064082540 task.stack: ffffc90013b54000
[ 171.372320] RIP: 0010:free_pcppages_bulk+0xbb/0x910
[ 171.372320] RSP: 0000:ffff88006d203eb0 EFLAGS: 00000002
[ 171.372321] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000010
[ 171.372322] RDX: 0000000000000000 RSI: 00000000357cc759 RDI: ffff88006d3e1d50
[ 171.372322] RBP: ffff88006d203f38 R08: ffff88006d3e1d20 R09: 0000000000000002
[ 171.372323] R10: 0000000000000000 R11: 000000000004f7f0 R12: ffff88007ffdd8f8
[ 171.372323] R13: ffffea00013dfc20 R14: ffffea00013dfc00 R15: ffff88007ffdd740
[ 171.372324] FS: 0000000000000000(0000) GS:ffff88006d200000(0000) knlGS:0000000000000000
[ 171.372324] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 171.372325] CR2: 00007fc4da09f010 CR3: 0000000044314000 CR4: 00000000001406f0
[ 171.372325] Call Trace:
[ 171.372325] <IRQ>
[ 171.372326] ? trace_hardirqs_off+0xd/0x10
[ 171.372326] drain_pages_zone+0x82/0x90
[ 171.372327] ? page_alloc_cpu_dead+0x30/0x30
[ 171.372327] drain_pages+0x3f/0x60
[ 171.372327] drain_local_pages+0x25/0x30
[ 171.372328] flush_smp_call_function_queue+0x7b/0x170
[ 171.372328] generic_smp_call_function_single_interrupt+0x13/0x30
[ 171.372329] smp_call_function_interrupt+0x27/0x40
[ 171.372329] call_function_interrupt+0x9d/0xb0
[ 171.372330] RIP: 0010:memcpy_orig+0x19/0x110
[ 171.372330] RSP: 0000:ffffc90013b57d98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff03
[ 171.372331] RAX: ffffc90003f1c400 RBX: ffff880063b54a88 RCX: 0000000000000004
[ 171.372331] RDX: 00000000000010e0 RSI: ffffc900037d3900 RDI: ffffc90003f1c6e0
[ 171.372332] RBP: ffffc90013b57e08 R08: 0000000000000000 R09: 0000000000000000
[ 171.372332] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001400
[ 171.372333] R13: 000000000000011e R14: ffff880063b55168 R15: ffffc900037d3620
[ 171.372333] </IRQ>
[ 171.372333] ? vmw_fb_dirty_flush+0x1ef/0x2b0 [vmwgfx]
[ 171.372334] process_one_work+0x22b/0x760
[ 171.372334] ? process_one_work+0x194/0x760
[ 171.372335] worker_thread+0x137/0x4b0
[ 171.372335] kthread+0x10f/0x150
[ 171.372335] ? process_one_work+0x760/0x760
[ 171.372336] ? kthread_create_on_node+0x70/0x70
[ 171.372336] ? do_syscall_64+0x6c/0x200
[ 171.372337] ret_from_fork+0x31/0x40
[ 171.372337] Code: 00 00 4d 89 cf 8b 45 98 8b 75 9c 31 d2 4c 8b 85 78 ff ff ff 83 c0 01 83 c6 01 83 f8 03 0f 44 c2 48 63 c8 48 83 c1 01 48 c1 e1 04 <4c> 01 c1 48 8b 39 48 39 f9 74 de 89 45 98 83 fe 03 89 f0 0f 44
[ 171.372357] NMI backtrace for cpu 1
[ 171.372357] CPU: 1 PID: 47 Comm: khugepaged Tainted: G W L 4.10.0-rc7-next-20170207+ #500
[ 171.372358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 171.372358] task: ffff88006bd22540 task.stack: ffffc90000890000
[ 171.372359] RIP: 0010:delay_tsc+0x48/0x70
[ 171.372359] RSP: 0018:ffffc90000893520 EFLAGS: 00000006
[ 171.372360] RAX: 000000004e0ee6be RBX: ffff88007ffddd00 RCX: 000000774e0ee6a6
[ 171.372360] RDX: 0000000000000077 RSI: 0000000000000001 RDI: 0000000000000001
[ 171.372361] RBP: ffffc90000893520 R08: 0000000000000000 R09: 0000000000000000
[ 171.372361] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000a6822110
[ 171.372362] R13: 0000000083093e7f R14: 0000000000000001 R15: ffffea000137a020
[ 171.372362] FS: 0000000000000000(0000) GS:ffff88006d400000(0000) knlGS:0000000000000000
[ 171.372363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 171.372363] CR2: 00007ff5aeb82000 CR3: 0000000068093000 CR4: 00000000001406e0
[ 171.372363] Call Trace:
[ 171.372364] __delay+0xf/0x20
[ 171.372364] do_raw_spin_lock+0x86/0x130
[ 171.372365] _raw_spin_lock_irqsave+0x75/0x90
[ 171.372365] ? free_pcppages_bulk+0x37/0x910
[ 171.372365] free_pcppages_bulk+0x37/0x910
[ 171.372366] ? __kernel_map_pages+0x87/0x120
[ 171.372366] free_hot_cold_page+0x373/0x460
[ 171.372367] free_hot_cold_page_list+0x3c/0x1c0
[ 171.372367] shrink_page_list+0x4dd/0xd10
[ 171.372368] shrink_inactive_list+0x1c5/0x660
[ 171.372368] shrink_node_memcg+0x535/0x7f0
[ 171.372368] ? mem_cgroup_iter+0x14d/0x720
[ 171.372369] shrink_node+0xe1/0x310
[ 171.372369] do_try_to_free_pages+0xe1/0x300
[ 171.372370] try_to_free_pages+0x131/0x3f0
[ 171.372370] __alloc_pages_slowpath+0x479/0xe32
[ 171.372370] __alloc_pages_nodemask+0x382/0x3d0
[ 171.372371] khugepaged_alloc_page+0x6d/0xd0
[ 171.372371] collapse_huge_page+0x81/0x1240
[ 171.372372] ? sched_clock_cpu+0x11/0xc0
[ 171.372372] ? khugepaged_scan_mm_slot+0xc26/0x1000
[ 171.372373] khugepaged_scan_mm_slot+0xc49/0x1000
[ 171.372373] ? sched_clock_cpu+0x11/0xc0
[ 171.372373] ? finish_wait+0x75/0x90
[ 171.372374] khugepaged+0x327/0x5e0
[ 171.372374] ? remove_wait_queue+0x60/0x60
[ 171.372375] kthread+0x10f/0x150
[ 171.372375] ? khugepaged_scan_mm_slot+0x1000/0x1000
[ 171.372375] ? kthread_create_on_node+0x70/0x70
[ 171.372376] ret_from_fork+0x31/0x40
[ 171.372376] Code: 89 d1 48 c1 e1 20 48 09 c1 eb 1b 65 ff 0d c9 0a c1 7e f3 90 65 ff 05 c0 0a c1 7e 65 8b 05 51 d7 c0 7e 39 c6 75 20 0f ae e8 0f 31 <48> c1 e2 20 48 09 c2 48 89 d0 48 29 c8 48 39 f8 72 ce 65 ff 0d
----------

I also got a soft lockup without any change using linux-next-20170202.
Is something wrong with calling multiple CPUs? Or is it list corruption?

----------
[ 80.556598] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 84.020374] IPv6: ADDRCONF(NETDEV_UP): eno16777728: link is not ready
[ 84.024423] e1000: eno16777728 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 84.030731] IPv6: ADDRCONF(NETDEV_CHANGE): eno16777728: link becomes ready
[ 84.212613] Ebtables v2.0 registered
[ 86.841246] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 90.594709] Netfilter messages via NETLINK v0.30.
[ 90.756119] ip_set: protocol 6
[ 161.309259] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [ip6tables-resto:4210]
[ 162.329982] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_helper vmw_balloon pcspkr sg parport_pc parport i2c_piix4 shpchp vmw_vsock_vmci_transport vsock vmw_vmci ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi crc32c_intel serio_raw vmwgfx drm_kms_helper syscopyarea sysfillrect
[ 162.486639] sysimgblt fb_sys_fops ttm e1000 mptspi ahci scsi_transport_spi drm libahci ata_piix mptscsih i2c_core mptbase libata
[ 162.486651] irq event stamp: 306010
[ 162.486656] hardirqs last enabled at (306009): [<ffffffff817e4970>] restore_regs_and_iret+0x0/0x1d
[ 162.486658] hardirqs last disabled at (306010): [<ffffffff817e5d98>] apic_timer_interrupt+0x98/0xb0
[ 162.486661] softirqs last enabled at (306008): [<ffffffff817e9739>] __do_softirq+0x349/0x52d
[ 162.486664] softirqs last disabled at (306001): [<ffffffff810a98c5>] irq_exit+0xf5/0x110
[ 162.486666] CPU: 3 PID: 4210 Comm: ip6tables-resto Not tainted 4.10.0-rc6-next-20170202 #498
[ 162.486667] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 162.486668] task: ffff880062eb4a40 task.stack: ffffc900054e8000
[ 162.486671] RIP: 0010:smp_call_function_many+0x25c/0x320
[ 162.486672] RSP: 0018:ffffc900054ebc98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[ 162.486674] RAX: 0000000000000000 RBX: ffff88006d9dd680 RCX: 0000000000000001
[ 162.486675] RDX: 0000000000000001 RSI: ffff88006d3e3600 RDI: ffff88006c52ad68
[ 162.486675] RBP: ffffc900054ebce0 R08: 0000000000000007 R09: 0000000000000000
[ 162.486676] R10: 0000000000000001 R11: ffff880067ecd768 R12: 0000000000000000
[ 162.486677] R13: ffffffff81080790 R14: ffffc900054ebd18 R15: 0000000000000080
[ 162.486678] FS: 00007f37a4b86740(0000) GS:ffff88006d800000(0000) knlGS:0000000000000000
[ 162.486679] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 162.486680] CR2: 00000000008ed017 CR3: 0000000053f70000 CR4: 00000000001406e0
[ 162.486719] Call Trace:
[ 162.486725] ? x86_configure_nx+0x50/0x50
[ 162.486727] on_each_cpu+0x3b/0xa0
[ 162.486730] flush_tlb_kernel_range+0x79/0x80
[ 162.486734] remove_vm_area+0xb1/0xc0
[ 162.486737] __vunmap+0x2e/0x110
[ 162.486739] vfree+0x2e/0x70
[ 162.486744] do_ip6t_get_ctl+0x2de/0x370 [ip6_tables]
[ 162.486751] nf_getsockopt+0x49/0x70
[ 162.486755] ipv6_getsockopt+0xd3/0x130
[ 162.486758] rawv6_getsockopt+0x2c/0x70
[ 162.486761] sock_common_getsockopt+0x14/0x20
[ 162.486763] SyS_getsockopt+0x77/0xe0
[ 162.486767] do_syscall_64+0x6c/0x200
[ 162.486770] entry_SYSCALL64_slow_path+0x25/0x25
[ 162.486771] RIP: 0033:0x7f37a3d9151a
[ 162.486772] RSP: 002b:00007ffeecccd9e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
[ 162.486774] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f37a3d9151a
[ 162.486774] RDX: 0000000000000041 RSI: 0000000000000029 RDI: 0000000000000003
[ 162.486775] RBP: 00000000008e90c0 R08: 00007ffeecccda30 R09: feff7164736b6865
[ 162.486776] R10: 00000000008e90c0 R11: 0000000000000202 R12: 00007ffeecccdf60
[ 162.486777] R13: 00000000008e9010 R14: 00007ffeecccda40 R15: 0000000000000000
[ 162.486783] Code: bb 39 2b 00 3b 05 a9 40 c7 00 41 89 c4 0f 8d 3f fe ff ff 48 63 d0 48 8b 33 48 03 34 d5 60 c4 ab 81 8b 56 18 83 e2 01 74 0a f3 90 <8b> 4e 18 83 e1 01 75 f6 83 f8 ff 48 8b 7b 08 74 2a 48 63 35 70
----------