Re: workqueue: race in mod_delayed_work_on?

From: Konstantin Khlebnikov
Date: Tue May 10 2016 - 12:34:17 EST


On 10.05.2016 11:21, Konstantin Khlebnikov wrote:
I've got plenty warnings, bugs and oops around trivial use of mod_delayed_work in drivers/infiniband/core/addr.c

Looks like problem in mod_delayed_work_on was hidden because add_timer is equal to mod_timer
but Sasha accidentally backported 874bbfe600a660cba9c776b3957b1ce393151b76
(workqueue: make sure delayed work run in local cpu) into 3.18.25

I don't see reason why that commit could break delayed work,
most likely it highlighted some other problem.


For example:

WARN_ON_ONCE(timer_pending(timer)) in __queue_delayed_work
and
BUG_ON(timer_pending(timer) || !timer->function) in add_timer_on

<4>[471106.614256] ------------[ cut here ]------------
<4>[471106.614271] WARNING: CPU: 2 PID: 0 at kernel/workqueue.c:1428 __queue_delayed_work+0xdd/0x160()
<4>[471106.614274] Modules linked in: tcp_diag inet_diag bridge cls_cgroup sch_htb netconsole configfs 8021q mrp garp stp llc
intel_powerclamp coretemp kvm_intel kvm crc32_pclmul aesni_intel
ablk_helper cryptd lrw mgag200 psmouse ttm gf128mul serio_raw glue_helper aes_x86_64 drm_kms_helper drm gpio_ich microcode sysimgblt
sysfillrect syscopyarea lpc_ich i7core_edac ioatdma edac_c
ore mlx4_ib ib_sa ib_mad ib_core ib_addr mlx4_en mlx4_core vxlan udp_tunnel ip6_udp_tunnel tcp_htcp ahci igb libahci dca i2c_algo_bit
i2c_core ptp pps_core raid10 raid456 async_pq async_xor x
or async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear [last unloaded: ipmi_msghandler]
<4>[471106.614339] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.18.25-28 #1
<4>[471106.614344] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c 08/03/2012
<4>[471106.614346] 0000000000000594 ffff880627c43c88 ffffffff816d08e3 0000000000000007
<4>[471106.614348] 0000000000000000 ffff880627c43cc8 ffffffff8106bb2c ffff880627c43cd8
<4>[471106.614350] 0000000000000080 ffff88061b0b2a00 0000000000000080 ffff88061b0b2a00
<4>[471106.614353] Call Trace:
<4>[471106.614354] <IRQ> [<ffffffff816d08e3>] dump_stack+0x4e/0x68
<4>[471106.614362] [<ffffffff8106bb2c>] warn_slowpath_common+0x8c/0xc0
<4>[471106.614364] [<ffffffff8106bb7a>] warn_slowpath_null+0x1a/0x20
<4>[471106.614367] [<ffffffff8108235d>] __queue_delayed_work+0xdd/0x160
<4>[471106.614371] [<ffffffff810829da>] mod_delayed_work_on+0x5a/0x80
<4>[471106.614375] [<ffffffffa014028a>] set_timeout+0x3a/0x40 [ib_addr]
<4>[471106.614378] [<ffffffffa01407af>] netevent_callback+0x2f/0x40 [ib_addr]
<4>[471106.614382] [<ffffffff8108af7d>] notifier_call_chain+0x4d/0x70
<4>[471106.614384] [<ffffffff8108afba>] __atomic_notifier_call_chain+0x1a/0x30
<4>[471106.614387] [<ffffffff8108afe6>] atomic_notifier_call_chain+0x16/0x20
<4>[471106.614390] [<ffffffff815d75fb>] call_netevent_notifiers+0x1b/0x20
<4>[471106.614393] [<ffffffff815d9a7e>] neigh_update_notify+0x1e/0x40
<4>[471106.614395] [<ffffffff815daebf>] neigh_timer_handler+0xdf/0x2a0
<4>[471106.614397] [<ffffffff815dade0>] ? neigh_periodic_work+0x220/0x220
<4>[471106.614401] [<ffffffff810bf43a>] call_timer_fn+0x3a/0x170
<4>[471106.614403] [<ffffffff810bf67c>] ? cascade+0x7c/0xa0
<4>[471106.614405] [<ffffffff810c100f>] run_timer_softirq+0x2df/0x340
<4>[471106.614409] [<ffffffff810ce944>] ? clockevents_program_event+0x74/0x100
<4>[471106.614411] [<ffffffff815dade0>] ? neigh_periodic_work+0x220/0x220
<4>[471106.614414] [<ffffffff8106f7df>] __do_softirq+0xdf/0x2f0
<4>[471106.614417] [<ffffffff8106fc66>] irq_exit+0x86/0xb0
<4>[471106.614420] [<ffffffff816db61a>] smp_apic_timer_interrupt+0x4a/0x60
<4>[471106.614423] [<ffffffff816d96fd>] apic_timer_interrupt+0x6d/0x80
<4>[471106.614424] <EOI> [<ffffffff815796a1>] ? cpuidle_enter_state+0x71/0x1d0
<4>[471106.614430] [<ffffffff8157968f>] ? cpuidle_enter_state+0x5f/0x1d0
<4>[471106.614433] [<ffffffff815798c7>] cpuidle_enter+0x17/0x20
<4>[471106.614436] [<ffffffff810a905f>] cpu_startup_entry+0x34f/0x3d0
<4>[471106.614438] [<ffffffff810461fc>] start_secondary+0x14c/0x160
<4>[471106.614440] ---[ end trace 23e6f46f3ce04a60 ]---
<5>[471106.614443] ---[ now 2016-04-27 06:06:02+03 ]---
<4>[471106.617880] ------------[ cut here ]------------
<2>[471106.617905] kernel BUG at kernel/time/timer.c:963!
<4>[471106.617924] invalid opcode: 0000 [#1] SMP
<4>[471106.617942] Modules linked in: tcp_diag inet_diag bridge cls_cgroup sch_htb netconsole configfs 8021q mrp garp stp llc
intel_powerclamp coretemp kvm_intel kvm crc32_pclmul aesni_intel ablk_helper cryptd lrw mgag200 psmouse ttm gf128mul serio_raw glue_helper
aes_x86_64 drm_kms_helper drm gpio_ich microcode sysimgblt sysfillrect syscopyarea lpc_ich i7core_edac ioatdma edac_core mlx4_ib ib_sa
ib_mad ib_core ib_addr mlx4_en mlx4_core vxlan udp_tunnel ip6_udp_tunnel tcp_htcp ahci igb libahci dca i2c_algo_bit i2c_core ptp pps_core
raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear [last unloaded:
ipmi_msghandler]
<4>[471106.618265] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 3.18.25-28 #1
<4>[471106.618291] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c 08/03/2012
<4>[471106.618317] task: ffff88061e5a53c0 ti: ffff88061e5d0000 task.ti: ffff88061e5d0000
<4>[471106.618344] RIP: 0010:[<ffffffff810c09c9>] [<ffffffff810c09c9>] add_timer_on+0xf9/0x100
<4>[471106.618377] RSP: 0018:ffff880627c43c98 EFLAGS: 00010086
<4>[471106.618398] RAX: 000000000000d480 RBX: ffffffffa0143060 RCX: 0000000107028001
<4>[471106.618423] RDX: ffff880627c40000 RSI: 0000000000000002 RDI: ffffffffa0143060
<4>[471106.618449] RBP: ffff880627c43cd8 R08: ffffffff81389c30 R09: ffff880115355600
<4>[471106.618474] R10: 0000000000000216 R11: 0000000000000003 R12: ffff88061e0d0000
<4>[471106.618500] R13: 0000000000000080 R14: ffff88061b0b2a00 R15: 0000000000000001
<4>[471106.618526] FS: 0000000000000000(0000) GS:ffff880627c40000(0000) knlGS:0000000000000000
<4>[471106.618555] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[471106.618576] CR2: 00007f418817a3d0 CR3: 0000000001c14000 CR4: 00000000000007e0
<4>[471106.618602] Stack:
<4>[471106.618611] ffff880627c43cd8 0000000000000080 ffff88061b0b2a00 0000000000000002
<4>[471106.618643] ffff88061b0b2a00 0000000000000080 ffff88061b0b2a00 0000000000000001
<4>[471106.618675] ffff880627c43d08 ffffffff810822f6 0000000000000001 ffffffffa0143040
<4>[471106.618707] Call Trace:
<4>[471106.618718] <IRQ>
<4>[471106.618727] <4>[471106.618736] [<ffffffff810822f6>] __queue_delayed_work+0x76/0x160
<4>[471106.618758] [<ffffffff810829da>] mod_delayed_work_on+0x5a/0x80
<4>[471106.618782] [<ffffffffa014028a>] set_timeout+0x3a/0x40 [ib_addr]
<4>[471106.618807] [<ffffffffa01407af>] netevent_callback+0x2f/0x40 [ib_addr]
<4>[471106.618832] [<ffffffff8108af7d>] notifier_call_chain+0x4d/0x70
<4>[471106.618856] [<ffffffff8108afba>] __atomic_notifier_call_chain+0x1a/0x30
<4>[471106.618881] [<ffffffff8108afe6>] atomic_notifier_call_chain+0x16/0x20
<4>[471106.618906] [<ffffffff815d75fb>] call_netevent_notifiers+0x1b/0x20
<4>[471106.618930] [<ffffffff815d9a7e>] neigh_update_notify+0x1e/0x40
<4>[471106.618953] [<ffffffff815daebf>] neigh_timer_handler+0xdf/0x2a0
<4>[471106.618976] [<ffffffff815dade0>] ? neigh_periodic_work+0x220/0x220
<4>[471106.619000] [<ffffffff810bf43a>] call_timer_fn+0x3a/0x170
<4>[471106.619021] [<ffffffff810bf67c>] ? cascade+0x7c/0xa0
<4>[471106.619041] [<ffffffff810c100f>] run_timer_softirq+0x2df/0x340
<4>[471106.619064] [<ffffffff810ce944>] ? clockevents_program_event+0x74/0x100
<4>[471106.619090] [<ffffffff815dade0>] ? neigh_periodic_work+0x220/0x220
<4>[471106.619115] [<ffffffff8106f7df>] __do_softirq+0xdf/0x2f0
<4>[471106.619943] [<ffffffff8106fc66>] irq_exit+0x86/0xb0
<4>[471106.620758] [<ffffffff816db61a>] smp_apic_timer_interrupt+0x4a/0x60
<4>[471106.621578] [<ffffffff816d96fd>] apic_timer_interrupt+0x6d/0x80
<4>[471106.622386] <EOI>
<4>[471106.622396] <4>[471106.623190] [<ffffffff815796a1>] ? cpuidle_enter_state+0x71/0x1d0
<4>[471106.623997] [<ffffffff8157968f>] ? cpuidle_enter_state+0x5f/0x1d0
<4>[471106.624783] [<ffffffff815798c7>] cpuidle_enter+0x17/0x20
<4>[471106.625544] [<ffffffff810a905f>] cpu_startup_entry+0x34f/0x3d0
<4>[471106.626282] [<ffffffff810461fc>] start_secondary+0x14c/0x160
<4>[471106.626996] Code: fa 49 83 c5 10 48 89 de ff d1 48 8b 55 c8 4c 89 e8 4c 29 f0 48 8b 4c 02 f0 48 85 c9 75 dc 65 ff 0c 25 a0 b7 00 00
e9 79 ff ff ff <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53
<1>[471106.628532] RIP [<ffffffff810c09c9>] add_timer_on+0xf9/0x100
<4>[471106.629244] RSP <ffff880627c43c98>
<5>[471106.629946] ---[ now 2016-04-27 06:06:02+03 ]---
<4>[471106.630695] ---[ end trace 23e6f46f3ce04a61 ]---

WARN_ON_ONCE(!work_pending(work)) in set_work_data

<4>[235668.529480] ------------[ cut here ]------------
<4>[235668.529491] WARNING: CPU: 3 PID: 0 at kernel/workqueue.c:597 __queue_work+0x2b6/0x3d0()
<4>[235668.529493] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_tcpudp ip6table_filter ip6_tables x_tables bridge cls_cgroup sch_htb
netconsole configfs 8021q mrp garp stp llc kvm_amd kvm
microcode amd64_edac_mod edac_core edac_mce_amd mgag200 psmouse ttm serio_raw sp5100_tco drm_kms_helper drm k10temp i2c_piix4 sysimgblt
sysfillrect syscopyarea mlx4_ib ib_sa ib_mad ib_core i
b_addr mlx4_en mlx4_core vxlan udp_tunnel ip6_udp_tunnel tcp_htcp raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov igb
dca i2c_algo_bit i2c_core ptp pps_core ahci libahci
raid6_pq async_tx raid1 raid0 multipath linear [last unloaded: ipmi_msghandler]
<4>[235668.529542] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 3.18.25-28 #1
<4>[235668.529544] Hardware name: Supermicro H8DGU/H8DGU, BIOS 1.3 07/12/11
<4>[235668.529546] 0000000000000255 ffff88040fcc3d78 ffffffff816d08e3 0000000000000007
<4>[235668.529548] 0000000000000000 ffff88040fcc3db8 ffffffff8106bb2c ffff88040fcc3da8
<4>[235668.529551] ffff8800df94ad00 ffffffffa00d7040 ffff8800df94ad60 ffff88040f811000
<4>[235668.529553] Call Trace:
<4>[235668.529555] <IRQ> [<ffffffff816d08e3>] dump_stack+0x4e/0x68
<4>[235668.529563] [<ffffffff8106bb2c>] warn_slowpath_common+0x8c/0xc0
<4>[235668.529567] [<ffffffff8106bb7a>] warn_slowpath_null+0x1a/0x20
<4>[235668.529569] [<ffffffff81082146>] __queue_work+0x2b6/0x3d0
<4>[235668.529572] [<ffffffff81082260>] ? __queue_work+0x3d0/0x3d0
<4>[235668.529575] [<ffffffff81082278>] delayed_work_timer_fn+0x18/0x20
<4>[235668.529579] [<ffffffff810bf43a>] call_timer_fn+0x3a/0x170
<4>[235668.529582] [<ffffffff810c0f34>] run_timer_softirq+0x204/0x340
<4>[235668.529586] [<ffffffff810ce944>] ? clockevents_program_event+0x74/0x100
<4>[235668.529588] [<ffffffff81082260>] ? __queue_work+0x3d0/0x3d0
<4>[235668.529591] [<ffffffff8106f7df>] __do_softirq+0xdf/0x2f0
<4>[235668.529593] [<ffffffff8106fc66>] irq_exit+0x86/0xb0
<4>[235668.529597] [<ffffffff816db61a>] smp_apic_timer_interrupt+0x4a/0x60
<4>[235668.529600] [<ffffffff816d96fd>] apic_timer_interrupt+0x6d/0x80
<4>[235668.529601] <EOI> [<ffffffff8101e0a9>] ? sched_clock+0x9/0x10
<4>[235668.529607] [<ffffffff81055246>] ? native_safe_halt+0x6/0x10
<4>[235668.529610] [<ffffffff8101f153>] default_idle+0x23/0x100
<4>[235668.529613] [<ffffffff8101faff>] arch_cpu_idle+0xf/0x20
<4>[235668.529616] [<ffffffff810a8fea>] cpu_startup_entry+0x2da/0x3d0
<4>[235668.529619] [<ffffffff810ce352>] ? clockevents_register_device+0xe2/0x140
<4>[235668.529622] [<ffffffff810461fc>] start_secondary+0x14c/0x160
<4>[235668.529624] ---[ end trace cbf3ac778a6035e6 ]---



--
Konstantin