padata & workqueue list corruption

From: Samuel Holland
Date: Tue Apr 18 2017 - 23:08:59 EST


Hello Steffen & Workqueue People,

As Jason wrote about here a few weeks ago, we've been having issues
with padata. After spending considerable time working to rule out
the possibility that our code was doing something wrong, I've begun
to debug padata and the workqueue subsystems. I've gotten some
potentially useful backtraces and was hoping somebody here might
read them and have an "ah ha!" moment.

We've been using the padata library for some high-throughput
encryption/decryption workloads, and on a relatively weak CPU (Celeron
J1800), we have run into list corruption that results in all workqueues
getting stalled, and occasional panics. I can reproduce this fairly
easily, albeit after several hours of load.

Representative backtraces follow (the warnings come in sets). I have
kernel .configs and extended netconsole output from several occurrences
available upon request.

WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0x89/0xb0
list_add corruption. prev->next should be next (ffff99f135016a90), but
was ffffd34affc03b10. (prev=ffffd34affc03b10).
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 4.9.20+ #1
Call Trace:
<IRQ>
dump_stack+0x67/0x92
__warn+0xc6/0xe0
warn_slowpath_fmt+0x5a/0x80
__list_add+0x89/0xb0
insert_work+0x3c/0xc0
__queue_work+0x18a/0x600
queue_work_on+0x33/0x70
padata_do_parallel+0x14f/0x240
? padata_index_to_cpu+0x50/0x50
? packet_receive+0x140/0x140 [wireguard]
packet_consume_data+0x1b9/0x2b0 [wireguard]
? packet_create_data+0x6b0/0x6b0 [wireguard]
? get_partial_node.isra.72+0x47/0x250
? skb_prepare_header+0xd5/0x3f0 [wireguard]
? packet_receive+0x140/0x140 [wireguard]
packet_receive+0x79/0x140 [wireguard]
? __udp4_lib_lookup+0x147/0x2d0
receive+0x1a/0x30 [wireguard]
udp_queue_rcv_skb+0x34a/0x5b0
__udp4_lib_rcv+0x468/0xb40
? ip_local_deliver_finish+0x21/0x370
udp_rcv+0x15/0x20
ip_local_deliver_finish+0xb7/0x370
? ip_local_deliver_finish+0x21/0x370
ip_local_deliver+0x1e6/0x230
? ip_local_deliver+0x62/0x230
? ip_rcv_finish+0x670/0x670
ip_rcv_finish+0x1ae/0x670
ip_rcv+0x366/0x4d0
? ip_rcv+0x26a/0x4d0
? inet_del_offload+0x40/0x40
__netif_receive_skb_core+0xae1/0xc80
? inet_del_offload+0x40/0x40
? netif_receive_skb_internal+0x29/0x200
__netif_receive_skb+0x18/0x60
netif_receive_skb_internal+0x7b/0x200
? netif_receive_skb_internal+0x29/0x200
netif_receive_skb+0xcd/0x130
br_pass_frame_up+0x2b1/0x2c0
? br_pass_frame_up+0xad/0x2c0
? br_allowed_ingress+0x38a/0x5d0
? br_allowed_ingress+0x1f5/0x5d0
br_handle_frame_finish+0x28f/0x5a0
? br_handle_frame+0x1c1/0x5e0
br_handle_frame+0x2c5/0x5e0
? br_handle_frame+0x1c1/0x5e0
? vlan_do_receive+0x37/0x380
? br_handle_frame_finish+0x5a0/0x5a0
__netif_receive_skb_core+0x1e6/0xc80
? netif_receive_skb_internal+0x29/0x200
__netif_receive_skb+0x18/0x60
netif_receive_skb_internal+0x7b/0x200
? netif_receive_skb_internal+0x29/0x200
napi_gro_receive+0x148/0x200
igb_poll+0x67b/0xdb0
? net_rx_action+0xe5/0x450
net_rx_action+0x224/0x450
__do_softirq+0x1a9/0x4a0
irq_exit+0xbe/0xd0
do_IRQ+0x65/0x110
common_interrupt+0x89/0x89
<EOI>

This add looks to be racing with a deletion of the last item in the
list, because in the actual list prev->next = prev.

WARNING: CPU: 1 PID: 0 at lib/list_debug.c:36 __list_add+0xac/0xb0
list_add double add: new=ffffd34affc03b10, prev=ffffd34affc03b10,
next=ffff99f135016a90.
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W O 4.9.20+ #1
Call Trace:
<IRQ>
dump_stack+0x67/0x92
__warn+0xc6/0xe0
warn_slowpath_fmt+0x5a/0x80
__list_add+0xac/0xb0
insert_work+0x3c/0xc0
__queue_work+0x18a/0x600
queue_work_on+0x33/0x70
padata_do_parallel+0x14f/0x240
? padata_index_to_cpu+0x50/0x50
? packet_receive+0x140/0x140 [wireguard]
packet_consume_data+0x1b9/0x2b0 [wireguard]
? packet_create_data+0x6b0/0x6b0 [wireguard]
? get_partial_node.isra.72+0x47/0x250
? skb_prepare_header+0xd5/0x3f0 [wireguard]
? packet_receive+0x140/0x140 [wireguard]
packet_receive+0x79/0x140 [wireguard]
? __udp4_lib_lookup+0x147/0x2d0
receive+0x1a/0x30 [wireguard]
udp_queue_rcv_skb+0x34a/0x5b0
__udp4_lib_rcv+0x468/0xb40
? ip_local_deliver_finish+0x21/0x370
udp_rcv+0x15/0x20
ip_local_deliver_finish+0xb7/0x370
? ip_local_deliver_finish+0x21/0x370
ip_local_deliver+0x1e6/0x230
? ip_local_deliver+0x62/0x230
? ip_rcv_finish+0x670/0x670
ip_rcv_finish+0x1ae/0x670
ip_rcv+0x366/0x4d0
? ip_rcv+0x26a/0x4d0
? inet_del_offload+0x40/0x40
__netif_receive_skb_core+0xae1/0xc80
? inet_del_offload+0x40/0x40
? netif_receive_skb_internal+0x29/0x200
__netif_receive_skb+0x18/0x60
netif_receive_skb_internal+0x7b/0x200
? netif_receive_skb_internal+0x29/0x200
netif_receive_skb+0xcd/0x130
br_pass_frame_up+0x2b1/0x2c0
? br_pass_frame_up+0xad/0x2c0
? br_allowed_ingress+0x38a/0x5d0
? br_allowed_ingress+0x1f5/0x5d0
br_handle_frame_finish+0x28f/0x5a0
? br_handle_frame+0x1c1/0x5e0
br_handle_frame+0x2c5/0x5e0
? br_handle_frame+0x1c1/0x5e0
? vlan_do_receive+0x37/0x380
? br_handle_frame_finish+0x5a0/0x5a0
__netif_receive_skb_core+0x1e6/0xc80
? netif_receive_skb_internal+0x29/0x200
__netif_receive_skb+0x18/0x60
netif_receive_skb_internal+0x7b/0x200
? netif_receive_skb_internal+0x29/0x200
napi_gro_receive+0x148/0x200
igb_poll+0x67b/0xdb0
? net_rx_action+0xe5/0x450
net_rx_action+0x224/0x450
__do_softirq+0x1a9/0x4a0
irq_exit+0xbe/0xd0
do_IRQ+0x65/0x110
common_interrupt+0x89/0x89
<EOI>

And finally:

WARNING: CPU: 0 PID: 0 at lib/list_debug.c:33 __list_add+0x89/0xb0
list_add corruption. prev->next should be next (ffff99f135016a90), but
was ffffd34affc03b10. (prev=ffffd34affc03b10).
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W O 4.9.20+ #1
Call Trace:
<IRQ>
dump_stack+0x67/0x92
__warn+0xc6/0xe0
warn_slowpath_fmt+0x5a/0x80
__list_add+0x89/0xb0
insert_work+0x3c/0xc0
__queue_work+0x18a/0x600
? execute_in_process_context+0x90/0x90
delayed_work_timer_fn+0x19/0x20
call_timer_fn+0xa8/0x340
? process_timeout+0x10/0x10
? execute_in_process_context+0x90/0x90
run_timer_softirq+0x1bf/0x6b0
? ktime_get+0x91/0x120
? clockevents_program_event+0x42/0x120
? clockevents_program_event+0x7d/0x120
__do_softirq+0x1a9/0x4a0
irq_exit+0xbe/0xd0
smp_apic_timer_interrupt+0x3d/0x50
apic_timer_interrupt+0x89/0x90
<EOI>

Note that this third warning is on the other CPU. All three warnings are
immediately consecutive in the kernel logs (within 1 second).

Looking at the workqueue code and documentation, it should be fine to
call queue_work_on() on an already-queued work item, as padata is doing.

I've also triggered a related warning in the workqueue code a few times:

WARNING at kernel/workqueue.c:1440 __queue_work+0x1d6/0x450
CPU: 1 PID: 1013 Comm: kworker/1:2 Tainted: G W O 4.9.17 #1
Call Trace:
dump_stack+0x4d/0x65
__warn+0xc6/0xe0
warn_slowpath_null+0x18/0x20
__queue_work+0x1d6/0x450
queue_work_on+0x14/0x30
padata_reorder+0xb2/0x160
padata_do_serial+0x51/0x60
timers_uninit_peer+0x275/0xa20 [wireguard]
padata_parallel_worker+0x77/0xa0
process_one_work+0x154/0x410
worker_thread+0x46/0x470
? process_one_work+0x410/0x410
kthread+0xc5/0xe0
? kthread_park+0x60/0x60
? kthread_park+0x60/0x60
ret_from_fork+0x22/0x30

FWIW I'm also sometimes seeing similar list corruption in the padata
queues, usually in padata_reorder(), as well as with lists in the timer
subsystem relating to padata's reorder timer. Lockdep does not complain
about anything, and I've manually checked the locking all three
subsystems to no avail, so I suspect this may ultimately be a deeper
synchronization or memory corruption issue.

If you have any immediate ideas, I'd be very grateful. If not, we'll
keep debugging, of course.

Thanks for any insight,
Samuel