crash in 3.12.51 (likely in 3.12.52 as well) in timer code

From: Nikolay Borisov
Date: Wed Feb 03 2016 - 05:59:23 EST


Hello,

I've observed the following crash on a machine running 3.12.51:

[2711471.041886] Modules linked in: xt_length xt_state xt_pkttype xt_dscp xt_multiport xt_set(O) ip_set_list_set(O) ip_set_hash_ip(O) ip_set(O) act_police cls_basic sch_ingress veth dm_snapshot netconsole openvswitch gre vxlan ip_tunnel nf_nat_ftp nf_conntrack_ftp xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ipmi_devintf ipmi_si ipmi_msghandler i2c_i801 lpc_ich mfd_core shpchp ioapic ioatdma ses enclosure ixgbe dca
[2711471.059208] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G O 3.12.51-clouder5 #2
[2711471.059563] Hardware name: Supermicro PIO-628U-TR4T+-ST031/X10DRU-i+, BIOS 1.0c 03/23/2015
[2711471.059919] task: ffff881fd31db870 ti: ffff881fd31ea000 task.ti: ffff881fd31ea000
[2711471.060273] RIP: 0010:[<ffffffff81097718>] [<ffffffff81097718>] detach_if_pending+0x48/0x100
[2711471.060972] RSP: 0018:ffff883fff203bd0 EFLAGS: 00010002
[2711471.061320] RAX: dead000000200200 RBX: ffffffffa018be20 RCX: 0000000000000074
[2711471.061672] RDX: ffff883fd2e14638 RSI: ffff883fd2df8000 RDI: ffffffffa018be20
[2711471.062025] RBP: ffff883fff203bf0 R08: 0000000000000000 R09: ffff881fff403700
[2711471.062377] R10: ffffea00d4178f80 R11: 0000000000000000 R12: ffff883fd2df8000
[2711471.062729] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883fff203c88
[2711471.063081] FS: 0000000000000000(0000) GS:ffff883fff200000(0000) knlGS:0000000000000000
[2711471.063437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2711471.063787] CR2: 00007f7b9b4b2000 CR3: 000000380ddae000 CR4: 00000000001407e0
[2711471.064143] Stack:
[2711471.064483] ffffffffa018be20 ffff883fd2df8000 0000000000000000 0000000000000000
[2711471.065090] ffff883fff203c30 ffffffff810978f1 0000000000000000 0000000000000082
[2711471.065695] ffff883fff203c88 ffffffffa018be00 ffff883fff203c88 0000000000000001
[2711471.066301] Call Trace:
[2711471.066643] <IRQ>
[2711471.066709]
[2711471.067112] [<ffffffff810978f1>] del_timer+0x41/0x70
[2711471.067465] [<ffffffff810a5271>] try_to_grab_pending+0x121/0x1d0
[2711471.067818] [<ffffffff810a5532>] mod_delayed_work_on+0x42/0xa0
[2711471.068171] [<ffffffffa018b1fa>] set_timeout+0x3a/0x40 [ib_addr]
[2711471.068523] [<ffffffffa018b22d>] netevent_callback+0x2d/0x40 [ib_addr]
[2711471.068879] [<ffffffff810b45c4>] notifier_call_chain+0x54/0x80
[2711471.069231] [<ffffffff810b461a>] __atomic_notifier_call_chain+0x2a/0x40
[2711471.069584] [<ffffffff810b4646>] atomic_notifier_call_chain+0x16/0x20
[2711471.069940] [<ffffffff81590f8b>] call_netevent_notifiers+0x1b/0x20
[2711471.070292] [<ffffffff81593a2e>] neigh_update_notify+0x1e/0x40
[2711471.070643] [<ffffffff815941c6>] neigh_timer_handler+0x116/0x270
[2711471.070995] [<ffffffff815940b0>] ? neigh_periodic_work+0x270/0x270
[2711471.071346] [<ffffffff810975b9>] call_timer_fn+0x49/0x160
[2711471.079597] [<ffffffff81098298>] run_timer_softirq+0x278/0x2e0
[2711471.079948] [<ffffffff815940b0>] ? neigh_periodic_work+0x270/0x270
[2711471.080301] [<ffffffff8108f037>] __do_softirq+0x137/0x2e0
[2711471.080653] [<ffffffff8164c54c>] call_softirq+0x1c/0x30
[2711471.081006] [<ffffffff8104a35d>] do_softirq+0x8d/0xc0
[2711471.081356] [<ffffffff8108ebd5>] irq_exit+0x95/0xa0
[2711471.081706] [<ffffffff8164cc8a>] smp_apic_timer_interrupt+0x4a/0x5a
[2711471.082057] [<ffffffff8164b92f>] apic_timer_interrupt+0x6f/0x80
[2711471.082406] <EOI>
[2711471.082472]
[2711471.082877] [<ffffffff81051b53>] ? mwait_idle+0x73/0x90
[2711471.083227] [<ffffffff81051b4a>] ? mwait_idle+0x6a/0x90
[2711471.083577] [<ffffffff81051bc6>] arch_cpu_idle+0x26/0x30
[2711471.083929] [<ffffffff810d28db>] cpu_startup_entry+0xcb/0x2a0
[2711471.084283] [<ffffffff81071369>] start_secondary+0x1e9/0x250
[2711471.084633] Code: 44 00 00 31 c0 41 89 d6 48 89 fb 48 8b 17 49 89 f4 48 85 d2 74 4a 8b 05 ff ad c0 00 85 c0 7f 6c 48 8b 43 08 45 84 f6 48 89 42 08 <48> 89 10 74 07 48 c7 03 00 00 00 00 48 b9 00 02 20 00 00 00 ad
[2711471.089662] RIP [<ffffffff81097718>] detach_if_pending+0x48/0x100
[2711471.090078] RSP <ffff883fff203bd0>


Analysing the issue it seems what happens is that a neighbor timer
expires which in turn causes the subscribed ib_addr module to invoke
set_timeout which queues delayed work. However, it seems something has
already corrupted the timer_list since the crash actually occurs in the
inlined detach_timer inside detach_if_pending, here is annotated assembly:

------------[detach_timer]----------------------
/home/projects/linux-stable/kernel/timer.c: 662
0xffffffff8109770d <detach_if_pending+61>: mov rax,QWORD PTR [rbx+0x8] ; rbx holds value of rdi = timer_list
/home/projects/linux-stable/kernel/timer.c: 663
0xffffffff81097711 <detach_if_pending+65>: test r14b,r14b
----------[__list_del]----------------------
/home/projects/linux-stable/include/linux/list.h: 88
0xffffffff81097714 <detach_if_pending+68>: mov QWORD PTR [rdx+0x8],rax ; ffffffffa018be20
/home/projects/linux-stable/include/linux/list.h: 89
0xffffffff81097718 <detach_if_pending+72>: mov QWORD PTR [rax],rdx
---------------[__list_del]----------------
/home/projects/linux-stable/kernel/timer.c: 663
0xffffffff8109771b <detach_if_pending+75>: je 0xffffffff81097724 <detach_if_pending+84>
/home/projects/linux-stable/kernel/timer.c: 664
0xffffffff8109771d <detach_if_pending+77>: mov QWORD PTR [rbx],0x0
/home/projects/linux-stable/kernel/timer.c: 665
0xffffffff81097724 <detach_if_pending+84>: movabs rcx,0xdead000000200200
------------[end detach_timer]-------------


It seems when the code tries to do prev->next = next in __list_del from detach_timer,
rax has a value of dead000000200200 (LIST_POISON2).

ffffffffa018be20 is the address of the timer_list passed to detach_timer which looks
like so:

crash> struct timer_list ffffffffa018be20
struct timer_list {
entry = {
next = 0xffff883fd2e14638,
prev = 0xffff883fff223e60
},
expires = 4565976929,
base = 0xffff883fd2e14002,
function = 0xffffffff810a4f70 <delayed_work_timer_fn>,
data = 18446744072100560384,
slack = -1
}

So in this case the prev/next entries do not look like corrupted, whereas
when manipulating the list inside detach_timer they do. This is really
odd, any ideas how to further debug this?

Regards,
Nikolay