Crash in __wake_up_common
From: Nikolay Borisov
Date: Tue Apr 19 2016 - 10:37:59 EST
Hello,
On a 4.4.1 kernel I observed the following crash:
[1157738.189104] BUG: unable to handle kernel NULL pointer dereference at (null)
[1157738.189374] IP: [<ffffffff810e08be>] __wake_up_common+0x2e/0x90
[1157738.189596] PGD 4382a6067 PUD 43827e067 PMD 0
[1157738.189901] Oops: 0000 [#1] SMP
[1157738.190158] Modules linked in: tcp_scalable dm_snapshot dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio xt_multiport xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables zfs(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) zunicode(PO) ext2 ib_umad ib_ipoib ib_cm ib_sa ses enclosure igb i2c_algo_bit x86_pkg_temp_thermal crc32_pclmul sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma shpchp ipmi_devintf ipmi_si ipmi_msghandler ib_qib dca ib_mad ib_core ib_addr ipv6
[1157738.193517] CPU: 2 PID: 11460 Comm: z_wr_iss Tainted: P W O 4.4.1-clouder2 #69
[1157738.193688] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
[1157738.193859] task: ffff8802d102a700 ti: ffff88005b068000 task.ti: ffff88005b068000
[1157738.194029] RIP: 0010:[<ffffffff810e08be>] [<ffffffff810e08be>] __wake_up_common+0x2e/0x90
[1157738.194247] RSP: 0018:ffff88005b06bd48 EFLAGS: 00010096
[1157738.194415] RAX: ffffffffffffffe8 RBX: ffff880438ef52c8 RCX: 0000000000000000
[1157738.194585] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff880438ef52c8
[1157738.194756] RBP: ffff88005b06bd88 R08: 0000000000000000 R09: 0000000000000002
[1157738.194926] R10: 0000000000000001 R11: 0000000000000078 R12: 0000000000000086
[1157738.195098] R13: ffff880438ef52d0 R14: 0000000000000000 R15: 0000000000000000
[1157738.195267] FS: 0000000000000000(0000) GS:ffff88047fc40000(0000) knlGS:0000000000000000
[1157738.195440] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1157738.195607] CR2: 0000000000000000 CR3: 00000004382a9000 CR4: 00000000001406e0
[1157738.195778] Stack:
[1157738.195939] ffff880438ef52c8 0000000300000001 0000000000000001 ffff880438ef52c8
[1157738.196294] 0000000000000086 0000000000000003 0000000000000001 0000000000000000
[1157738.196648] ffff88005b06bdc8 ffffffff810e0ee8 ffffffffa0296000 ffff880438ef5200
[1157738.197002] Call Trace:
[1157738.197168] [<ffffffff810e0ee8>] __wake_up+0x48/0x70
[1157738.197343] [<ffffffffa0296000>] ? taskq_thread_spawn+0x60/0x60 [spl]
[1157738.197515] [<ffffffffa0296000>] ? taskq_thread_spawn+0x60/0x60 [spl]
[1157738.197687] [<ffffffffa0296106>] taskq_thread+0x106/0x580 [spl]
[1157738.197857] [<ffffffff810cb5c0>] ? try_to_wake_up+0x3b0/0x3b0
[1157738.198028] [<ffffffffa0296000>] ? taskq_thread_spawn+0x60/0x60 [spl]
[1157738.198199] [<ffffffffa0296000>] ? taskq_thread_spawn+0x60/0x60 [spl]
[1157738.198370] [<ffffffffa0296000>] ? taskq_thread_spawn+0x60/0x60 [spl]
[1157738.198541] [<ffffffff810c1777>] kthread+0xd7/0xf0
[1157738.198711] [<ffffffff810ca3ee>] ? schedule_tail+0x1e/0xd0
[1157738.198880] [<ffffffff810c16a0>] ? kthread_freezable_should_stop+0x80/0x80
[1157738.199053] [<ffffffff8167de2f>] ret_from_fork+0x3f/0x70
[1157738.199222] [<ffffffff810c16a0>] ? kthread_freezable_should_stop+0x80/0x80
[1157738.199393] Code: e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 0f 1f 44 00 00 89 75 cc 89 55 c8 4c 8d 6f 08 48 8b 57 08 41 89 cf 48 8d 42 e8 4d 89 c6 <48> 8b 58 18 49 39 d5 74 3b 48 83 eb 18 eb 07 48 89 d8 48 8d 5a
[1157738.202805] RIP [<ffffffff810e08be>] __wake_up_common+0x2e/0x90
[1157738.203020] RSP <ffff88005b06bd48>
[1157738.203184] CR2: 0000000000000000
ffffffff810e08be points to this line in __wake_up_common:
list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
This is the wait_queue_head_t:
crash> struct wait_queue_head_t ffff880438ef52c8
struct wait_queue_head_t {
  lock = {
    {
      rlock = {
        raw_lock = {
          val = {
            counter = 1
          }
        }
      }
    }
  },
  task_list = {
    next = 0x0,
    prev = 0xffff880438ef52d8
  }
}
nr_exclusive seems to be 1, and mode is 3 (TASK_NORMAL).
The spl module comes from ZFS on Linux (ZoL), but I don't know whether
this is a bug in the scheduler or in ZFS. The line that led to the
__wake_up call is this:
wake_up(&tq->tq_wait_waitq);