[PATCH] fix kerneloops generated by aoe driver
From: Masanori ITOH
Date: Wed May 19 2010 - 04:57:53 EST
Hello,
I got the attached kernel oops while setting up an AoE client with
linux-2.6.34 on Fedora 12 (x86_64).
It is a rare case, as explained below, but on CentOS 5.x it does
cause a panic.
The attached patch fixes the problem and is working fine for me.
[steps to reproduce]
1. Set up an IEEE 802.1Q VLAN between an AoE (vblade) server and an AoE client.
2. Use an older version of vblade, such as vblade-14-3.el5 from EPEL,
which has a bug that drops 802.1Q VLAN packets.
3. Run modprobe aoe with aoe_iflist="VLAN interface" first.
4. Wait for a couple of minutes.
5. Start the vblade server process.
By repeating steps 3 to 5 above, the problem can be reproduced easily.
[analysis]
In this situation, the aoe driver allocates aoedev structures but cannot get
responses from vblade because of the vblade bug above. Thus, the request queue
in the aoedev structure is never set up (d->blkq is NULL).
When a user removes the aoe driver with rmmod, aoedev_freedev() calls
blk_cleanup_queue() with that NULL pointer (d->blkq), and I think this is
why the kernel generates an Oops.
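To make the failure mode concrete, here is a minimal userspace sketch of the teardown path. It is not driver code: mock_queue, mock_dev, mock_devalloc, mock_cleanup_queue and mock_freedev are hypothetical stand-ins for the request queue, struct aoedev, device allocation, blk_cleanup_queue() and aoedev_freedev(). It only assumes, as the call trace suggests, that the cleanup routine touches the queue unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the request queue and struct aoedev. */
struct mock_queue { int timer_active; };
struct mock_dev { struct mock_queue *blkq; };

/* Counts how many times a queue was actually cleaned up. */
static int cleanup_calls;

/* Stand-in for blk_cleanup_queue(): it uses q unconditionally, so a
 * NULL q is fatal (an assert here, an Oops in the kernel). */
static void mock_cleanup_queue(struct mock_queue *q)
{
	assert(q != NULL);
	q->timer_active = 0;
	cleanup_calls++;
	free(q);
}

/* Allocate a device. with_queue == 0 models a target that never
 * answered, so blkq stays NULL. */
static struct mock_dev *mock_devalloc(int with_queue)
{
	struct mock_dev *d = calloc(1, sizeof(*d));

	if (d && with_queue)
		d->blkq = calloc(1, sizeof(*d->blkq));
	return d;
}

/* Teardown with the same guard the patch adds. */
static void mock_freedev(struct mock_dev *d)
{
	if (d->blkq)
		mock_cleanup_queue(d->blkq);
	free(d);
}
```

Without the guard in mock_freedev(), freeing a device created with mock_devalloc(0) would trip the assert, which corresponds to the rmmod case described above.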
[patch]
Signed-off-by: Masanori Itoh <itoumsn@xxxxxxxxxxxxx>
---
diff -ru linux-2.6.34.orig/drivers/block/aoe/aoedev.c linux-2.6.34/drivers/block/aoe/aoedev.c
--- linux-2.6.34.orig/drivers/block/aoe/aoedev.c 2010-05-17 06:17:36.000000000 +0900
+++ linux-2.6.34/drivers/block/aoe/aoedev.c 2010-05-19 16:20:43.000000000 +0900
@@ -114,7 +114,8 @@
 	if (d->bufpool)
 		mempool_destroy(d->bufpool);
 	skbpoolfree(d);
-	blk_cleanup_queue(d->blkq);
+	if (d->blkq)
+		blk_cleanup_queue(d->blkq);
 	kfree(d);
 }
[kerneloops]
[root@ca-blsv2a1 ~]# cat /var/cache/abrt/kerneloops-1274254780-1/kerneloops
BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
IP: [<ffffffff81058c25>] lock_timer_base+0x16/0x52
PGD 81722e067 PUD 8033fe067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 13
Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss aoe(-) ipt_MASQUERADE iptable_nat nf_nat autofs4 sunrpc bridge 8021q garp stp llc ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin dm_multipath kvm_intel kvm uinput lpfc scsi_transport_fc scsi_tgt ioatdma igb i2c_i801 i2c_core iTCO_wdt pcspkr dca iTCO_vendor_support shpchp megaraid_sas [last unloaded: microcode]
May 19 16:38:16 ca-blsv2a1 kernel: Pid: 10805, comm: rmmod Not tainted 2.6.34-vanilla1 #3 G7KBP/Express5800/B120a [N8400-085]
Pid: 10805, comm: rmmod Not tainted 2.6.34-vanilla1 #3 G7KBP/Express5800/B120a [N8400-085]
RIP: 0010:[<ffffffff81058c25>] [<ffffffff81058c25>] lock_timer_base+0x16/0x52
RSP: 0018:ffff880817be3d88 EFLAGS: 00010286
RAX: 00400000000040c1 RBX: 00000000000000d0 RCX: 00000000002a0023
RDX: ffff880802c0a6c0 RSI: ffff880817be3dc0 RDI: 00000000000000d0
RBP: ffff880817be3da8 R08: ffffffff81a2ca38 R09: ffff880817be3e48
R10: 0000000000000000 R11: ffff880817be3e38 R12: 00000000000000d0
R13: ffff880817be3dc0 R14: 0000000000000001 R15: ffff880800dc4120
FS: 00007fa2e880f700(0000) GS:ffff88000a1a0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000f8 CR3: 000000080b910000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 10805, threadinfo ffff880817be2000, task ffff880816f1dd80)
Stack:
00000000000000d0 0000000000000858 00000000ffffffff 0000000000000001
ffff880817be3de8 ffffffff81058c83 0000000000000000 ffff880800dc4118
ffff880817be3df8 00000000000000d0 0000000000000858 ffff880800dc40e0
Call Trace:
[<ffffffff81058c83>] try_to_del_timer_sync+0x22/0x89
[<ffffffff81058d03>] del_timer_sync+0x19/0x25
[<ffffffff811f2ea4>] blk_sync_queue+0x1d/0x39
[<ffffffff811f2edb>] blk_cleanup_queue+0x1b/0x4e
[<ffffffffa0291b59>] aoedev_freedev+0xed/0x104 [aoe]
[<ffffffffa0291f09>] aoedev_exit+0x5e/0x72 [aoe]
[<ffffffffa02921b0>] aoe_exit+0x33/0x3b [aoe]
[<ffffffff81079f66>] sys_delete_module+0x1d8/0x264
[<ffffffff8143c804>] ? do_page_fault+0x23c/0x269
[<ffffffff81095e0e>] ? audit_syscall_entry+0x11e/0x14a
[<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
Code: 00 c9 48 98 c3 55 48 89 e5 0f 1f 44 00 00 e8 03 07 3e 00 c9 c3 55 48 89 e5 41 56 41 55 41 54 53 0f 1f 44 00 00 49 89 fc 49 89 f5 <4d> 8b 74 24 28 4c 89 f3 48 83 e3 fe 74 2a 48 89 df e8 70 06 3e
RIP [<ffffffff81058c25>] lock_timer_base+0x16/0x52
RSP <ffff880817be3d88>
CR2: 00000000000000f8
---[ end trace e6a698c606d1641b ]---
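The register dump seems consistent with the analysis. The faulting instruction bytes (<4d> 8b 74 24 28) decode to a load from 0x28(%r12), R12 holds 0xd0 (the timer pointer passed in RDI), and 0xd0 + 0x28 is the faulting address 0xf8 in CR2. A timer pointer of 0xd0 is what you would get from taking the address of a timer embedded in a queue whose own pointer is NULL. The sketch below just redoes that arithmetic. mock_timer_list is a hypothetical mirror of the leading fields of struct timer_list, and the field order is my assumption, chosen to match the 0x28 displacement in the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the leading fields of struct timer_list on
 * x86_64 (assumed field order). Pointers are modelled as uint64_t so
 * the offsets do not depend on the build host. */
struct mock_timer_list {
	uint64_t entry_next;	/* struct list_head entry */
	uint64_t entry_prev;
	uint64_t expires;
	uint64_t function;
	uint64_t data;
	uint64_t base;		/* the field lock_timer_base() loads */
};

/* The displacement seen in the faulting instruction. */
static_assert(offsetof(struct mock_timer_list, base) == 0x28,
	      "base is expected at offset 0x28");

/* Address touched when loading timer->base for a given timer pointer. */
static uint64_t base_field_addr(uint64_t timer_ptr)
{
	return timer_ptr + offsetof(struct mock_timer_list, base);
}
```

Calling base_field_addr(0xd0) gives the address reported in CR2.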
Regards,
Masanori
---
Masanori ITOH R&D Headquarter, NTT DATA CORPORATION
e-mail: itoumsn@xxxxxxxxxxxxx