Re: Oops on aoe module removal

From: Ed Cashin
Date: Thu Jan 03 2013 - 14:39:02 EST


Lines: 75

On Thu, Jan 03, 2013 at 12:15:35PM -0600, Ed Cashin wrote:
...
> >>>>> On Jan 3, 2013, at 8:25 AM, Josh Boyer wrote:
...
> >>>>>> [699170.611997] aoe: AoE v47 initialised.
...
> >>>>>> [699231.308319] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
> >>>>>> [699231.312031] Hardware name: S5000VSA
> >>>>>> [699231.315658] list_del corruption. next->prev should be ffff880009fa37e8, but was ffffffff81c79c00
> >>>>>> [699231.319352] Modules linked in: aoe(-) ip6table_filter ip6_tables ebtable_nat ebtables lockd sunrpc bridge 8021q garp stp llc vfat fat binfmt_misc iTCO_wdt iTCO_vendor_support vhost_net lpc_ich radeon tun macvtap mfd_core serio_raw coretemp i2c_algo_bit ttm i5000_edac macvlan drm_kms_helper e1000e edac_core microcode i5k_amb shpchp i2c_i801 drm kvm_intel i2c_core kvm ioatdma dca raid1
> >>>>>> [699231.336259] Pid: 8584, comm: modprobe Not tainted 3.6.11-1.fc17.x86_64 #1
> >>>>>> [699231.340561] Call Trace:
> >>>>>> [699231.344865] [<ffffffff8105c8ef>] warn_slowpath_common+0x7f/0xc0
> >>>>>> [699231.349212] [<ffffffff8105c9e6>] warn_slowpath_fmt+0x46/0x50
> >>>>>> [699231.353595] [<ffffffff812eee52>] __list_del_entry+0x82/0xd0
> >>>>>> [699231.357954] [<ffffffff812eeeb1>] list_del+0x11/0x40
> >>>>>> [699231.362319] [<ffffffff812f6458>] percpu_counter_destroy+0x28/0x50
> >>>>>> [699231.366712] [<ffffffff8114c513>] bdi_destroy+0x43/0x140
> >>>>>> [699231.371127] [<ffffffff812be20c>] blk_release_queue+0x8c/0xc0
> >>>>>> [699231.375454] [<ffffffff812dc322>] kobject_cleanup+0x82/0x1b0
> >>>>>> [699231.379675] [<ffffffff812dc1ab>] kobject_put+0x2b/0x60
> >>>>>> [699231.383851] [<ffffffff812b80a5>] blk_put_queue+0x15/0x20
> >>>>>> [699231.387899] [<ffffffff812bc659>] blk_cleanup_queue+0xc9/0xe0
> >>>>>> [699231.391794] [<ffffffffa01f53f5>] aoedev_freedev+0x135/0x150 [aoe]
> >>>>>> [699231.395668] [<ffffffffa01f59a5>] aoedev_exit+0x65/0x80 [aoe]
> >>>>>> [699231.399493] [<ffffffffa01f5afe>] aoe_exit+0x2e/0x40 [aoe]
> >>>>>> [699231.403273] [<ffffffff810bdefe>] sys_delete_module+0x16e/0x2d0
> >>>>>> [699231.407119] [<ffffffff8161db56>] ? __schedule+0x3c6/0x7a0
> >>>>>> [699231.411050] [<ffffffff8119054a>] ? sys_write+0x4a/0x90
> >>>>>> [699231.415033] [<ffffffff81627329>] system_call_fastpath+0x16/0x1b
> >>>>>> [699231.419117] ---[ end trace 9e1558af1964b569 ]---
> >>>>>> [699231.423248] ------------[ cut here ]------------

The blk_alloc_queue has already done a bdi_init, so do not bdi_init again in
aoeblk_gdalloc.

The patch below applies to v3.5.6, with its v47 aoe driver. On my system it
eliminates the list_del corruption messages.

It updates VERSION for convenience during testing.

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index db195ab..2ccb9e2 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -1,5 +1,5 @@
/* Copyright (c) 2007 Coraid, Inc. See COPYING for GPL terms. */
-#define VERSION "47"
+#define VERSION "47nobdi1"
#define AOE_MAJOR 152
#define DEVICE_NAME "aoe"

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 321de7b..7eca463 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -276,8 +276,6 @@ aoeblk_gdalloc(void *vp)
goto err_mempool;
blk_queue_make_request(d->blkq, aoeblk_make_request);
d->blkq->backing_dev_info.name = "aoe";
- if (bdi_init(&d->blkq->backing_dev_info))
- goto err_blkq;
spin_lock_irqsave(&d->lock, flags);
gd->major = AOE_MAJOR;
gd->first_minor = d->sysminor * AOE_PARTITIONS;
@@ -298,9 +296,6 @@ aoeblk_gdalloc(void *vp)
aoedisk_add_sysfs(d);
return;

-err_blkq:
- blk_cleanup_queue(d->blkq);
- d->blkq = NULL;
err_mempool:
mempool_destroy(d->bufpool);
err_disk:

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/