Re: [BUG 2.6.17-git] kmem_cache_create: duplicate cache scsi_cmd_cache

From: Erik Mouw
Date: Fri May 12 2006 - 13:16:02 EST


On Fri, May 12, 2006 at 06:53:27AM +0200, Or Gerlitz wrote:
> On 5/11/06, Erik Mouw <erik@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >Hi,
> >
> >While playing with libata in 2.6.17-git from today, I got this bug:
> >
> >SCSI subsystem initialized
> >libata version 1.20 loaded.
> >sata_promise 0000:02:05.0: version 1.04
> >kmem_cache_create: duplicate cache scsi_cmd_cache
> > <c0159591> kmem_cache_create+0x331/0x390
> > <e0924371> scsi_setup_command_freelist+0x71/0xf0 [scsi_mod]
> > <e092588e> scsi_host_alloc+0x17e/0x270 [scsi_mod]
> > <e08fd061> ata_host_add+0x41/0xc0 [libata]
> > <c0148c9f> __kzalloc+0x1f/0x50
> > <e08fd190> ata_device_add+0xb0/0x240 [libata]
> > <e0839baf> pdc_ata_init_one+0x27f/0x330 [sata_promise]
> > <c01dd1a9> pci_call_probe+0x19/0x20
> > <c01dd20e> __pci_device_probe+0x5e/0x70
> > <c01dd24f> pci_device_probe+0x2f/0x50
> > <c0220977> driver_probe_device+0xb7/0xe0
> > <c02b338a> klist_dec_and_del+0x1a/0x20
> > <c0220a30> __driver_attach+0x0/0x90
> > <c0220aa1>__driver_attach+0x71/0x90
> > <c021fd49> bus_for_each_dev+0x69/0x80
> > <c0220ae6> driver_attach+0x26/0x30
> > <c0220a30> __driver_attach+0x0/0x90
> > <c02202a3> bus_add_driver+0x83/0xc0
> > <c01dd4ed> __pci_register_driver+0x4d/0x70
> > <e0880017> pdc_ata_init+0x17/0x1b [sata_promise]
> > <c013a1a0> sys_init_module+0x120/0x1b0
> > <c0102f27> syscall_call+0x7/0xb
>
> I was pointing on 2.6.17 kmem_cache related issues while back to LKML
> and it turns out that you can reproduce them easily with standalone
> trivial module, see
> http://openib.org/pipermail/openib-general/2006-April/020582.html
>
> So far no one picked the issue anb i was adviced to use bisection...

I tracked it down with git bisect. The culprit is this commit:

56cf6504fc1c0c221b82cebc16a444b684140fb7 is first bad commit
diff-tree 56cf6504fc1c0c221b82cebc16a444b684140fb7 (from
d98550e334715b2d9e45f8f0f4e1608720108640)
Author: Russell King <rmk@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Fri May 5 17:57:52 2006 +0100

[BLOCK] Fix oops on removal of SD/MMC card

The block layer keeps a reference (driverfs_dev) to the struct
device associated with the block device, and uses it internally
for generating uevents in block_uevent.

Block device uevents include umounting the partition, which can
occur after the backing device has been removed.

Unfortunately, this reference is not counted. This means that
if the struct device is removed from the device tree, the block
layers reference will become stale.

Guard against this by holding a reference to the struct device
in add_disk(), and only drop the reference when we're releasing
the gendisk kobject - in other words when we can be sure that no
further uevents will be generated for this block device.

Signed-off-by: Russell King <rmk+kernel@xxxxxxxxxxxxxxxx>
Acked-by: Jens Axboe <axboe@xxxxxxx>

:040000 040000 4923c988a57db93382546cd83f4e3043b06c0eed 08e396a4fbf3c7d0beb9178f0fd0c9205c0b5305 M block

After reverting this commit in 2.6.17-rc4 I can't trigger the bug
anymore. Might be worth fixing before 2.6.17-final.

Note: I'm going on holiday next week, so I'm not able to test any
fixes. However, because this bug is very easy to trigger[1], anybody
with root on NFS and fully modular SCSI or libata should be able to
test.


Erik

[1] See http://marc.theaimsgroup.com/?l=linux-scsi&m=114736053211943&w=2

--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/