Re: Reporting a bug - Memory corruption in Linux kernel

From: Theodore Ts'o
Date: Fri Mar 07 2014 - 16:32:26 EST


On Sat, Mar 08, 2014 at 01:48:42AM +0530, Nilesh More wrote:
>
> 1. When the USB is hotplugged, in the call stack of add_disk( ),
> while registering disk blkdev_get(bdev, FMODE_READ, NULL) gets called
> which I guess scans the partition table, initializes part array and
> registers the partitions in the driver model.
>
> 2. To release the ownership of bdev obtained in step#1,
> blkdev_put(bdev, FMODE_READ) is called. This invalidates the pages
> cached for bdev in above blkdev_get call by first doing a writeback of
> these pages to disk.
>
> 3. Now if I prevent the invalidate page call in step# 2, I see that
> ext4 file system remains intact without any correction. That suggests,
> some part of cached pages obtained in step#1 blkdev_get call is
> already being used by ext4 file system and once these pages are
> invalidated we have a corruption in ext4 file system.

Can you put a WARN_ON(1) in blkdev_put() and blkdev_get(), so we
can see the exact call stack? Also, can you print out the values of
bdev->bd_dev and bdev->bd_openers at the beginning of blkdev_put()
and blkdev_get()?

I am not convinced that your analysis is correct, given the "USB
disconnect" message. So let's see the exact call stack for the calls
to blkdev_get() and blkdev_put(), and see exactly which device is
getting obtained and released.

> My query now is, has anybody seen similar kind of issue before ? Could
> this be a known bug ?

Nothing like this before, no. Note that the invalidate_pages() in
blkdev_put() only happens when bdev->bd_openers drops down to zero.
If the file system is mounted, then bd_openers will be at least one.
So even if someone is calling blkdev_get() and blkdev_put() on the
file system's device, bd_openers will not drop to zero.
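
To make the bd_openers argument concrete, here is a toy userspace
model of the counting described above. It is only a sketch, not kernel
code: struct toy_bdev and the toy_* helpers are made up for
illustration, and the "invalidation" is just a counter. Each helper
logs the device number and opener count on entry, which is the same
information requested from the real blkdev_get() and blkdev_put().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct block_device: only the fields discussed. */
struct toy_bdev {
    unsigned int major, minor; /* models bdev->bd_dev */
    int openers;               /* models bdev->bd_openers */
    int invalidations;         /* times the cached pages were dropped */
};

/* Log which device is touched and its opener count on entry. */
static void toy_trace(const char *fn, const struct toy_bdev *b)
{
    printf("%s: dev %u:%u openers=%d\n", fn, b->major, b->minor, b->openers);
}

/* Models taking a reference on the device (mount, partition scan, ...). */
static void toy_blkdev_get(struct toy_bdev *b)
{
    toy_trace("toy_blkdev_get", b);
    b->openers++;
}

/* Models dropping a reference; the cache is dropped only by the last opener. */
static void toy_blkdev_put(struct toy_bdev *b)
{
    toy_trace("toy_blkdev_put", b);
    assert(b->openers > 0);
    if (--b->openers == 0)
        b->invalidations++;
}
```

In this model, a device that is held open by a mount (one
toy_blkdev_get() with no matching put) can see any number of further
get/put pairs without its invalidations counter moving. Only a device
with no other opener, such as a freshly plugged disk being scanned,
reaches zero and gets invalidated.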

Also, the USB device would be a different bdev than the one for the
system disk. So your theory simply doesn't make any sense to me. If
you think that is really what's going on, let's put in the debugging
printk's that show exactly which device and the bd_openers count for
each call to blkdev_put() and blkdev_get(), and then let's get the
precise stack trace used when the pages get invalidated.

This pattern:

[ 413.607849] usb 2-1.1: USB disconnect, device number 12
[ 414.022630] EXT4-fs error (device mmcblk0p20): ext4_readdir:227: inode #81827: block 328308: comm installd...

is the normal thing that one would expect if someone yanks a USB
device or an SD card containing a mounted file system from the system.
Any theory of what's going on that doesn't account for the "USB
disconnect" message is going to be fundamentally incomplete.


Cheers,

- Ted