Re: fs: GPF in locked_inode_to_wb_and_lock_list
From: Tejun Heo
Date: Thu Apr 21 2016 - 13:06:48 EST
Hello,
(cc'ing Ilya, Jan and Jens)
On Thu, Apr 21, 2016 at 12:00:38PM +0200, Dmitry Vyukov wrote:
> On Thu, Apr 21, 2016 at 11:45 AM, Andrey Ryabinin
> <ryabinin.a.a@xxxxxxxxx> wrote:
> > 2016-04-21 11:35 GMT+03:00 Dmitry Vyukov <dvyukov@xxxxxxxxxx>:
> >>
> >> ffffffff818884dd: 48 8b 03 mov (%rbx),%rax
> >>
> >> So whatever load "&wb->bdi->wb" produces is a NULL deref. (is it wb
> >> that is NULL?)
> >
> > Yes it's NULL wb, because there is only one load:
> > mov (%rbx),%rax => rax = wb->bdi
> > add $0x50,%rax => rax = &bdi->wb
>
>
> I bet that wb becomes NULL on the second iteration of the loop. The
> loop loops in case of a race with another thread, so it would also
> explain why it is difficult to reproduce.
>
> Tejun, does it make any sense to you?
Yeah, that makes sense. I think the culprit is 43d1c0eb7e11 ("block:
detach bdev inode from its wb in __blkdev_put()") which allows inode
to wb association to be broken while other operations including
writeback are in progress. I thought it should be okay as the inode
must be clean at that point but that obviously doesn't mean that there
can be no writeback operations in flight.
I hope we could eventually move away from the current model where we
try to swap out an underlying data structure while upper layers may
still be referring to it in the future but for now we can make sure
the writeback operation is finished before detaching wb.
Dmitry, I understand that the bug is difficult to reproduce but can
you please give the following patch a try?
Thanks.
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 20a2c02..209ea33 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1530,12 +1530,7 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
kill_bdev(bdev);
bdev_write_inode(bdev);
- /*
- * Detaching bdev inode from its wb in __destroy_inode()
- * is too late: the queue which embeds its bdi (along with
- * root wb) can be gone as soon as we put_disk() below.
- */
- inode_detach_wb(bdev->bd_inode);
+ inode_detach_blkdev_wb(bdev);
}
if (bdev->bd_contains == bdev) {
if (disk->fops->release)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index d0b5ca5..ec1f530 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -230,6 +230,25 @@ static inline void inode_detach_wb(struct inode *inode)
}
/**
+ * inode_detach_blkdev_wb - disassociate a bd_inode from its wb
+ * @bdev: block_device of interest
+ *
+ * @bdev is being put for the last time. Detaching bdev inode in
+ * __destroy_inode() is too late: the queue which embeds its bdi (along
+ * with root wb) can be gone as soon as the containing disk is put.
+ *
+ * This function dissociates @bdev->bd_inode from its wb. The inode must
+ * be clean and no further operations should be started on it.
+ */
+static inline void inode_detach_blkdev_wb(struct block_device *bdev)
+{
+ if (bdev->bd_inode->i_wb) {
+ flush_delayed_work(&bdev->bd_inode->i_wb->dwork);
+ inode_detach_wb(bdev->bd_inode);
+ }
+}
+
+/**
* wbc_attach_fdatawrite_inode - associate wbc and inode for fdatawrite
* @wbc: writeback_control of interest
* @inode: target inode
@@ -277,6 +296,10 @@ static inline void inode_detach_wb(struct inode *inode)
{
}
+static inline void inode_detach_blkdev_wb(struct block_device *bdev)
+{
+}
+
static inline void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
struct inode *inode)
__releases(&inode->i_lock)