Re: [PATCH] block: flush writeback dwork before detaching a bdev inode from it
From: Dmitry Vyukov
Date: Mon Jun 20 2016 - 09:39:33 EST
On Mon, Jun 20, 2016 at 3:31 PM, Jan Kara <jack@xxxxxxx> wrote:
> Hi,
>
> On Fri 17-06-16 12:04:05, Tejun Heo wrote:
>> 43d1c0eb7e11 ("block: detach bdev inode from its wb in
>> __blkdev_put()") detached bdev inode from its wb as the bdev inode may
>> outlive the underlying bdi and thus the wb. This is accomplished by
>> invoking inode_detach_wb() from __blkdev_put(); however, while the
>> inode can't be dirtied by the time control reaches there, that doesn't
>> guarantee that writeback isn't already in progress on the inode. This
>> can lead to the inode being disassociated from its wb while writeback
>> operation is in flight causing oopses like the following.
>
> <snip>
>
>> Fix it by flushing the wb dwork before detaching the inode. Combined
>> with the fact that the inode can no longer be dirtied, this guarantees
>> that no writeback operation can be in flight or initiated.
>
> Sorry for the late reply but now when thinking about the patch I don't
> think it is quite right. Writeback can happen from other contexts than just
> the worker one (e.g. kswapd can do writeback, or in some out-of-memory
> situations we may punt to doing writeback directly instead of calling the
> worker, or sync(2) calls fdatawrite() for block device inode directly when
> iterating through blockdev superblock). So flushing the workqueue IMHO is
> not covering 100% of cases.
>
> I wanted to suggest to use inode_wait_for_writeback() which will make sure
> writeback is done with the inode. However we effectively already do this
> by calling bdev_write_inode() which calls writeback_single_inode() in
> WB_SYNC_ALL mode. So by the time that call completes we are sure writeback
> code is not looking at the inode. *However* what I think is happening is
> that sync_blockdev() writes all the dirty pages, bdev_write_inode() writes
> the inode and clears all dirty bits, however the inode still stays in the
> b_dirty / b_io list of the wb because it has I_DIRTY_TIME set. Subsequently
> once flusher work runs, it finds the inode, looks at it and boom. And the
> problem seems to be that write_inode_now(inode, true) does not result in
> I_DIRTY_TIME being cleared.
>
> Attached patch should fix this issue - it is compile-tested only. Dmitry,
> can you check whether this patch fixes the issue for you as well?
I can't directly test it because crash happened very infrequently.
If Tejun/Ted agree that it is the right way to fix it, then I can
patch it, restart the fuzzer and leave it running for a while.