Re: [dm-devel] [PATCH] Fix over-zealous flush_disk when changing device size.

From: Jeff Moyer
Date: Thu Mar 17 2011 - 13:34:13 EST

NeilBrown <neilb@xxxxxxx> writes:

> On Wed, 16 Mar 2011 16:30:22 -0400 Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>> NeilBrown <neilb@xxxxxxx> writes:
>> >> Synchronous notification of errors. If we don't try to write everything
>> >> back immediately after the size change, we don't see dirty pages in
>> >> zapped regions until the writeout/page cache management takes it into
>> >> its head to try to clean the pages.
>> >>
>> >
>> > So if you just want synchronous errors, I think you want:
>> > fsync_bdev()
>> >
>> > which calls sync_filesystem() if it can find a filesystem, else
>> > sync_blockdev(); (sync_filesystem itself calls sync_blockdev too).
>> ... which deadlocks md. ;-) writeback_inodes_sb_nr is waiting for the
>> flusher thread to write back the dirty data. The flusher thread is
>> stuck in md_write_start, here:
>>         wait_event(mddev->sb_wait,
>>                    !test_bit(MD_CHANGE_PENDING, &mddev->flags));
>> This is after reverting your change, and replacing the flush_disk call
>> in check_disk_size_change with a call to fsync_bdev. I'm not familiar
>> enough with md to really suggest a way forward. Neil?
> That would be quite easy to avoid.
> Just call
> md_write_start()
> before revalidate_disk, and
> md_write_end()
> afterwards.

That does not avoid the problem (if I understood your suggestion correctly).
Instead, you end up with the following:

INFO: task md127_raid5:2282 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
md127_raid5 D ffff88011c72d0a0 5688 2282 2 0x00000080
ffff880118997c20 0000000000000046 ffff880100000000 0000000000000246
0000000000014d00 ffff88011c72cb10 ffff88011c72d0a0 ffff880118997fd8
ffff88011c72d0a8 0000000000014d00 ffff880118996010 0000000000014d00
Call Trace:
[<ffffffff8138bbbd>] md_write_start+0xad/0x1d0
[<ffffffff810801d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa0311558>] raid5_finish_reshape+0x98/0x1e0 [raid456]
[<ffffffff8138a933>] reap_sync_thread+0x63/0x130
[<ffffffff8138c8b6>] md_check_recovery+0x1f6/0x6f0
[<ffffffffa03150ab>] raid5d+0x3b/0x610 [raid456]
[<ffffffff810804c9>] ? prepare_to_wait+0x59/0x90
[<ffffffff81387ee9>] md_thread+0x119/0x150
[<ffffffff810801d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81387dd0>] ? md_thread+0x0/0x150
[<ffffffff8107fb56>] kthread+0x96/0xa0
[<ffffffff8100cc04>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fac0>] ? kthread+0x0/0xa0
[<ffffffff8100cc00>] ? kernel_thread_helper+0x0/0x10

I'll leave this to you to work out when you have time.
