Re: [PATCH RFC v4 13/14] dm: wait for IO completion before removing dm device

From: Yu Kuai
Date: Tue Jan 30 2024 - 20:35:51 EST


Hi,

在 2024/01/30 21:05, Yu Kuai 写道:
Hi,

在 2024/01/30 19:46, Mikulas Patocka 写道:


On Tue, 30 Jan 2024, Yu Kuai wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

__dm_destroy() guarantee that device openers is zero, and then
only call 'presuspend' and 'postsuspend' for the target. For
request-based dm, 'md->holders' will be grabbed for each rq and
__dm_destroy() will wait for 'md->holders' to be zero. However, for
bio-based device, __dm_destroy() doesn't wait for all bios to be done.

Fix this problem by calling dm_wait_for_completion() to wail for all
inflight IO to be done, like what dm_suspend() does.

If the number of openers is zero, it is guaranteed that there are no bios
in flight. Therefore, we don't have to wait for them.

If there are bios in flight, it is a bug in the code that issues the bios.
You can put WARN_ON(dm_in_flight_bios(md)) there.

I add this patch because while testing, there is a problem that is
hard to reporduce, as I mentioned in the other thread. I'll add BUG_ON()
and try if I can still reporduce this problem without triggering it.

Thanks,
Kuai

[12504.959682] BUG bio-296 (Not tainted): Object already free
[12504.960239] -----------------------------------------------------------------------------
[12504.960239]
[12504.961209] Allocated in mempool_alloc+0xe8/0x270 age=30 cpu=1 pid=203288
[12504.961905]  kmem_cache_alloc+0x36a/0x3b0
[12504.962324]  mempool_alloc+0xe8/0x270
[12504.962712]  bio_alloc_bioset+0x3b5/0x920
[12504.963129]  bio_alloc_clone+0x3e/0x160
[12504.963533]  alloc_io+0x3d/0x1f0
[12504.963876]  dm_submit_bio+0x12f/0xa30
[12504.964267]  __submit_bio+0x9c/0xe0
[12504.964639]  submit_bio_noacct_nocheck+0x25a/0x570
[12504.965136]  submit_bio_wait+0xc2/0x160
[12504.965535]  blkdev_issue_zeroout+0x19b/0x2e0
[12504.965991]  ext4_init_inode_table+0x246/0x560
[12504.966462]  ext4_lazyinit_thread+0x750/0xbe0
[12504.966922]  kthread+0x1b4/0x1f0

After adding the BUG_ON(), I can still reporducing this BUG, this really
looks like a BUG, and I don't think this is related to dm-raid. Perhaps
you guys can take a look?

Thanks,
Kuai


Mikulas

Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
  drivers/md/dm.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8dcabf84d866..2c0eae67d0f1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -58,6 +58,7 @@ static DEFINE_IDR(_minor_idr);
  static DEFINE_SPINLOCK(_minor_lock);
  static void do_deferred_remove(struct work_struct *w);
+static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_state);
  static DECLARE_WORK(deferred_remove_work, do_deferred_remove);
@@ -2495,6 +2496,8 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
      if (!dm_suspended_md(md)) {
          dm_table_presuspend_targets(map);
          set_bit(DMF_SUSPENDED, &md->flags);
+        if (wait)
+            dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
          set_bit(DMF_POST_SUSPENDING, &md->flags);
          dm_table_postsuspend_targets(map);
      }
--
2.39.2


.


.