Re: [PATCH RFC 2/7] blk-mq: delay tag fair sharing until fail to get driver tag

From: Yu Kuai
Date: Mon Jun 19 2023 - 02:07:20 EST


Hi,

On 2023/06/19 13:55, Hannes Reinecke wrote:
On 6/18/23 18:07, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

Starting tag fair sharing as soon as a device starts to issue io wastes
resources: the same number of tags is assigned to each disk/hctx, and
those tags can't be used by other disks/hctxs. This means a disk/hctx
can't use more than its assigned tags even if lots of tags assigned to
other disks are unused.

Add a new api blk_mq_driver_tag_busy(), which is called when getting a
driver tag fails, and move tag sharing from blk_mq_tag_busy() to
blk_mq_driver_tag_busy().

This approach works well as long as the total tags are not exhausted;
follow-up patches will try to refactor how tags are shared to handle the
case where they are exhausted.

Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
  block/blk-mq-debugfs.c |  4 ++-
  block/blk-mq-tag.c     | 60 ++++++++++++++++++++++++++++++++++--------
  block/blk-mq.c         |  4 ++-
  block/blk-mq.h         | 13 ++++++---
  include/linux/blk-mq.h |  6 +++--
  include/linux/blkdev.h |  1 +
  6 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 431aaa3eb181..de5a911b07c2 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -400,8 +400,10 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
  {
      seq_printf(m, "nr_tags=%u\n", tags->nr_tags);
      seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
-    seq_printf(m, "active_queues=%d\n",
+    seq_printf(m, "active_queues=%u\n",
             READ_ONCE(tags->ctl.active_queues));
+    seq_printf(m, "share_queues=%u\n",
+           READ_ONCE(tags->ctl.share_queues));
      seq_puts(m, "\nbitmap_tags:\n");
      sbitmap_queue_show(&tags->bitmap_tags, m);
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index fe41a0d34fc0..1c2bde917195 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -29,6 +29,32 @@ static void blk_mq_update_wake_batch(struct blk_mq_tags *tags,
              users);
  }
+void __blk_mq_driver_tag_busy(struct blk_mq_hw_ctx *hctx)
+{
+    struct blk_mq_tags *tags = hctx->tags;
+
+    /*
+     * calling test_bit() prior to test_and_set_bit() is intentional,
+     * it avoids dirtying the cacheline if the queue is already active.
+     */
+    if (blk_mq_is_shared_tags(hctx->flags)) {
+        struct request_queue *q = hctx->queue;
+
+        if (test_bit(QUEUE_FLAG_HCTX_BUSY, &q->queue_flags) ||
+            test_and_set_bit(QUEUE_FLAG_HCTX_BUSY, &q->queue_flags))
+            return;
+    } else {
+        if (test_bit(BLK_MQ_S_DTAG_BUSY, &hctx->state) ||
+            test_and_set_bit(BLK_MQ_S_DTAG_BUSY, &hctx->state))
+            return;
+    }
+
+    spin_lock_irq(&tags->lock);
+    WRITE_ONCE(tags->ctl.share_queues, tags->ctl.active_queues);
+    blk_mq_update_wake_batch(tags, tags->ctl.share_queues);
+    spin_unlock_irq(&tags->lock);
+}
+
  /*
   * If a previously inactive queue goes active, bump the active user count.
   * We need to do this before try to allocate driver tag, then even if fail
@@ -37,7 +63,6 @@ static void blk_mq_update_wake_batch(struct blk_mq_tags *tags,
   */
  void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
  {
-    unsigned int users;
      struct blk_mq_tags *tags = hctx->tags;
      /*
@@ -57,9 +82,7 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
      }
      spin_lock_irq(&tags->lock);
-    users = tags->ctl.active_queues + 1;
-    WRITE_ONCE(tags->ctl.active_queues, users);
-    blk_mq_update_wake_batch(tags, users);
+    WRITE_ONCE(tags->ctl.active_queues, tags->ctl.active_queues + 1);

Why did you remove the call to blk_mq_update_wake_batch() here?

blk_mq_update_wake_batch() should be called when the number of available
tags changes. However, active_queues is no longer used by hctx_may_queue()
to calculate the available tags; share_queues is used instead, and it is
updated in the new helper blk_mq_driver_tag_busy().
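
To illustrate the point, here is a rough userspace model of the two
counters. It is only a sketch, not the kernel code: struct tag_ctl and the
model_*() helpers are hypothetical names made up for this example, and the
limit formula (round-up division with a floor of 4 tags) is assumed to
mirror what hctx_may_queue() does in mainline. Only share_queues feeds the
limit, so bumping active_queues alone does not change the available tags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the shared-tag counters; not kernel code. */
struct tag_ctl {
    unsigned int depth;          /* total driver tags in the shared set */
    unsigned int active_queues;  /* queues that have started issuing io */
    unsigned int share_queues;   /* snapshot that feeds the fair-share limit */
};

/* Models blk_mq_tag_busy(): a queue goes active, the limit is untouched. */
static void model_tag_busy(struct tag_ctl *ctl)
{
    ctl->active_queues++;
}

/* Models blk_mq_driver_tag_busy(): a tag allocation failed, start sharing. */
static void model_driver_tag_busy(struct tag_ctl *ctl)
{
    ctl->share_queues = ctl->active_queues;
}

/* Per-queue limit; assumed formula: ceil(depth / users), at least 4 tags. */
static unsigned int model_fair_depth(const struct tag_ctl *ctl)
{
    unsigned int users = ctl->share_queues;
    unsigned int limit;

    assert(ctl->depth > 0);
    if (users == 0)
        return ctl->depth;       /* sharing not started: whole set usable */

    limit = (ctl->depth + users - 1) / users;
    return limit < 4 ? 4 : limit;
}

/* Models the hctx_may_queue() decision for a queue with in_flight requests. */
static bool model_may_queue(const struct tag_ctl *ctl, unsigned int in_flight)
{
    return in_flight < model_fair_depth(ctl);
}
```

With depth = 64 and two queues active, model_fair_depth() stays at 64 until
model_driver_tag_busy() runs, and drops to 32 afterwards. That is why the
wake batch only needs to be recomputed where share_queues is written.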

Thanks,
Kuai

Cheers,

Hannes