Re: [PATCH RFC 3/5] blk-mq: Facilitate a shared tags per tagset

From: John Garry
Date: Mon Nov 18 2019 - 05:31:45 EST


On 15/11/2019 17:57, Bart Van Assche wrote:
On 11/15/19 2:24 AM, John Garry wrote:
Bart Van Assche wrote:
> How about sharing tag sets across hardware
> queues, e.g. like in the (totally untested) patch below?

So this is similar in principle what Ming Lei came up with here:
https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@xxxxxxxxxx/
However your implementation looks neater, which is good.

My concern with this approach is that we can't differentiate which tags are allocated for which hctx, and sometimes we need to know that.


Hi Bart,

An example here was blk_mq_queue_tag_busy_iter(), which iterates the bits for each hctx. This would just be broken by that change, unless we record which bits are associated with each hctx.

I disagree. In bt_iter() I added " && rq->mq_hctx == hctx" such that blk_mq_queue_tag_busy_iter() only calls the callback function for matching (hctx, rq) pairs.

OK, I see. I assumed that rq->mq_hctx was statically set when we initially allocate the static requests per hctx; but that doesnât appear so - it's set in blk_mq_get_request()->blk_mq_rq_ctx_init().


Another example was __blk_mq_tag_idle(), which looks problematic.

Please elaborate.

Again, this was for the same reason being that I thought we could not differentiate which rqs were associated with which hctx.


For debugfs, when we examine /sys/kernel/debug/block/.../hctxX/tags_bitmap, wouldn't that be the tags for all hctx (hctx0)?

For debugging reasons, I would say we want to know which tags are allocated for a specific hctx, as this is tightly related to the requests for that hctx.

That is an open issue in the patch I posted and something that needs to be addressed. One way to address this is to change the sbitmap_bitmap_show() calls into calls to a function that only shows those bits for which rq->mq_hctx == hctx.

Yeah, understood.


@@ -341,8 +341,11 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
ÂÂÂÂÂ int i;

ÂÂÂÂÂ for (i = 0; i < tagset->nr_hw_queues; i++) {
-ÂÂÂÂÂÂÂ if (tagset->tags && tagset->tags[i])
+ÂÂÂÂÂÂÂ if (tagset->tags && tagset->tags[i]) {
ÂÂÂÂÂÂÂÂÂÂÂÂÂ blk_mq_all_tag_busy_iter(tagset->tags[i], fn, priv);

As I mentioned earlier, wouldn't this iterate over all tags for all hctx's, when we just want the tags for hctx[i]?

Thanks,
John

[Not trimming reply for future reference]

+ÂÂÂÂÂÂÂÂÂÂÂ if (tagset->share_tags)
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ break;
+ÂÂÂÂÂÂÂ }
ÂÂÂÂÂ }
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);

Since blk_mq_tagset_busy_iter() loops over all hardware queues all what is changed is the order in which requests are examined. I am not aware of any block driver that calls blk_mq_tagset_busy_iter() and that depends on the order of the requests passed to the callback function.


OK, fine.

So, to me, this approach also seems viable then.

I am however not so happy with how we use blk_mq_tag_set.tags[0] for the shared tags; I would like to use blk_mq_tag_set.shared_tags and make blk_mq_tag_set.tags[] point at blk_mq_tag_set.shared_tags or maybe not blk_mq_tag_set.tags[] at all. However maybe that change may be more intrusive.

And another more real concern is that we miss a check somewhere for rq->mq_hctx == hctx when examining the bits on the shared tags.

Thanks,
John