Re: [PATCH v7 1/4] mmc: Add MMC host software queue support

From: (Exiting) Baolin Wang
Date: Mon Jan 27 2020 - 05:23:50 EST


Hi Ulf,

(Sorry for the late reply due to my holidays).

On Tue, 21 Jan 2020 at 19:52, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
>
> On Mon, 18 Nov 2019 at 11:43, Baolin Wang <baolin.wang7@xxxxxxxxx> wrote:
> >
> > From: Baolin Wang <baolin.wang@xxxxxxxxxx>
> >
> > Now the MMC read/write stack will always wait for previous request is
> > completed by mmc_blk_rw_wait(), before sending a new request to hardware,
> > or queue a work to complete request, that will bring context switching
> > overhead, especially for high I/O per second rates, to affect the IO
> > performance.
> >
> > Thus this patch introduces MMC software queue interface based on the
> > hardware command queue engine's interfaces, which is similar with the
> > hardware command queue engine's idea, that can remove the context
> > switching. Moreover we set the default queue depth as 32 for software
> > queue, which allows more requests to be prepared, merged and inserted
> > into IO scheduler to improve performance, but we only allow 2 requests
> > in flight, that is enough to let the irq handler always trigger the
> > next request without a context switch, as well as avoiding a long latency.
> >
> > From the fio testing data in cover letter, we can see the software
> > queue can improve some performance with 4K block size, increasing
> > about 16% for random read, increasing about 90% for random write,
> > though no obvious improvement for sequential read and write.
> >
> > Moreover we can expand the software queue interface to support MMC
> > packed request or packed command in future.
> >
> > Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxx>
> > Signed-off-by: Baolin Wang <baolin.wang7@xxxxxxxxx>
> > ---
> > drivers/mmc/core/block.c | 61 ++++++++
> > drivers/mmc/core/mmc.c | 13 +-
> > drivers/mmc/core/queue.c | 33 ++++-
> > drivers/mmc/host/Kconfig | 7 +
> > drivers/mmc/host/Makefile | 1 +
> > drivers/mmc/host/mmc_hsq.c | 344 ++++++++++++++++++++++++++++++++++++++++++++
> > drivers/mmc/host/mmc_hsq.h | 30 ++++
> > include/linux/mmc/host.h | 3 +
> > 8 files changed, 482 insertions(+), 10 deletions(-)
> > create mode 100644 drivers/mmc/host/mmc_hsq.c
> > create mode 100644 drivers/mmc/host/mmc_hsq.h
> >
> > diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> > index 2c71a43..870462c 100644
> > --- a/drivers/mmc/core/block.c
> > +++ b/drivers/mmc/core/block.c
> > @@ -168,6 +168,11 @@ struct mmc_rpmb_data {
> >
> > static inline int mmc_blk_part_switch(struct mmc_card *card,
> > unsigned int part_type);
> > +static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
> > + struct mmc_card *card,
> > + int disable_multi,
> > + struct mmc_queue *mq);
> > +static void mmc_blk_swq_req_done(struct mmc_request *mrq);
>
> We have debated whether swq ("software queue") is a good name - and
> just to confirm, I also don't have a great better suggestion.
>
> However, I do like the name of new host interface though, host
> software queue, "hsq". That makes sense to me.
>
> One option, to possibly make the core code more aligned to the hsq
> interface, could be to stick with the "hsq" acronym for the core layer
> as well.

Sure.

>
> In other words, the above function (and its new friends introduced in
> the series) could be renamed to mmc_blk_hsq_req_done(). What do you
> think about that?

Yes, totally agree with you. Will change in the next version.

> >
> > static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk)
> > {
> > @@ -1569,9 +1574,30 @@ static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req)
> > return mmc_blk_cqe_start_req(mq->card->host, mrq);
> > }
> >
> > +static int mmc_blk_swq_issue_rw_rq(struct mmc_queue *mq, struct request *req)
> > +{
> > + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> > + struct mmc_host *host = mq->card->host;
> > + int err;
> > +
> > + mmc_blk_rw_rq_prep(mqrq, mq->card, 0, mq);
> > + mqrq->brq.mrq.done = mmc_blk_swq_req_done;
> > + mmc_pre_req(host, &mqrq->brq.mrq);
> > +
> > + err = mmc_cqe_start_req(host, &mqrq->brq.mrq);
> > + if (err)
> > + mmc_post_req(host, &mqrq->brq.mrq, err);
> > +
> > + return err;
> > +}
> > +
> > static int mmc_blk_cqe_issue_rw_rq(struct mmc_queue *mq, struct request *req)
> > {
> > struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
> > + struct mmc_host *host = mq->card->host;
> > +
> > + if (host->swq_enabled)
>
> If we switch to use "hsq", this would then be "hsq_enabled".

Sure.

>
> > + return mmc_blk_swq_issue_rw_rq(mq, req);
> >
> > mmc_blk_data_prep(mq, mqrq, 0, NULL, NULL);
> >
> > @@ -1957,6 +1983,41 @@ static void mmc_blk_urgent_bkops(struct mmc_queue *mq,
> > mmc_run_bkops(mq->card);
> > }
> >
> > +static void mmc_blk_swq_req_done(struct mmc_request *mrq)
> > +{
> > + struct mmc_queue_req *mqrq =
> > + container_of(mrq, struct mmc_queue_req, brq.mrq);
> > + struct request *req = mmc_queue_req_to_req(mqrq);
> > + struct request_queue *q = req->q;
> > + struct mmc_queue *mq = q->queuedata;
> > + struct mmc_host *host = mq->card->host;
> > + unsigned long flags;
> > +
> > + if (mmc_blk_rq_error(&mqrq->brq) ||
> > + mmc_blk_urgent_bkops_needed(mq, mqrq)) {
> > + spin_lock_irqsave(&mq->lock, flags);
> > + mq->recovery_needed = true;
> > + mq->recovery_req = req;
> > + spin_unlock_irqrestore(&mq->lock, flags);
> > +
> > + host->cqe_ops->cqe_recovery_start(host);
> > +
> > + schedule_work(&mq->recovery_work);
> > + return;
> > + }
> > +
> > + mmc_blk_rw_reset_success(mq, req);
> > +
> > + /*
> > + * Block layer timeouts race with completions which means the normal
> > + * completion path cannot be used during recovery.
> > + */
> > + if (mq->in_recovery)
> > + mmc_blk_cqe_complete_rq(mq, req);
> > + else
> > + blk_mq_complete_request(req);
> > +}
> > +
> > void mmc_blk_mq_complete(struct request *req)
> > {
> > struct mmc_queue *mq = req->q->queuedata;
> > diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> > index c880489..8eac1a2 100644
> > --- a/drivers/mmc/core/mmc.c
> > +++ b/drivers/mmc/core/mmc.c
> > @@ -1852,15 +1852,22 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
> > */
> > card->reenable_cmdq = card->ext_csd.cmdq_en;
> >
> > - if (card->ext_csd.cmdq_en && !host->cqe_enabled) {
> > + if (host->cqe_ops && !host->cqe_enabled) {
>
> The doesn't looks entirely correct to me, as it means enabling the CQE
> hardware for hosts with MMC_CAP2_CQE set, but no matter of whether the
> eMMC card really supports CMDQ (or if we failed to enabled CMDQ for
> the card). More comments below.
>
> > err = host->cqe_ops->cqe_enable(host, card);
> > if (err) {
> > pr_err("%s: Failed to enable CQE, error %d\n",
> > mmc_hostname(host), err);
> > } else {
> > host->cqe_enabled = true;
> > - pr_info("%s: Command Queue Engine enabled\n",
> > - mmc_hostname(host));
> > +
> > + if (card->ext_csd.cmdq_en) {
> > + pr_info("%s: Command Queue Engine enabled\n",
> > + mmc_hostname(host));
> > + } else {
> > + host->swq_enabled = true;
> > + pr_info("%s: Software Queue enabled\n",
> > + mmc_hostname(host));
>
> A few questions around the above code.
>
> 1.
> Let's assume the host supports MMC_CAP2_CQE, but the eMMC card doesn't
> support CMDQ.
>
> In this case, we still want to allow the host to use the software
> variant (the hsq) of the interface. In principle that is what the code
> above already tries to implement, but then you also need to update the
> support in drivers/mmc/host/cqhci.[ch] to support that dynamically.
> For example, the ->cqe_enable() callback should check
> "card->ext_csd.cmdq_en" and adjust its behavior accordingly, depending
> if the flag has been set or not.

Right. I will add "card->ext_csd.cmdq_en" checking in
drivers/mmc/host/cqhci.c to fix this issue.


> 2.
> I also notice that you are enabling the use of the hsq path, solely
> for eMMC cards. I am guessing hsq is beneficial to use for SD cards as
> well, don't you think?

Right.

>
> Of course, I am fine by enabling that in a step-by-step approach, so
> no need to add that as a part of $subject patch. Although, at least
> make it a part of the series.

Yes, that's what I thought. The SD card related patch was already in
my local tree, and I will post it if this patch set was accepted.

>
> > + }
> > }
> > }
> >
> > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > index 9edc086..d9086c1 100644
> > --- a/drivers/mmc/core/queue.c
> > +++ b/drivers/mmc/core/queue.c
> > @@ -62,7 +62,7 @@ enum mmc_issue_type mmc_issue_type(struct mmc_queue *mq, struct request *req)
> > {
> > struct mmc_host *host = mq->card->host;
> >
> > - if (mq->use_cqe)
> > + if (mq->use_cqe && !host->swq_enabled)
> > return mmc_cqe_issue_type(host, req);
> >
> > if (req_op(req) == REQ_OP_READ || req_op(req) == REQ_OP_WRITE)
> > @@ -124,12 +124,14 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
> > {
> > struct request_queue *q = req->q;
> > struct mmc_queue *mq = q->queuedata;
> > + struct mmc_card *card = mq->card;
> > + struct mmc_host *host = card->host;
> > unsigned long flags;
> > int ret;
> >
> > spin_lock_irqsave(&mq->lock, flags);
> >
> > - if (mq->recovery_needed || !mq->use_cqe)
> > + if (mq->recovery_needed || !mq->use_cqe || host->swq_enabled)
> > ret = BLK_EH_RESET_TIMER;
> > else
> > ret = mmc_cqe_timed_out(req);
> > @@ -144,12 +146,13 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
> > struct mmc_queue *mq = container_of(work, struct mmc_queue,
> > recovery_work);
> > struct request_queue *q = mq->queue;
> > + struct mmc_host *host = mq->card->host;
> >
> > mmc_get_card(mq->card, &mq->ctx);
> >
> > mq->in_recovery = true;
> >
> > - if (mq->use_cqe)
> > + if (mq->use_cqe && !host->swq_enabled)
> > mmc_blk_cqe_recovery(mq);
> > else
> > mmc_blk_mq_recovery(mq);
> > @@ -160,6 +163,9 @@ static void mmc_mq_recovery_handler(struct work_struct *work)
> > mq->recovery_needed = false;
> > spin_unlock_irq(&mq->lock);
> >
> > + if (host->swq_enabled)
> > + host->cqe_ops->cqe_recovery_finish(host);
> > +
> > mmc_put_card(mq->card, &mq->ctx);
> >
> > blk_mq_run_hw_queues(q, true);
> > @@ -279,6 +285,14 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
> > }
> > break;
> > case MMC_ISSUE_ASYNC:
> > + /*
> > + * For MMC host software queue, we only allow 2 requests in
> > + * flight to avoid a long latency.
> > + */
> > + if (host->swq_enabled && mq->in_flight[issue_type] > 2) {
> > + spin_unlock_irq(&mq->lock);
> > + return BLK_STS_RESOURCE;
> > + }
> > break;
> > default:
> > /*
> > @@ -430,11 +444,16 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
> > * The queue depth for CQE must match the hardware because the request
> > * tag is used to index the hardware queue.
> > */
> > - if (mq->use_cqe)
> > - mq->tag_set.queue_depth =
> > - min_t(int, card->ext_csd.cmdq_depth, host->cqe_qdepth);
> > - else
> > + if (mq->use_cqe) {
> > + if (host->swq_enabled)
> > + mq->tag_set.queue_depth = host->cqe_qdepth;
>
> I don't think we need to treat the hsq as special case in regards to
> the .queue_depth.
>
> It should be fine to use the default MMC_QUEUE_DEPTH (64), don't you think?

Yes, I agree.

> > + else
> > + mq->tag_set.queue_depth =
> > + min_t(int, card->ext_csd.cmdq_depth, host->cqe_qdepth);
> > + } else {
> > mq->tag_set.queue_depth = MMC_QUEUE_DEPTH;
> > + }
> > +
> > mq->tag_set.numa_node = NUMA_NO_NODE;
> > mq->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
> > mq->tag_set.nr_hw_queues = 1;
> > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> > index 49ea02c..efa4019 100644
> > --- a/drivers/mmc/host/Kconfig
> > +++ b/drivers/mmc/host/Kconfig
> > @@ -936,6 +936,13 @@ config MMC_CQHCI
> >
> > If unsure, say N.
> >
> > +config MMC_HSQ
> > + tristate "MMC Host Software Queue support"
> > + help
> > + This selects the Software Queue support.
> > +
> > + If unsure, say N.
> > +
> > config MMC_TOSHIBA_PCI
> > tristate "Toshiba Type A SD/MMC Card Interface Driver"
> > depends on PCI
> > diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
> > index 11c4598..c14b439 100644
> > --- a/drivers/mmc/host/Makefile
> > +++ b/drivers/mmc/host/Makefile
> > @@ -98,6 +98,7 @@ obj-$(CONFIG_MMC_SDHCI_BRCMSTB) += sdhci-brcmstb.o
> > obj-$(CONFIG_MMC_SDHCI_OMAP) += sdhci-omap.o
> > obj-$(CONFIG_MMC_SDHCI_SPRD) += sdhci-sprd.o
> > obj-$(CONFIG_MMC_CQHCI) += cqhci.o
> > +obj-$(CONFIG_MMC_HSQ) += mmc_hsq.o
> >
> > ifeq ($(CONFIG_CB710_DEBUG),y)
> > CFLAGS-cb710-mmc += -DDEBUG
> > diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
> > new file mode 100644
> > index 0000000..f5a4f93
> > --- /dev/null
> > +++ b/drivers/mmc/host/mmc_hsq.c
> > @@ -0,0 +1,344 @@
>
> Other than the above, this looks indeed very promising! I have no
> further comment for the rest of the patches in the series.

Very appreciated for your good suggestion. I will send out the next
version after my holidays with addressing your comments. Thanks.

--
Baolin Wang
Best Regards