Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go

From: Bart Van Assche
Date: Mon Feb 03 2020 - 22:47:12 EST


On 2020-02-03 12:59, Salman Qazi wrote:
> We observed that it is possible for a flush to bypass the I/O
> scheduler and get added to hctx->dispatch in blk_mq_sched_bypass_insert.
> This can happen while a kworker is running blk_mq_do_dispatch_sched call
> in blk_mq_sched_dispatch_requests.
>
> However, the blk_mq_do_dispatch_sched call doesn't end in bounded time.
> As a result, the flush can sit there indefinitely, as the I/O scheduler
> feeds an arbitrary number of requests to the hardware.
>
> The solution is to periodically poll hctx->dispatch in
> blk_mq_do_dispatch_sched, to put a bound on the latency of the commands
> sitting there.

(added Christoph, Ming and Hannes to the Cc-list)

Thank you for having posted a patch; that really helps.

I see that my name occurs first in the "To:" list. Since Jens is the
block layer maintainer I think Jens should have been mentioned first.

In version v4.20 of the Linux kernel I found the following in the legacy
block layer code:
* From blk_insert_flush():
list_add_tail(&rq->queuelist, &q->queue_head);
* From elv_next_request():
list_for_each_entry(rq, &q->queue_head, queuelist)

I think this means that the legacy block layer sent flush requests to
the scheduler instead of directly to the block driver. How about
modifying the blk-mq code such that it mimics that approach? I'm asking
this because this patch, although the code looks clean, doesn't seem the
best solution to me.

> + if (count > 0 && count % q->max_sched_batch == 0 &&
> + !list_empty_careful(&hctx->dispatch))
> + break;

A modulo operation in the hot path? Please don't do that.

> +static ssize_t queue_max_sched_batch_store(struct request_queue *q,
> + const char *page,
> + size_t count)
> +{
> + int err, val;
> +
> + if (!q->mq_ops)
> + return -EINVAL;
> +
> + err = kstrtoint(page, 10, &val);
> + if (err < 0)
> + return err;
> +
> + if (val <= 0)
> + return -EINVAL;
> +
> + q->max_sched_batch = val;
> +
> + return count;
> +}

Has it been considered to use kstrtouint() instead of checking whether
the value returned by kstrtoint() is positive?

> + int max_sched_batch;

unsigned int?

Thanks,

Bart.