Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler
From: Arnd Bergmann
Date: Fri Oct 28 2016 - 12:07:17 EST

On Friday, October 28, 2016 9:30:07 AM CEST Jens Axboe wrote:
> On 10/28/2016 03:32 AM, Linus Walleij wrote:
> > The patch to enable MQ looks like this:
> > https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d
>
> BTW, another viable "hack" for the depth issue would be to expose more
> than one hardware queue. It's meant to map to a distinct submission
> region in the hardware, but there's nothing stopping the driver from
> using it differently. Might not be cleaner than just increasing the
> queue depth on a single queue, though.
>
> That still won't solve the issue of lying about it and causing IO
> scheduler confusion, of course.
>
> Also, 4.8 and newer have support for BLK_MQ_F_BLOCKING, if you need to
> block in ->queue_rq(). That could eliminate the need to offload to a
> kthread manually.

I think the main reason for the kthread is that on ARM and other
architectures, the DMA mapping operations are fairly slow (for
cache flushes or bounce buffering) and we want to minimize the
time between subsequent requests being handled by the hardware.
This is not unique to MMC in any way; MMC just happens to be
common on ARM, and it is limited by its lack of hardware
command queuing.
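
To put rough numbers on the overlap argument, here is a small
user-space model (not kernel code). The function names are made up
for illustration. It assumes only that each request needs a CPU-side
preparation step (standing in for the DMA mapping) followed by a
hardware transfer, and that the hardware handles one request at a
time:

```c
/*
 * Hypothetical model: each request costs prep_us of CPU work (DMA
 * mapping, cache flushes, bounce buffering) and hw_us of device time.
 * The device has no command queue, so it runs one request at a time.
 */

/* Prepare and issue strictly in sequence: the device idles during prep. */
static unsigned long serial_us(unsigned long n, unsigned long prep_us,
                               unsigned long hw_us)
{
    return n * (prep_us + hw_us);
}

/* Prepare request i+1 while the device is still working on request i. */
static unsigned long pipelined_us(unsigned long n, unsigned long prep_us,
                                  unsigned long hw_us)
{
    /* Each overlapped step takes as long as the slower of the two stages. */
    unsigned long step = prep_us > hw_us ? prep_us : hw_us;

    if (n == 0)
        return 0;
    /* First prep cannot overlap anything, and neither can the last transfer. */
    return prep_us + (n - 1) * step + hw_us;
}
```

With invented figures of 100 requests, 50 us of mapping and 200 us of
transfer each, the model gives 25000 us when handled serially and
20050 us when the mapping is overlapped. The real costs depend
heavily on the platform.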

It would be nice to do a similar trick for SCSI disks,
especially USB mass storage, maybe also SATA, which are the
next most common storage devices on non-coherent ARM systems
(SATA nowadays often comes with NCQ, so it's less of an
issue).

It may be reasonable to tie this in with the I/O scheduler:
if you don't have a scheduler, the access to the device is
probably rather direct and you want to avoid any complexity
in the kernel, but if preparing a request is expensive
and the hardware has no queuing, you probably also want to
use a scheduler.

We should probably also try to understand how this could
work out with USB mass storage, if there is a solution at
all, and then do it for MMC in a way that would work on
both. I don't think the USB core can currently split the
dma_map_sg() operation from the USB command submission,
so this may require some deeper surgery there.
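
As a sketch of what such a split could look like, the following
user-space mock separates the expensive mapping stage from the cheap
submission stage. Every name here (struct xfer, xfer_map(),
xfer_submit(), xfer_map_and_submit()) is hypothetical; none of them
exist in the USB core, and the mock only tracks state rather than
touching any hardware:

```c
#include <stdbool.h>

/* Hypothetical transfer states, for illustration only. */
enum xfer_state { XFER_NEW, XFER_MAPPED, XFER_SUBMITTED };

struct xfer {
    enum xfer_state state;
};

/* Stage 1: the expensive part (stands in for dma_map_sg()); may run early. */
static bool xfer_map(struct xfer *x)
{
    if (x->state != XFER_NEW)
        return false;
    x->state = XFER_MAPPED;
    return true;
}

/* Stage 2: cheap hand-off to the hardware; only valid once mapped. */
static bool xfer_submit(struct xfer *x)
{
    if (x->state != XFER_MAPPED)
        return false;
    x->state = XFER_SUBMITTED;
    return true;
}

/* Combined path: map and submit in a single call, with no room for overlap. */
static bool xfer_map_and_submit(struct xfer *x)
{
    return xfer_map(x) && xfer_submit(x);
}
```

With the two stages exposed separately, a caller could map the next
transfer while the previous one is still on the wire, and fall back
to the combined call where that does not matter.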

	Arnd