Hi,
I've looked into blk-mq and possible support for I/O scheduling.
The goal is to minimize the performance degradation on rotational
devices when scsi_mod.use_blk_mq=1 is set.
I think the degradation is well reflected by fio measurements.
With an increasing number of jobs you'll see a significant
performance drop for sequential reads and writes with blk-mq in
contrast to CFQ. blk-mq ensures that requests from different processes
(CPUs) are "perfectly shuffled" in a hardware queue. This is no
problem for the non-rotational devices at which blk-mq is aimed, but
it is not so nice for rotational disks.
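To illustrate why the shuffling hurts on rotational media, here is a
toy model (not kernel code). It assumes each job reads consecutive
blocks from its own disjoint region and counts every non-contiguous
transition in the dispatch order as a seek. The function count_seeks
and its parameters are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: `nstreams` jobs each read `per_stream`
 * consecutive blocks from disjoint regions. The dispatcher takes up to
 * `batch` requests from one stream before moving to the next one.
 * Returns the number of non-contiguous transitions ("seeks") in the
 * resulting dispatch order.
 */
static size_t count_seeks(size_t nstreams, size_t per_stream, size_t batch)
{
    assert(nstreams > 0 && per_stream > 0 && batch > 0);

    /* One block of gap so that neighbouring regions never touch. */
    const size_t region = per_stream + 1;
    size_t seeks = 0;
    size_t last = 0;
    int have_last = 0;

    for (size_t base = 0; base < per_stream; base += batch) {
        for (size_t s = 0; s < nstreams; s++) {
            for (size_t i = base; i < base + batch && i < per_stream; i++) {
                size_t blk = s * region + i;

                if (have_last && blk != last + 1)
                    seeks++;
                last = blk;
                have_last = 1;
            }
        }
    }
    return seeks;
}
```

In this model batch=1 corresponds to the "perfectly shuffled" case,
where almost every request causes a seek, while larger batches per
stream reduce the number of seeks accordingly.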
(i) I've done some tests with patch c2ed2f2dcf92 (blk-mq: first cut
deadline scheduling) from branch mq-deadline of linux-block
repository. I've not seen a significant performance impact when
enabling it (neither for non-rotational nor for rotational
disks).
(ii) I've played with code to enable sorting/merging of requests. I
did this in flush_busy_ctxs. This didn't have a performance
impact either. On closer inspection this was due to the high
frequency of calls to __blk_mq_run_hw_queue. There was almost
nothing to sort (too few requests). I guess that's also the reason
why (i) did not have much impact.
(iii) With CFQ I've observed similar performance patterns to blk-mq if
slice_idle was set to 0.
(iv) I thought about introducing a per software queue time slice
during which blk-mq will service only one software queue (one
CPU) and not flush all software queues. This could help to
enqueue multiple requests belonging to the same process (as long
as it runs on the same CPU) into a hardware queue. A minimal patch
to implement this is attached below.
The time slice approach (iv) helped to improve performance for
sequential reads and writes, but it's not on a par with CFQ.
Increasing the time slice is
suboptimal (as shown with the 2ms results, see below). It might be
possible to get better performance when further reducing the initial
time slice and adapting it up to a maximum value if there are
repeatedly pending requests for a CPU.
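The adaptive variant could look roughly like the following sketch.
This is not the attached patch and has not been tested in the kernel.
The struct ctx_slice, both helper functions, the doubling policy and
the microsecond unit are assumptions made only for illustration:
start with a short slice, grow it while the CPU still has pending
requests when the slice expires, and fall back to the minimum
otherwise.

```c
#include <assert.h>

/* Hypothetical per software queue slice state (user-space sketch). */
struct ctx_slice {
    unsigned int cur_us;    /* current slice length in microseconds */
    unsigned int min_us;    /* initial (short) slice */
    unsigned int max_us;    /* upper bound for the slice */
};

static void ctx_slice_init(struct ctx_slice *s, unsigned int min_us,
                           unsigned int max_us)
{
    assert(min_us > 0 && min_us <= max_us);
    s->cur_us = min_us;
    s->min_us = min_us;
    s->max_us = max_us;
}

/*
 * Called when a slice expires. `still_pending` is non-zero if the
 * software queue had requests pending at that point. Returns the
 * length of the next slice.
 */
static unsigned int ctx_slice_update(struct ctx_slice *s, int still_pending)
{
    if (!still_pending)
        s->cur_us = s->min_us;          /* queue went idle: start over */
    else if (s->cur_us > s->max_us / 2)
        s->cur_us = s->max_us;          /* clamp, also avoids overflow */
    else
        s->cur_us *= 2;                 /* repeatedly busy: grow slice */
    return s->cur_us;
}
```

With min_us=250 and max_us=2000 a continuously busy queue would get
slices of 500, 1000 and 2000 microseconds and then stay at 2000 until
it has nothing pending at slice expiry. Whether doubling is the right
policy would have to be measured.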
After these observations, and assuming that non-rotational devices are
most likely fine using blk-mq without I/O scheduling support, I wonder
whether
- it's really a good idea to re-implement scheduling support for
blk-mq that eventually behaves like CFQ for rotational devices.
- it's technically possible to support both blk-mq and CFQ for different
devices on the same host adapter. This would allow using "good old"
code for "good old" rotational devices. (But this might not be a
choice if in the long run a goal is to get rid of non-blk-mq code --
not sure what the plans are.)
What do you think about this?