Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

From: Jan Kara
Date: Fri Oct 28 2016 - 03:59:19 EST


On Thu 27-10-16 10:26:18, Jens Axboe wrote:
> On 10/27/2016 03:26 AM, Jan Kara wrote:
> >On Wed 26-10-16 10:12:38, Jens Axboe wrote:
> >>On 10/26/2016 10:04 AM, Paolo Valente wrote:
> >>>
> >>>>On 26 Oct 2016, at 17:32, Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>>>
> >>>>On 10/26/2016 09:29 AM, Christoph Hellwig wrote:
> >>>>>On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
> >>>>>>The question to ask first is whether to actually have pluggable
> >>>>>>schedulers on blk-mq at all, or just have one that is meant to
> >>>>>>do the right thing in every case (and possibly can be bypassed
> >>>>>>completely).
> >>>>>
> >>>>>That would be my preference. Have a BFQ-variant for blk-mq as an
> >>>>>option (default to off unless opted in by the driver or user), and
> >>>>>no other scheduler for blk-mq. Don't bother with bfq for non
> >>>>>blk-mq. It's not like there is any advantage in the legacy-request
> >>>>>device even for slow devices, except for the option of having I/O
> >>>>>scheduling.
> >>>>
> >>>>It's the only right way forward. blk-mq might not offer any substantial
> >>>>advantages to rotating storage, but with scheduling, it won't offer a
> >>>>downside either. And it'll take us towards the real goal, which is to
> >>>>have just one IO path.
> >>>
> >>>ok
> >>>
> >>>>Adding a new scheduler for the legacy IO path
> >>>>makes no sense.
> >>>
> >>>I would fully agree if effective and stable I/O scheduling would be
> >>>available in blk-mq in one or two months. But I guess that it will
> >>>take at least one year optimistically, given the current status of the
> >>>needed infrastructure, and given the great difficulties of doing
> >>>effective scheduling at the high parallelism and extreme target speeds
> >>>of blk-mq. Of course, this holds true unless only simple, not very
> >>>clever scheduling is performed.
> >>>
> >>>So, what's the point in forcing a lot of users to wait another year or
> >>>more for a solution that has yet to be even defined, when they could
> >>>enjoy a much better system now, and then switch to an even better one when
> >>>scheduling is ready in blk-mq too?
> >>
> >>That same argument could have been made 2 years ago. Saying no to a new
> >>scheduler for the legacy framework goes back roughly that long. We could
> >>have had BFQ for mq NOW, if we didn't keep coming back to this very
> >>point.
> >>
> >>I'm hesitant to add a new scheduler because it's very easy to add, very
> >>difficult to get rid of. If we do add BFQ as a legacy scheduler now,
> >>it'll take us years and years to get rid of it again. We should be
> >>moving towards LESS moving parts in the legacy path, not more.
> >>
> >>We can keep having this discussion every few years, but I think we'd
> >>both prefer to make some actual progress here. It's perfectly fine to
> >>add a single-queue interface for an IO scheduler on blk-mq, since we
> >>don't care too much about scalability there. And that
> >>won't take years, that should be a few weeks. Retrofitting BFQ on top of
> >>that should not be hard either. That can co-exist with a real multiqueue
> >>scheduler as well, something that's geared towards some fairness for
> >>faster devices.
> >
> >OK, so some solution like having a variant of blk_sq_make_request() that
> >will consume requests, do IO scheduling decisions on them, and feed them
> >into the HW queue as it sees fit, would be acceptable? That will give the
> >IO scheduler the global view it needs for complex scheduling decisions
> >so it should indeed be relatively easy to port BFQ to work like that.
>
> I'd probably start off from Omar's base [1], which switches the software
> queues to store bios instead of requests, since that lifts the restriction
> of the 1:1 mapping between what we can queue up and what we can dispatch. Without
> that, the IO scheduler won't have too much to work with. And with that
> in place, it'll be a "bio in, request out" type of setup, which is
> similar to what we have in the legacy path.
>
> I'd keep the software queues, but as a starting point, mandate 1
> hardware queue to keep that as the per-device view of the state. The IO
> scheduler would be responsible for moving one or more bios from the
> software queues to the hardware queue, when they are ready to dispatch.
>
> [1] https://github.com/osandov/linux/commit/8ef3508628b6cf7c4712cd3d8084ee11ef5d2530
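For illustration only, here is a rough userspace model of the "bio in, request out" setup described above. It is not kernel code, and every name in it (model_submit_bio, model_dispatch, model_complete, NR_SW_QUEUES, HW_DEPTH) is hypothetical. Per-CPU software queues hold bios. A single hardware queue has HW_DEPTH tags. A dispatch step that can see all software queues moves one bio to the hardware queue whenever a tag is free. The lowest-sector-first choice among queue heads is just a placeholder policy, not BFQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: NR_SW_QUEUES per-CPU software queues of bios,
 * one hardware queue with HW_DEPTH tags. Not kernel code. */
#define NR_SW_QUEUES 4
#define SW_QUEUE_CAP 16
#define HW_DEPTH 2

struct bio { unsigned long sector; };
struct request { struct bio bio; bool in_use; };
struct sw_queue { struct bio bios[SW_QUEUE_CAP]; size_t head, tail; };

struct device_model {
    struct sw_queue swq[NR_SW_QUEUES];
    struct request hwq[HW_DEPTH]; /* one slot per hardware tag */
};

/* "bio in": submission only appends to the submitting CPU's queue. */
static bool model_submit_bio(struct device_model *d, int cpu, unsigned long sector)
{
    struct sw_queue *q = &d->swq[cpu];
    if (q->tail == SW_QUEUE_CAP)
        return false;
    q->bios[q->tail++].sector = sector;
    return true;
}

static int model_free_tag(const struct device_model *d)
{
    for (int t = 0; t < HW_DEPTH; t++)
        if (!d->hwq[t].in_use)
            return t;
    return -1;
}

/* "request out": while a tag is free, look at the head of every software
 * queue (the per-device view) and move the lowest-sector bio to the HW
 * queue. Placeholder policy only. Returns the number dispatched. */
static int model_dispatch(struct device_model *d)
{
    int n = 0, tag;
    while ((tag = model_free_tag(d)) >= 0) {
        int best = -1;
        for (int c = 0; c < NR_SW_QUEUES; c++) {
            struct sw_queue *q = &d->swq[c];
            if (q->head < q->tail &&
                (best < 0 || q->bios[q->head].sector <
                             d->swq[best].bios[d->swq[best].head].sector))
                best = c;
        }
        if (best < 0)
            break; /* nothing queued */
        d->hwq[tag].bio = d->swq[best].bios[d->swq[best].head++];
        d->hwq[tag].in_use = true;
        n++;
    }
    return n;
}

/* Completion frees the tag and gives the scheduler another chance. */
static int model_complete(struct device_model *d, int tag)
{
    assert(tag >= 0 && tag < HW_DEPTH && d->hwq[tag].in_use);
    d->hwq[tag].in_use = false;
    return model_dispatch(d);
}
```

With bios at sectors 500, 100 and 300 submitted on CPUs 0, 1 and 2, the first dispatch fills both tags with 100 and 300. Completing tag 0 then dispatches 500.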

Yeah, but what would software queues actually be good for on a single-queue
device with device-global IO scheduling? An IO scheduler making complex
decisions will keep all the bios / requests in a single structure anyway,
so there's no scalability to gain from per-cpu software queues...
So you can directly consume bios in your ->make_request handler, place them
in the IO scheduler's structures, and then push requests out to the HW queue in
response to HW tags getting freed (i.e. IO completion). No need
for intermediate software queues. But maybe I'm missing something.
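Purely as an illustration, the flow without software queues might be modelled in userspace like this. It is not kernel code, and all names (model_make_request, model_push_requests, model_complete, SCHED_CAP, HW_DEPTH) are hypothetical. Bios go from a ->make_request-style entry point straight into one device-global structure. Here that is simply an array kept sorted by sector, standing in for whatever BFQ would really use. Requests are pushed to the hardware queue whenever a tag is free, both at submission (if the device is idle) and on completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model with no software queues. Not kernel code. */
#define SCHED_CAP 16
#define HW_DEPTH 2

struct sched_dev {
    unsigned long pending[SCHED_CAP]; /* device-global, sorted by sector */
    size_t nr_pending;
    unsigned long hw_sector[HW_DEPTH]; /* what each tag is serving */
    bool hw_busy[HW_DEPTH];
};

/* Move pending bios to free hardware tags, lowest sector first.
 * Returns the number of requests pushed out. */
static int model_push_requests(struct sched_dev *d)
{
    int n = 0;
    for (int t = 0; t < HW_DEPTH && d->nr_pending > 0; t++) {
        if (d->hw_busy[t])
            continue;
        d->hw_sector[t] = d->pending[0];
        d->hw_busy[t] = true;
        for (size_t i = 1; i < d->nr_pending; i++) /* drop the head */
            d->pending[i - 1] = d->pending[i];
        d->nr_pending--;
        n++;
    }
    return n;
}

/* ->make_request-style entry: consume the bio directly into the
 * scheduler's structure, then start I/O if a tag happens to be idle. */
static bool model_make_request(struct sched_dev *d, unsigned long sector)
{
    if (d->nr_pending == SCHED_CAP)
        return false;
    size_t i = d->nr_pending;
    while (i > 0 && d->pending[i - 1] > sector) { /* insertion sort */
        d->pending[i] = d->pending[i - 1];
        i--;
    }
    d->pending[i] = sector;
    d->nr_pending++;
    model_push_requests(d);
    return true;
}

/* IO completion frees a tag, which is what drives further dispatch. */
static int model_complete(struct sched_dev *d, int tag)
{
    assert(tag >= 0 && tag < HW_DEPTH && d->hw_busy[tag]);
    d->hw_busy[tag] = false;
    return model_push_requests(d);
}
```

The only per-device state is the scheduler structure plus the tag slots. Nothing in between plays the role of a software queue.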

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR