Fwd: [PATCH V3 00/11] block-throttle: add .high limit

From: Kyle Sanderson
Date: Sat Oct 08 2016 - 21:16:13 EST

Re-sending as plain-text as the Gmail Android App is still
historically broken...

---------- Forwarded message ----------
From: Kyle Sanderson <kyle.leet@xxxxxxxxx>
Date: Wed, Oct 5, 2016 at 7:09 AM
Subject: Re: [PATCH V3 00/11] block-throttle: add .high limit
To: Tejun Heo <tj@xxxxxxxxxx>
Cc: jmoyer@xxxxxxxxxx, Paolo Valente <paolo.valente@xxxxxxxxxx>,
linux-kernel@xxxxxxxxxxxxxxx, Mark Brown <broonie@xxxxxxxxxx>,
linux-block@xxxxxxxxxxxxxxx, Shaohua Li <shli@xxxxxx>, Jens Axboe
<axboe@xxxxxx>, Linus Walleij <linus.walleij@xxxxxxxxxx>, Vivek Goyal
<vgoyal@xxxxxxxxxx>, Kernel-team@xxxxxx, Ulf Hansson

Not to pile on here, but it has been demonstrated for years that CFQ stalls
significantly under contention, while other schedulers, such as BFQ, attempt
to provide fairness, which is absolutely the desired outcome when using a
machine. The networking space is a little wrecked in the sense that there is
a plethora of qdiscs that don't necessarily need to exist, but are legacy.
That limitation does not exist in this realm, as there are no specific
tunables.

There is no reason that, in 2016, a user-space application should be able to
steal all of the I/O from a disk and completely lock up the machine, when
BFQ essentially solved this years ago. I've been a moderately happy user of
BFQ for quite some time now. There aren't tens or hundreds of us, but
thousands, through the custom kernels that are spun and the distros that
helped support BFQ.

How is this even a discussion, when hard numbers exist and any reproduction
attempt easily triggers the issues that CFQ causes? Reading this thread, and
many others, only deepens my disappointment; and whenever someone launches
kterm or scrot and their machine freezes, it leaves a select few individuals
completely responsible for it. Help those users, help yourself, help Linux.
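For anyone who wants to try BFQ for themselves: on a kernel that includes BFQ
(in 2016 that meant a patched or custom kernel), the active scheduler for a
block device can be inspected and switched through sysfs. The device name
below is an example; adjust for your system.

```shell
# Show the schedulers available for the device; the active one is in brackets.
cat /sys/block/sda/queue/scheduler

# Switch the device to BFQ (as root; requires a kernel with BFQ built in
# or available as a module).
echo bfq > /sys/block/sda/queue/scheduler
```
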

On 4 Oct 2016 1:29 pm, "Tejun Heo" <tj@xxxxxxxxxx> wrote:
> Hello, Paolo.
>
> On Tue, Oct 04, 2016 at 09:29:48PM +0200, Paolo Valente wrote:
> > > Hmm... I think we already discussed this but here's a really simple
> > > case. There are three unknown workloads A, B and C and we want to
> > > give A certain best-effort guarantees (let's say around 80% of the
> > > underlying device) whether A is sharing the device with B or C.
> >
> > That's the same example that you proposed me in our previous
> > discussion. For this example I showed you, with many boring numbers,
> > that with BFQ you get the most accurate distribution of the resource.
>
> Yes, it is about the same example and what I understood was that
> "accurate distribution of the resources" holds as long as the
> randomness is incidental (ie. due to layout on the filesystem and so
> on) with the slice expiration mechanism offsetting the actually random
> workloads.
> > If you have enough stamina, I can repeat them again.
>
> I'll go back to the thread and re-read them.
>
> > To save your patience, here is a very brief summary. In a concrete use case, the
> > unknown workloads turn into something like this: there will be a first
> > time interval during which A happens to be, say, sequential, B happens
> > to be, say, random and C happens to be, say, quasi-sequential. Then
> > there will be a next time interval during which their characteristics
> > change, and so on. It is easy (but boring, I acknowledge it) to show
> > that, for each of these time intervals BFQ provides the best possible
> > service in terms of fairness, bandwidth distribution, stability and so
> > on. Why? Because of the elastic bandwidth-time scheduling of BFQ
> > that we already discussed, and because BFQ is naturally accurate in
> > redistributing aggregate throughput proportionally, when needed.
>
> Yeah, that's what I remember, and for workloads above a certain level of
> randomness their time consumption is mapped to bw, right?
>
> > > I get that bfq can be a good compromise on most desktop workloads and
> > > behave reasonably well for some server workloads with the slice
> > > expiration mechanism but it really isn't an IO resource partitioning
> > > mechanism.
> >
> > Right. My argument is that BFQ enables you to give to each client the
> > bandwidth and low-latency guarantees you want. And this IMO is way
> > better than partitioning a resource and then getting unavoidable
> > unfairness and high latency.
>
> But that statement only holds while bw is the main thing to guarantee,
> no? The level of isolation that we're looking for here is fairly
> strict adherence to sub/few-milliseconds in terms of high percentile
> scheduling latency while within the configured bw/iops limits, not
> "overall this device is being used pretty well".
>
> Thanks.
> --
> tejun
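
Stepping back from the thread: the disagreement above is between hard
bandwidth partitioning (throttling-style limits, where an idle workload's
budget is simply wasted) and weighted proportional sharing (BFQ-style,
where spare bandwidth is redistributed by weight). Here is a toy model of
the two policies — purely illustrative, with made-up function names and
numbers, and in no way kernel code:

```python
def fixed_partition(demands, caps, capacity):
    """Hard limits: each workload gets min(demand, cap * capacity);
    bandwidth an idle workload does not use is wasted."""
    return {w: min(d, caps[w] * capacity) for w, d in demands.items()}

def proportional_share(demands, weights, capacity):
    """Weighted max-min fair sharing: bandwidth a workload does not need
    is redistributed to the others in proportion to their weights."""
    alloc = dict.fromkeys(demands, 0.0)
    active = {w for w, d in demands.items() if d > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(weights[w] for w in active)
        # Workloads whose remaining demand fits inside their fair share.
        satisfied = {w for w in active
                     if demands[w] - alloc[w] <= remaining * weights[w] / total_w}
        if not satisfied:
            # Everyone can absorb a full share; distribute and stop.
            for w in active:
                alloc[w] += remaining * weights[w] / total_w
            break
        # Satisfy them exactly; their leftover share flows back to the pool.
        for w in satisfied:
            remaining -= demands[w] - alloc[w]
            alloc[w] = demands[w]
        active -= satisfied
    return alloc

# Workload B is idle; A and C both want the whole device (capacity 100).
demands = {"A": 100, "B": 0, "C": 100}

# Hard caps of 80%/10%/10%: B's unused 10% is wasted (total served = 90).
fixed = fixed_partition(demands, {"A": 0.8, "B": 0.1, "C": 0.1}, 100)

# Weights 80/10/10: B's idle share flows to A and C (total served = 100).
prop = proportional_share(demands, {"A": 80, "B": 10, "C": 10}, 100)
```

Under hard caps the device runs at 90% of capacity while A is pinned at 80;
under proportional weights A absorbs B's idle share. This is the sense in
which Tejun's strict-limit isolation and Paolo's proportional fairness are
optimizing for different things.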