Re: [PATCH V3 00/11] block-throttle: add .high limit

From: Austin S. Hemmelgarn
Date: Thu Oct 06 2016 - 11:11:16 EST

On 2016-10-06 11:05, Paolo Valente wrote:

On 06 Oct 2016, at 15:52, Austin S. Hemmelgarn <ahferroin7@xxxxxxxxx> wrote:

On 2016-10-06 08:50, Paolo Valente wrote:

On 06 Oct 2016, at 13:57, Austin S. Hemmelgarn <ahferroin7@xxxxxxxxx> wrote:

On 2016-10-06 07:03, Mark Brown wrote:
On Thu, Oct 06, 2016 at 10:04:41AM +0200, Linus Walleij wrote:
On Tue, Oct 4, 2016 at 9:14 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:

I get that bfq can be a good compromise on most desktop workloads and
behave reasonably well for some server workloads with the slice
expiration mechanism but it really isn't an IO resource partitioning

Not just desktops, also Android phones.

So why not have BFQ as a separate scheduling policy upstream,
alongside CFQ, deadline and noop?


We're already doing the per-usecase Kconfig thing for preemption.
But maybe somebody already hates that and wants to get rid of it,
I don't know.

Hannes also suggested going back to making BFQ a separate scheduler
rather than replacing CFQ earlier, pointing out that it mitigates
against the risks of changing CFQ substantially at this point (which
seems to be the biggest issue here).

ISTR that the original argument for this approach essentially amounted to: 'If it's so much better, why do we need both?'.

Such an argument is valid only if the new design is better in all respects (and there isn't sufficient information to decide that in this case), or if the improvements are worth the negative aspects (which is too workload specific to decide for something like this).

All correct, apart from the workload-specific issue, which is not very clear to me. Over the last five years I have not found a single workload for which CFQ is better than BFQ, and none has been suggested.
My point is that whether or not BFQ is better depends on the workload. You can't test for every workload, so you can't say definitively that BFQ is better for every workload.


At a minimum, there are workloads where the deadline and noop schedulers are better, but they're very domain specific workloads.


Based on the numbers from Shaohua, it looks like CFQ has better throughput than BFQ, and that will affect some workloads (for most, the improved fairness is worth the reduced throughput, but there probably are some cases where it isn't).

Well, no fairness, just like deadline and noop, but with much less throughput
than deadline and noop, doesn't sound much like the best scheduler for
those workloads. With BFQ you have service guarantees, with noop or
deadline you have maximum throughput.
And with CFQ you have something in between, which is half of why I think CFQ is still worth keeping (the other half being the people who inevitably want to stay on CFQ). And TBH, deadline and noop only give good throughput with specific workloads. Noop in particular is usually only useful on tiny systems where the overhead of scheduling is greater than the time saved by doing so (like some very low power embedded systems), or when scheduling is done elsewhere in the storage stack (like in a VM).

Anyway, leaving aside this fact, IMO the real problem here is that we are in a catch-22: "we want BFQ to replace CFQ, but, since CFQ is legacy code, then you cannot change, and thus replace, CFQ"
I agree that that's part of the issue, but I also don't entirely agree with the reasoning on it. Until blk-mq has proper I/O scheduling, people will continue to use CFQ, and based on the way things are going, it will be multiple months before that happens, whereas BFQ exists and is working now.