Re: Bandwidth Allocations under CFQ I/O Scheduler

From: Ric Wheeler
Date: Tue Oct 17 2006 - 10:38:19 EST


Jens Axboe wrote:
On Tue, Oct 17 2006, Arjan van de Ven wrote:

On Mon, 2006-10-16 at 16:46 -0400, Phetteplace, Thad (GE Healthcare,
consultant) wrote:

The I/O priority levels available under the CFQ scheduler are
nice (no pun intended), but I remember some talk back when
they first went in that future versions might include bandwidth
allocations in addition to the 'niceness' style. Is anyone out
there working on that? If not, I'm willing to hack up a proof
of concept... I just want to make sure I'm not reinventing
the wheel.


Hi,

it's a nice idea in theory. However... since IO bandwidth for seeks is
about 1% to 3% of that of sequential IO (on disks at least), which
bandwidth do you want to allocate? "worst case" you need to use the
all-seeks bandwidth, but that's so far away from "best case" that it may
well not be relevant in practice. Yet there are real world cases where
for a period of time you approach worst case behavior ;(
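
To put rough numbers on that gap, here is a back-of-the-envelope sketch. The drive figures (50 MB/s sequential, 10 ms positioning time, 8 KiB requests) are illustrative assumptions and not measurements from this thread; the function name is made up for the example.

```python
def random_io_fraction(seq_mb_per_s, positioning_ms, request_kib):
    """Fraction of sequential bandwidth left when every request pays a seek.

    All three parameters are assumed, illustrative drive characteristics.
    """
    seq_bytes_per_s = seq_mb_per_s * 1_000_000
    request_bytes = request_kib * 1024
    # Time to move the data itself at the sequential rate.
    transfer_s = request_bytes / seq_bytes_per_s
    # Each request costs one positioning delay plus the transfer.
    service_s = positioning_ms / 1000 + transfer_s
    random_bytes_per_s = request_bytes / service_s
    return random_bytes_per_s / seq_bytes_per_s


# Hypothetical disk: 50 MB/s streaming, 10 ms seek + rotation, 8 KiB requests.
fraction = random_io_fraction(seq_mb_per_s=50, positioning_ms=10, request_kib=8)
print(f"all-seeks bandwidth is about {fraction:.1%} of sequential")
```

With these assumed figures the result lands inside the 1% to 3% range mentioned above; smaller requests or slower seeks push it lower still.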


Bandwidth reservation would have to be confined to special cases, you
obviously cannot do it "in general" for the reasons Arjan lists above.
So you absolutely have to limit any meta data io that would cause seeks,
and the file in question would have to be laid out in a closely
sequential fashion. As long as the access pattern generated by the app
asking for reservation is largely sequential, the kernel can do whatever
it needs to help you maintain the required bandwidth.

On a per-file basis the bandwidth reservation should be doable, to the
extent that generic hardware allows.

I agree - bandwidth allocation is really tricky to do in a useful way.

On one hand, you could "time slice" the disk with some large quantum, as we would do with a CPU, to get a reasonably useful allocation for competing, streaming workloads.

On the other hand, this kind of thing would kill latency if/when you hit any synchronous writes (or cold reads).
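
The trade-off between those two points can be sketched with simple arithmetic. This is only a toy model, assuming strict round robin between streams and a fixed seek cost each time the disk switches streams; the function name and all numbers are hypothetical.

```python
def slice_tradeoff(num_streams, quantum_ms, switch_cost_ms):
    """Return (efficiency, worst_wait_ms) for strict round-robin disk slices.

    efficiency:    share of disk time spent transferring rather than seeking
                   between streams (one switch is paid per slice).
    worst_wait_ms: how long a stream waits if its request arrives just after
                   its own slice ended and every other stream uses a full slice.
    """
    efficiency = quantum_ms / (quantum_ms + switch_cost_ms)
    worst_wait_ms = (num_streams - 1) * (quantum_ms + switch_cost_ms)
    return efficiency, worst_wait_ms


# Assumed: 4 competing streams, 10 ms to reposition between streams.
for quantum_ms in (10, 100, 1000):
    efficiency, worst_wait_ms = slice_tradeoff(4, quantum_ms, 10)
    print(f"quantum {quantum_ms:5d} ms: "
          f"efficiency {efficiency:.0%}, worst wait {worst_wait_ms} ms")
```

In this model a larger quantum amortizes the switch cost better, but the worst-case wait grows in step with it, which is what hurts a synchronous write or cold read issued by a stream that has just lost its slice.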

One other possible use for allocation is throttling a background workload (say, an iterative checker for a file system or some such thing) where the workload can run effectively forever, but should be contained so that it does not interfere with foreground workloads. A similar time slice might be used to throttle this load down unless there is no competing work to be done.
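
A minimal sketch of that throttling idea follows. It is a toy slice-assignment loop, not CFQ code: the background job is assumed to always have work, and the `background_every` knob (one background slice after that many consecutive foreground slices) is an assumption added here so the background job is slowed rather than starved.

```python
def assign_slices(foreground_busy, background_every):
    """Decide which workload owns each disk time slice.

    foreground_busy:  one bool per slice, True if foreground I/O is pending.
    background_every: hypothetical knob; under contention the background
                      job gets one slice after this many foreground slices.
    Returns a list of "fg" / "bg" owners, one per slice.
    """
    owners = []
    since_background = 0
    for busy in foreground_busy:
        if not busy or since_background >= background_every:
            # Idle disk, or the background job has waited long enough.
            owners.append("bg")
            since_background = 0
        else:
            owners.append("fg")
            since_background += 1
    return owners


# Six slices with competing foreground work, then two idle slices.
owners = assign_slices([True] * 6 + [False] * 2, background_every=3)
print(owners)
```

While foreground work is pending the background job is held to one slice in four here, and once the foreground goes idle it takes every slice.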



