Re: Time sliced CFQ io scheduler
From: Jens Axboe
Date: Wed Dec 08 2004 - 02:24:58 EST
On Wed, Dec 08 2004, Nick Piggin wrote:
> On Wed, 2004-12-08 at 07:58 +0100, Jens Axboe wrote:
> > On Wed, Dec 08 2004, Nick Piggin wrote:
> > > On Tue, 2004-12-07 at 18:25 -0800, Andrew Morton wrote:
> > > I think we could detect when a disk asks for more than, say, 4
> > > concurrent requests, and in that case turn off read anticipation
> > > and all the anti-starvation for TCQ by default (with the option
> > > to force it back on).
> > CFQ only allows a certain depth at the hardware level, and you can control
> > that. I don't think you should drop the AS behaviour in that case; you
> > should look at when the last request comes in and what type it is.
> > With time-sliced CFQ I'm seeing some silly SCSI disk behaviour as well:
> > it gets harder to get good read bandwidth, as the disk is trying pretty
> > hard to starve me. Maybe killing write-back caching would help; I'll
> > have to try.
> I "fixed" this in AS. It gets (or got, last time we checked, many months
> ago) pretty good read latency even with a big write and a very large
> tag depth.
> What were the main things I had to do... hmm, I think the main one was
> to not start on a new batch until all requests from a previous batch
> are reported to have completed. So e.g. you get all reads completing
> before you start issuing any more writes. The write->read side of things
> isn't so clear cut with your "smart" write caches on the IO systems, but
> no doubt that helps a bit.
I can see the read/write batching being helpful there, at least to
prevent writes starving reads if you let the queue drain completely
before starting a new batch.
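A minimal sketch of the batch rule described in this exchange: a new read or write batch may only start once the device has reported every request from the previous batch as complete. The struct, enum, and function names are hypothetical and are not the actual AS or CFQ code. Batch expiry (deciding when a batch has run long enough) is deliberately left out, so as written, same-direction requests can keep a batch open indefinitely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scheduler state; names are illustrative only. */
enum io_dir { DIR_READ, DIR_WRITE };

struct batch_sched {
	enum io_dir cur_dir;    /* direction of the current batch */
	unsigned int in_flight; /* issued to the device, not yet completed */
};

/* Try to issue one request in direction @dir; returns true if issued. */
static bool batch_try_dispatch(struct batch_sched *s, enum io_dir dir)
{
	if (dir != s->cur_dir) {
		/* Start a new batch only after every request of the
		 * previous batch has been reported complete. */
		if (s->in_flight > 0)
			return false;
		s->cur_dir = dir;
	}
	s->in_flight++;
	return true;
}

/* Called when the device reports a request as complete. */
static void batch_complete(struct batch_sched *s)
{
	assert(s->in_flight > 0);
	s->in_flight--;
}
```

With two reads outstanding, a write is refused until both reads have completed, and only then does the batch flip to writes.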
CFQ does something similar, just not batched together. But it does let
the depth build up a little and then drain out. In fact, thinking about
it, I think I'm missing a small fix there, which could be why read
latencies suffer on write-intensive loads (the dispatch queue is
drained, but the hardware queue is not fully drained).
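One possible reading of the missing fix, sketched under assumptions: the message does not say what the fix is, only that the dispatch queue drains while the hardware queue does not. The hypothetical check below contrasts a drain test that looks only at the scheduler's dispatch list with one that also waits for requests already handed to the device. The field and function names are illustrative, not CFQ's.

```c
#include <stdbool.h>

/* Hypothetical counters; not the actual CFQ fields. */
struct drain_state {
	unsigned int dispatch_queued; /* on the scheduler's dispatch list */
	unsigned int hw_in_flight;    /* handed to the driver/device */
};

/* Incomplete check: only the dispatch list is considered, so writes
 * still queued inside the device can delay the next reads. */
static bool drained_dispatch_only(const struct drain_state *d)
{
	return d->dispatch_queued == 0;
}

/* Stricter check: also wait for the hardware queue to empty. */
static bool drained_fully(const struct drain_state *d)
{
	return d->dispatch_queued == 0 && d->hw_in_flight == 0;
}
```

With an empty dispatch list but three requests still in the device, the first check reports "drained" while the second does not.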
> Of course, after you do all that your database performance has well and
> truly gone down the shitter. It is also hampered by the more fundamental
> issue that read anticipating can block up the pipe for IO that is cached
> on the controller/disks and would get satisfied immediately.
I think we need to end up with something that sets the machine profile
for the interesting disks. Some things you can check for at runtime
(extremely fast writes, for example, are a good indicator of write
caching), but it is just not possible to cover it all. Plus, you end up
with 30-40% of the code being convoluted stuff added to detect it.
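A sketch of the kind of runtime checks mentioned here, under stated assumptions: it records the deepest queue depth a disk has accepted (the "more than, say, 4 concurrent requests" test suggested earlier in the thread) and the average write completion time (very fast writes suggesting a write cache). All names and thresholds are hypothetical and chosen for illustration; as the message says, checks like these cannot cover everything an explicit disk profile would.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds, for illustration only. */
#define DEEP_QUEUE_DEPTH  4    /* "more than, say, 4" concurrent requests */
#define FAST_WRITE_USEC   1000 /* average write under 1 ms looks cached */
#define MIN_WRITE_SAMPLES 64   /* don't decide on too little data */

struct disk_profile {
	unsigned int max_depth_seen; /* deepest queue the disk accepted */
	uint64_t write_usec_total;   /* summed write completion times */
	unsigned int write_samples;  /* number of writes measured */
};

static void profile_note_depth(struct disk_profile *p, unsigned int depth)
{
	if (depth > p->max_depth_seen)
		p->max_depth_seen = depth;
}

static void profile_note_write(struct disk_profile *p, uint64_t usec)
{
	p->write_usec_total += usec;
	p->write_samples++;
}

/* Disk accepts a deep tagged queue. */
static bool profile_deep_tcq(const struct disk_profile *p)
{
	return p->max_depth_seen > DEEP_QUEUE_DEPTH;
}

/* Writes complete suspiciously fast: probably write-back cached. */
static bool profile_write_cache_suspected(const struct disk_profile *p)
{
	if (p->write_samples < MIN_WRITE_SAMPLES)
		return false;
	return p->write_usec_total / p->write_samples < FAST_WRITE_USEC;
}
```

The minimum sample count keeps a handful of lucky writes from flipping the verdict; the remaining disk behaviour would still have to come from an explicit profile.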