Re: Time sliced CFQ io scheduler

From: Nick Piggin
Date: Wed Dec 08 2004 - 02:39:26 EST


On Wed, 2004-12-08 at 08:20 +0100, Jens Axboe wrote:
> On Wed, Dec 08 2004, Nick Piggin wrote:
> > On Wed, 2004-12-08 at 07:58 +0100, Jens Axboe wrote:
> > > On Wed, Dec 08 2004, Nick Piggin wrote:
> > > > On Tue, 2004-12-07 at 18:25 -0800, Andrew Morton wrote:
> >
> > > > I think we could detect when a disk asks for more than, say, 4
> > > > concurrent requests, and in that case turn off read anticipation
> > > > and all the anti-starvation for TCQ by default (with the option
> > > > to force it back on).
> > >
> > > CFQ only allows a certain depth at the hardware level, you can control
> > > that. I don't think you should drop the AS behaviour in that case, you
> > > should look at when the last request comes in and what type it is.
> > >
> > > With time sliced cfq I'm seeing some silly SCSI disk behaviour as well,
> > > it gets harder to get good read bandwidth as the disk is trying pretty
> > > hard to starve me. Maybe killing write back caching would help, I'll
> > > have to try.
> > >
> >
> > I "fixed" this in AS. It gets (or got, last time we checked, many months
> > ago) pretty good read latency even with a big write and a very large
> > tag depth.
> >
> > What were the main things I had to do... hmm, I think the main one was
> > to not start on a new batch until all requests from a previous batch
> > are reported to have completed. So eg. you get all reads completing
> > before you start issuing any more writes. The write->read side of things
> > isn't so clear cut with your "smart" write caches on the IO systems, but
> > no doubt that helps a bit.
>
> I can see the read/write batching being helpful there, at least to
> prevent writes starving reads if you let the queue drain completely
> before starting a new batch.
>
> CFQ does something similar, just not batched together. But it does let
> the depth build up a little and drain out. In fact, thinking about it,
> I think I'm missing a little fix there; that could be why the read
> latencies hurt on write intensive loads (the dispatch queue is drained,
> but the hardware queue is not fully drained).
>

OK, you should look into that, because I found it was quite effective.
Maybe you have a little bug or oversight somewhere if your read latencies
are really bad. Note that AS read latencies at 256 tags aren't as good
as at 2 tags... but I think they're an order of magnitude better than
with deadline on the hardware we were testing.
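
To make that concrete, the batch gating boils down to something like
the sketch below. It is not the actual AS code, just the idea, and all
the names (batch_state, may_dispatch and so on) are made up for
illustration:

/*
 * Rough sketch of the batch gating described above.  The scheduler
 * alternates read and write batches and refuses to start the next
 * batch until every request from the current one has completed, so a
 * deep TCQ queue never has both types in flight at once.
 */
enum batch_type { BATCH_READ, BATCH_WRITE };

struct batch_state {
	enum batch_type current;	/* batch currently being issued */
	int in_flight;			/* issued but not yet completed */
};

/* Can a request of 'type' be sent to the device right now? */
static int may_dispatch(struct batch_state *bs, enum batch_type type)
{
	if (type == bs->current)
		return 1;		/* same batch, keep the queue full */

	/* Batch switch: wait for the hardware queue to drain first. */
	if (bs->in_flight > 0)
		return 0;

	bs->current = type;
	return 1;
}

static void request_issued(struct batch_state *bs)
{
	bs->in_flight++;
}

static void request_completed(struct batch_state *bs)
{
	bs->in_flight--;
}

As mentioned above, this only gates what the scheduler issues; a
write-back cache on the controller can still complete writes
"instantly" and reorder them underneath, which is why the write->read
transition is less clear cut.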

> > Of course, after you do all that your database performance has well and
> > truly gone down the shitter. It is also hampered by the more fundamental
> > issue that read anticipation can block up the pipe for IO that is cached
> > on the controller/disks and would get satisfied immediately.
>
> I think we need to end up with something that sets the machine profile
> for the interesting disks. Some things you can check for at runtime
> (like the writes being extremely fast is a good indicator of write
> caching), but it is just not possible to cover it all. Plus, you end up
> with 30-40% of the code being convoluted stuff added to detect it.
>

Ideally maybe we would have a userspace program that is run to detect
various disk parameters, asks the user (or a config file) what sort of
workloads we want to run, and spits out a recommended IO scheduler and
/sys configuration to accompany it.

That at least could be made quite a bit more sophisticated than a kernel
solution, and could gather quite a lot of "static" disk properties.
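
Purely as an illustration of what such a probe could measure, here is
a sketch that times small O_SYNC writes to guess whether the device is
write-back caching (the "writes being extremely fast" indicator you
mention). The 1ms threshold, the scratch file name and the output are
guesses, not a worked-out design:

/*
 * probe-disk.c -- sketch of the userspace probe idea.  Times small
 * synchronous writes to a scratch file on the target disk; if they
 * complete far faster than a real seek + platter write could, the
 * device is almost certainly write-back caching.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/time.h>

#define NR_WRITES 64

int main(int argc, char **argv)
{
	const char *scratch = argc > 1 ? argv[1] : "./probe-scratch.tmp";
	char buf[512];
	struct timeval t0, t1;
	double ms_per_write;
	int fd, i;

	fd = open(scratch, O_WRONLY | O_CREAT | O_SYNC, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(buf, 0, sizeof(buf));
	gettimeofday(&t0, NULL);
	for (i = 0; i < NR_WRITES; i++) {
		if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
			perror("pwrite");
			return 1;
		}
	}
	gettimeofday(&t1, NULL);
	close(fd);
	unlink(scratch);

	ms_per_write = ((t1.tv_sec - t0.tv_sec) * 1e6 +
			(t1.tv_usec - t0.tv_usec)) / 1000.0 / NR_WRITES;

	printf("average O_SYNC write latency: %.2f ms\n", ms_per_write);

	/*
	 * A disk that really commits each write needs at least a few ms
	 * per request; much less than that suggests a write-back cache
	 * is absorbing them.
	 */
	if (ms_per_write < 1.0)
		printf("looks write-back cached; tune the scheduler "
		       "accordingly (or disable the cache)\n");
	else
		printf("writes look uncached; defaults are probably fine\n");

	return 0;
}

A real version would also gather the "static" properties and write out
the matching elevator and /sys settings rather than just printing a
hint.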

Of course there will also be some things that need to be done in the
kernel...
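
For example, the runtime side of the ">4 concurrent requests" detection
mentioned earlier might look roughly like this -- invented names and
threshold, not how AS or CFQ actually do it:

/*
 * Sketch of the in-kernel part: if the device is observed keeping more
 * than a handful of requests in flight, assume deep TCQ and switch read
 * anticipation off by default, with a knob to force it back on.
 */
#define DEEP_TCQ_THRESHOLD	4

struct queue_tuning {
	int max_depth_seen;	/* deepest in-flight count observed     */
	int antic_forced_on;	/* admin override to keep anticipation  */
	int antic_enabled;	/* current read anticipation state      */
};

static void account_queue_depth(struct queue_tuning *qt, int in_flight)
{
	if (in_flight > qt->max_depth_seen)
		qt->max_depth_seen = in_flight;

	if (qt->max_depth_seen > DEEP_TCQ_THRESHOLD && !qt->antic_forced_on)
		qt->antic_enabled = 0;	/* deep TCQ: anticipation off */
}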

