Re: io-scheduler tuning for better read/write ratio
From: Jens Axboe
Date: Fri Jun 26 2009 - 06:44:27 EST
On Fri, Jun 26 2009, Wu Fengguang wrote:
> On Tue, Jun 23, 2009 at 03:42:46AM +0800, Jeff Moyer wrote:
> > Ralf Gross <rg@xxxxxxxxxxxxxxxxxxxxxxx> writes:
> >
> > > Jeff Moyer wrote:
> > >> Jeff Moyer <jmoyer@xxxxxxxxxx> writes:
> > >>
> > >> > Ralf Gross <rg@xxxxxxxxxxxxxxxxxxxxxxx> writes:
> > >> >
> > >> >> Casey Dahlin wrote:
> > >> >>> On 06/16/2009 02:40 PM, Ralf Gross wrote:
> > >> >>> > David Newall wrote:
> > >> >>> >> Ralf Gross wrote:
> > >> >>> >>> write throughput is much higher than the read throughput (40 MB/s
> > >> >>> >>> read, 90 MB/s write).
> > >> >>> >
> > >> >>> > Hm, but I get higher read throughput (160-200 MB/s) if I don't write
> > >> >>> > to the device at the same time.
> > >> >>> >
> > >> >>> > Ralf
> > >> >>>
> > >> >>> How specifically are you testing? It could depend a lot on the
> > >> >>> particular access patterns you're using to test.
> > >> >>
> > >> >> I did the basic tests with tiobench. The real test is a test backup
> > >> >> (bacula) with 2 jobs that create two 30 GB spool files on that device.
> > >> >> The jobs partially write to the device in parallel. Depending on which
> > >> >> spool file reaches 30 GB first, one job starts reading from that file
> > >> >> and writing to tape, while the other is still spooling.
> > >> >
> > >> > We are missing a lot of details here. I guess the first thing I'd try
> > >> > would be bumping up the read_ahead_kb parameter, since I'm guessing
> > >> > that your backup application isn't driving very deep queue depths. If
> > >> > that doesn't work, then please provide exact invocations of tiobench
> > >> > that reproduce the problem or some blktrace output for your real test.
> > >>
> > >> Any news, Ralf?
> > >
> > > Sorry for the delay. At the moment large backups are running and
> > > using the RAID device for spooling, so I can't do any tests.
> > >
> > > Re. read ahead: I tested different settings from 8 KB to 65 KB; this
> > > didn't help.
> > >
> > > I'll do some more tests when the backups are done (3-4 more days).
> >
> > The default is 128KB, I believe, so it's strange that you would test
> > smaller values. ;) I would try something along the lines of 1 or 2 MB.
> >
> > I'm CCing Fengguang in case he has any suggestions.
>
> Jeff, thank you for forwarding this (and sorry for the long delay)!
>
> The read:write (or rather sync:async) ratio control is an IO scheduler
> feature. CFQ has the slice_sync and slice_async parameters for that.
> What's more, CFQ makes async IO wait if there is any sync IO in
> flight. This is good, but not quite enough. Normally sync IOs come
> one by one, with a small idle window in between. If we only
> start dispatching async IOs once the last sync IO has been complete for
> e.g. 1ms, then we can hold back the async background write IOs while
> there is an active sync foreground read IO stream.
>
> This simple patch aims to address the writes-push-aside-reads problem.
> Ralf, you can try applying this patch and run your workload with this
> (huge) CFQ parameter:
>
> echo 1000 > /sys/block/sda/queue/iosched/slice_sync
>
> The patch is based on 2.6.30, but can be trivially backported if you
> want to use an older kernel.
>
> It may impact overall (sync+async) IO throughput when there are one or
> more ongoing sync IO streams, so it requires considerable benchmarking
> and tuning.
>
> Thanks,
> Fengguang
> ---
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index a55a9bd..14011b7 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1064,7 +1064,6 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
>  	if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
>  		return;
>
> -	WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
>  	WARN_ON(cfq_cfqq_slice_new(cfqq));
>
>  	/*
> @@ -2175,8 +2174,6 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
>  	 * or if we want to idle in case it has no pending requests.
>  	 */
>  	if (cfqd->active_queue == cfqq) {
> -		const bool cfqq_empty = RB_EMPTY_ROOT(&cfqq->sort_list);
> -
>  		if (cfq_cfqq_slice_new(cfqq)) {
>  			cfq_set_prio_slice(cfqd, cfqq);
>  			cfq_clear_cfqq_slice_new(cfqq);
> @@ -2190,8 +2187,8 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
>  		 */
>  		if (cfq_slice_used(cfqq) || cfq_class_idle(cfqq))
>  			cfq_slice_expired(cfqd, 1);
> -		else if (cfqq_empty && !cfq_close_cooperator(cfqd, cfqq, 1) &&
> -			 sync && !rq_noidle(rq))
> +		else if (sync && !rq_noidle(rq) &&
> +			 !cfq_close_cooperator(cfqd, cfqq, 1))
>  			cfq_arm_slice_timer(cfqd);
>  	}
What's the purpose of this patch? If you have requests pending, you don't
want to arm the idle timer and wait; you want to dispatch those.
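
The effect of the last hunk can be illustrated with a minimal userspace
sketch. This is not kernel code: struct completion_state and the
function names below are hypothetical, and the four booleans merely
stand in for the values that the quoted condition tests in
cfq_completed_request(). The sketch only compares the boolean
condition before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical snapshot of the values tested by the quoted condition.
 * None of these names exist in the kernel; they only mirror the
 * expressions in the diff.
 */
struct completion_state {
	bool queue_empty;      /* RB_EMPTY_ROOT(&cfqq->sort_list) */
	bool sync;             /* the completed request was sync */
	bool noidle;           /* rq_noidle(rq) */
	bool close_cooperator; /* cfq_close_cooperator() returned true */
};

/* Condition before the patch: idle only when the queue is empty. */
static inline bool arms_timer_before(const struct completion_state *s)
{
	return s->queue_empty && !s->close_cooperator &&
	       s->sync && !s->noidle;
}

/* Condition after the patch: the empty-queue test is gone. */
static inline bool arms_timer_after(const struct completion_state *s)
{
	return s->sync && !s->noidle && !s->close_cooperator;
}

/* Checks the two cases that matter for the question above. */
static inline void check_model(void)
{
	struct completion_state pending = { .queue_empty = false, .sync = true };
	struct completion_state empty = { .queue_empty = true, .sync = true };

	/* Requests still queued: only the patched condition idles. */
	assert(!arms_timer_before(&pending));
	assert(arms_timer_after(&pending));

	/* Queue empty: both conditions arm the idle timer. */
	assert(arms_timer_before(&empty));
	assert(arms_timer_after(&empty));
}
```

In this model the two conditions differ only when the queue still holds
requests, which is the case the question above is about.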
--
Jens Axboe