Re: performance "regression" in cfq compared to anticipatory, deadline and noop

From: Jens Axboe
Date: Thu May 15 2008 - 03:01:57 EST


On Wed, May 14 2008, Matthew wrote:
> On Wed, May 14, 2008 at 10:52 PM, Daniel J Blueman
> <daniel.blueman@xxxxxxxxx> wrote:
> > On Wed, May 14, 2008 at 9:26 AM, Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> >> On Wed, May 14 2008, Daniel J Blueman wrote:
> >>> > > > > > > > On Sun, 2008-05-11 at 14:14 +0100, Daniel J Blueman wrote:
> >>> > > > > > > > > I've been experiencing this for a while also; an almost 50% regression
> >>> > > > > > > > > is seen for single-process reads (ie sync) if slice_idle is 1ms or
> >>> > > > > > > > > more (eg default of 8) [1], which seems phenomenal.
> >>> > > > > > > > >
> >>> > > > > > > > > Jens, is this the expected price to pay for optimal busy-spindle
> >>> > > > > > > > > scheduling, a design issue, a bug, or am I missing something totally?
> >>> [snip]
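
For anyone reproducing the slice_idle comparison above: the tunable is
exposed through sysfs, and the test amounts to flipping it between 0 and
its default of 8 (milliseconds) before each run, i.e. echoing a value
into the iosched directory for the disk. Below is a minimal userspace
sketch, not taken from the thread, assuming the 2.6.26-era sysfs layout,
the sda disk used later in this thread, and root privileges:

/*
 * Read CFQ's slice_idle and optionally set a new value, e.g.
 * "./slice_idle 0" to disable idling before a benchmark run.
 */
#include <stdio.h>

#define SLICE_IDLE "/sys/block/sda/queue/iosched/slice_idle"

int main(int argc, char **argv)
{
	char cur[16];
	FILE *f = fopen(SLICE_IDLE, "r");

	if (!f || !fgets(cur, sizeof(cur), f)) {
		perror(SLICE_IDLE);
		return 1;
	}
	fclose(f);
	printf("slice_idle is currently: %s", cur);

	if (argc > 1) {
		/* write the new value, e.g. 0 or the default 8 */
		f = fopen(SLICE_IDLE, "w");
		if (!f || fprintf(f, "%s\n", argv[1]) < 0) {
			perror(SLICE_IDLE);
			return 1;
		}
		fclose(f);
		printf("slice_idle set to %s\n", argv[1]);
	}
	return 0;
}
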
> >>> > > They seem to start out the same, but then CFQ gets interrupted by a
> >>> > > timer unplug (which is also odd) and after that the request size drops.
> >>> > > On most devices you don't notice, but some are fairly picky about
> >>> > > request sizes. The end result is that CFQ has an average dispatch
> >>> > > request size of 142KB, whereas AS is more than double that at 306KB. I'll
> >>> > > need to analyze the data and look at the code a bit more to see WHY this
> >>> > > happens.
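
The "timer unplug" here comes from the block layer's plugging machinery
rather than from CFQ itself: plugging a queue arms a short unplug timer
(about 3 ms by default), and when it fires the queue is kicked from
kblockd, forcing a dispatch whether or not more I/O could still have
been merged into the pending requests. A rough paraphrase of the
2.6.26-era block core path, a sketch rather than the literal source:

/* blk_plug_device(): on top of setting the plugged flag, arm the
 * unplug timeout (q->unplug_delay, ~3 ms by default). */
static void plug_device_sketch(struct request_queue *q)
{
	if (!blk_queue_plugged(q))
		mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
}

/* blk_unplug_timeout(): timer context, so defer the real work to
 * kblockd; this is what blktrace records as a timer unplug. */
static void unplug_timeout_sketch(unsigned long data)
{
	struct request_queue *q = (struct request_queue *)data;

	kblockd_schedule_work(&q->unplug_work);
}

/* blk_unplug_work(): calls q->unplug_fn(), normally
 * generic_unplug_device(), which clears the plug and runs the
 * request_fn to dispatch whatever has been queued so far. */
static void unplug_work_sketch(struct work_struct *work)
{
	struct request_queue *q =
		container_of(work, struct request_queue, unplug_work);

	q->unplug_fn(q);
}
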
> >>> >
> >>> > Here's a test patch; I think we get into this situation due to CFQ being
> >>> > a bit too eager to start queuing again. It's not tested, and I'll need to
> >>> > spend some testing time on it, but I'd appreciate feedback on whether it
> >>> > changes the situation! The final patch will be a little more involved.
> >>> >
> >>> > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> >>> > index b399c62..ebd8ce2 100644
> >>> > --- a/block/cfq-iosched.c
> >>> > +++ b/block/cfq-iosched.c
> >>> > @@ -1775,18 +1775,8 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
> >>> >
> >>> > cic->last_request_pos = rq->sector + rq->nr_sectors;
> >>> >
> >>> > - if (cfqq == cfqd->active_queue) {
> >>> > - /*
> >>> > - * if we are waiting for a request for this queue, let it rip
> >>> > - * immediately and flag that we must not expire this queue
> >>> > - * just now
> >>> > - */
> >>> > - if (cfq_cfqq_wait_request(cfqq)) {
> >>> > - cfq_mark_cfqq_must_dispatch(cfqq);
> >>> > - del_timer(&cfqd->idle_slice_timer);
> >>> > - blk_start_queueing(cfqd->queue);
> >>> > - }
> >>> > - } else if (cfq_should_preempt(cfqd, cfqq, rq)) {
> >>> > + if ((cfqq != cfqd->active_queue) &&
> >>> > + cfq_should_preempt(cfqd, cfqq, rq)) {
> >>> > /*
> >>> > * not the active queue - expire current slice if it is
> >>> > * idle and has expired it's mean thinktime or this new queue
> >>>
> >>> I find this does address the issue (both with 64KB-stride dd and
> >>> hdparm -t; presumably the requests are now getting merged). Tested on
> >>> 2.6.26-rc2 on Ubuntu HH 8.04 x86-64, with slice_idle at its default of 8
> >>> and AHCI on ICH9; the disk is an ST3320613AS.
> >>>
> >>> Blktrace profiles from 'dd if=/dev/sda of=/dev/null bs=64k count=1000' are at:
> >>>
> >>> http://quora.org/blktrace-profiles.tar.bz2
> >>
> >> Goodie! I think the patch below is better: we do want to schedule the
> >> queue immediately, but we do not want to interrupt the queuer. So just
> >> kick the workqueue handling of the queue instead of entering the
> >> dispatcher directly. Can you test this one as well? Thanks!
> >>
> >> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> >> index f4e1006..e8c1941 100644
> >> --- a/block/cfq-iosched.c
> >> +++ b/block/cfq-iosched.c
> >> @@ -1107,7 +1107,6 @@ static int cfq_dispatch_requests(struct request_queue *q, int force)
> >>
> >> cfq_clear_cfqq_must_dispatch(cfqq);
> >> cfq_clear_cfqq_wait_request(cfqq);
> >> - del_timer(&cfqd->idle_slice_timer);
> >>
> >> dispatched += __cfq_dispatch_requests(cfqd, cfqq, max_dispatch);
> >> }
> >> @@ -1769,15 +1768,9 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
> >> cic->last_request_pos = rq->sector + rq->nr_sectors;
> >>
> >> if (cfqq == cfqd->active_queue) {
> >> - /*
> >> - * if we are waiting for a request for this queue, let it rip
> >> - * immediately and flag that we must not expire this queue
> >> - * just now
> >> - */
> >> if (cfq_cfqq_wait_request(cfqq)) {
> >> - cfq_mark_cfqq_must_dispatch(cfqq);
> >> del_timer(&cfqd->idle_slice_timer);
> >> - blk_start_queueing(cfqd->queue);
> >> + kblockd_schedule_work(&cfqd->unplug_work);
> >> }
> >> } else if (cfq_should_preempt(cfqd, cfqq, rq)) {
> >> /*
> >> @@ -1787,7 +1780,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
> >> */
> >> cfq_preempt_queue(cfqd, cfqq);
> >> cfq_mark_cfqq_must_dispatch(cfqq);
> >> - blk_start_queueing(cfqd->queue);
> >> + kblockd_schedule_work(&cfqd->unplug_work);
> >> }
> >> }
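
For context, the unplug_work kicked in this patch points at CFQ's
cfq_kick_queue() handler, set up when the scheduler is initialised.
Paraphrased from the 2.6.26-era block/cfq-iosched.c (treat this as a
sketch, not a quote), all it does is restart the queue from kblockd
process context, which is what lets the task that queued the request
carry on building larger requests instead of being dropped straight
into the dispatcher:

/* cfq_data is cfq-iosched.c's private per-device state, which embeds
 * the unplug_work item scheduled above. */
static void cfq_kick_queue_sketch(struct work_struct *work)
{
	struct cfq_data *cfqd =
		container_of(work, struct cfq_data, unplug_work);
	struct request_queue *q = cfqd->queue;
	unsigned long flags;

	/* take the queue lock and restart dispatching from a kblockd
	 * worker, i.e. outside the queuer's context */
	spin_lock_irqsave(q->queue_lock, flags);
	blk_start_queueing(q);
	spin_unlock_irqrestore(q->queue_lock, flags);
}
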
> >
> > With this applied on top of 2.6.26-rc2, I get platter speed (118MB/s)
> > with 'dd if=/dev/sda of=/dev/null bs=64k' and 'hdparm -t', so it looks
> > good. Identical testing without the patch (i.e. pure mainline)
> > consistently yields 65MB/s.
> >
> > Blktrace profile at:
> >
> > http://quora.org/blktrace-profiles-2.tar.bz2
> >
> > I'll check for performance regressions with postmark on XFS; anything
> > else worth running while I've got this in hand?
> >
> > Daniel
> > --
> > Daniel J Blueman
> >
>
> So it seems something specific to >=2.6.26-rc2, plus the patch, fixed it for you?
>
> Were there any notable changes from 2.6.25 -> 2.6.26 in cfq or the VFS
> in general?
>
> I'm curious whether this also works with 2.6.25; could you please test
> that too?
>
> I'll give .26-rc2 a test ride later.

I don't think it's a 2.6.25 vs 2.6.26-rc2 issue; I can still reproduce some
request size offsets with the patch. So I'm still fumbling around with this,
and I'll send out another test patch once I'm confident it solves the size
issue.

--
Jens Axboe
