Re: fio mmap randread 64k more than 40% regression with 2.6.33-rc1
From: Corrado Zoccolo
Date: Tue Jan 19 2010 - 16:58:36 EST
On Tue, Jan 19, 2010 at 10:40 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Tue, Jan 19, 2010 at 09:10:33PM +0100, Corrado Zoccolo wrote:
>> On Mon, Jan 18, 2010 at 4:06 AM, Zhang, Yanmin
>> <yanmin_zhang@xxxxxxxxxxxxxxx> wrote:
>> > On Sat, 2010-01-16 at 17:27 +0100, Corrado Zoccolo wrote:
>> >> Hi Yanmin
>> >> On Mon, Jan 4, 2010 at 7:28 PM, Corrado Zoccolo <czoccolo@xxxxxxxxx> wrote:
>> >> > Hi Yanmin,
>> >> >> When low_latency=1, we get the biggest number with kernel 2.6.32.
>> >> >> Comparing with low_latency=0's result, the prior one is about 4% better.
>> >> > Ok, so 2.6.33 + corrado (with low_latency =0) is comparable with
>> >> > fastest 2.6.32, so we can consider the first part of the problem
>> >> > solved.
>> >> >
>> >> I think we can return now to your full script with queue merging.
>> >> I'm wondering if (in arm_slice_timer):
>> >> -       if (cfqq->dispatched)
>> >> +       if (cfqq->dispatched || (cfqq->new_cfqq && rq_in_driver(cfqd)))
>> >>                 return;
>> >> gives the same improvement you were experiencing just reverting to rq_in_driver.
>> > I did a quick test against 2.6.33-rc1. With the new method, fio mmap randread 64k
>> > has about 20% improvement. With just checking rq_in_driver(cfqd), it has
>> > about 33% improvement.
>> >
>> Jeff, do you have an idea why in arm_slice_timer, checking
>> rq_in_driver instead of cfqq->dispatched gives so much improvement in
>> presence of queue merging, while it doesn't have noticeable effect
>> when there are no merges?
>
> A performance improvement from replacing cfqq->dispatched with
> rq_in_driver() is really strange. It means we will do even less
> idling on the cfqq. That means faster cfqq switching, which should mean
> more seeks (for this test case) and reduced throughput. This is just the
> opposite of your approach of treating a random read mmap queue as sync,
> where we will idle on the queue.
The tests (previous mails in this thread) show that, if no queue
merging is happening, handling the queue as sync_idle, and setting
low_latency = 0 to have bigger slices completely recovers the
regression.
If, though, we have queue merges, the current arm_slice_timer shows a
regression w.r.t. the rq_in_driver version (2.6.32).
I think a possible explanation is that we are idling instead of
switching to another queue that would be merged with this one. In
fact, my half-baked attempt to make the rq_in_driver check conditional on
queue merging fixed part of the regression (not all, because queue
merges are not symmetrical, and I could be seeing the queue that is
'new_cfqq' for another).
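To make the three variants of the check concrete, here is a minimal
user-space sketch, not the kernel code. The struct layouts and the
skip_idle_* function names are hypothetical simplifications; the patch
calls rq_in_driver(cfqd), which is modelled here as a plain counter
field. Each function returns true when arm_slice_timer would return
early, i.e. not idle on the queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CFQ structures: only the fields
 * discussed in this thread are modelled. */
struct cfq_data {
	int rq_in_driver;		/* requests from any queue still at the driver */
};

struct cfq_queue {
	int dispatched;			/* this queue's own requests still in flight */
	struct cfq_queue *new_cfqq;	/* non-NULL once a merge has been set up */
};

/* 2.6.33-rc1 check: skip idling only if this queue has requests in flight. */
static inline bool skip_idle_dispatched(const struct cfq_data *cfqd,
					const struct cfq_queue *cfqq)
{
	(void)cfqd;
	return cfqq->dispatched != 0;
}

/* 2.6.32 check: skip idling if any request at all is at the driver. */
static inline bool skip_idle_rq_in_driver(const struct cfq_data *cfqd,
					  const struct cfq_queue *cfqq)
{
	(void)cfqq;
	return cfqd->rq_in_driver != 0;
}

/* Conditional check from the patch: fall back to the driver-wide count
 * only for a queue that is already scheduled to be merged. */
static inline bool skip_idle_conditional(const struct cfq_data *cfqd,
					 const struct cfq_queue *cfqq)
{
	return cfqq->dispatched != 0 ||
	       (cfqq->new_cfqq != NULL && cfqd->rq_in_driver != 0);
}
```

With requests pending at the driver and nothing dispatched from the
current queue, the conditional check skips idling only for the queue
that has new_cfqq set. The queue that is the target of the merge still
idles, which is the asymmetry mentioned above.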
Thanks,
Corrado
>
> Thanks
> Vivek
>
>>
>> Thanks,
>> Corrado
>>
>> >
>> >>
>> >> We saw that cfqq->dispatched worked fine when there was no queue
>> >> merging happening, so it must be something concerning merging.
>> >> Probably dispatched is not accurate when we have set up a merge, but
>> >> the merge has not yet been done.
>> >
>> >
>> >
>
--
__________________________________________________________________________
dott. Corrado Zoccolo mailto:czoccolo@xxxxxxxxx
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda