Re: CFQ read performance regression

From: Corrado Zoccolo
Date: Tue Apr 27 2010 - 13:25:23 EST


On Mon, Apr 26, 2010 at 9:14 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Sat, Apr 24, 2010 at 10:36:48PM +0200, Corrado Zoccolo wrote:
>
> [..]
>> >> Anyway, if that's the case, then we probably need to allow IO from
>> >> multiple sequential readers and keep a watch on throughput. If throughput
>> >> drops then reduce the number of parallel sequential readers. Not sure how
>> >> much of code that is but with multiple cfqq going in parallel, ioprio
>> >> logic will more or less stop working in CFQ (on multi-spindle hardware).
>> Hi Vivek,
>> I tried to implement exactly what you are proposing, see the attached patches.
>> I leverage the queue merging features to let multiple cfqqs share the
>> disk in the same timeslice.
>> I changed the queue split code to trigger on throughput drop instead
>> of on seeky pattern, so diverging queues can remain merged if they
>> have good throughput. Moreover, I measure the max bandwidth reached by
>> single queues and merged queues (you can see the values in the
>> bandwidth sysfs file).
>> If merged queues can outperform non-merged ones, the queue merging
>> code will try to opportunistically merge together queues that cannot
>> submit enough requests to fill half of the NCQ slots. I'd like to know
>> if you can see any improvements out of this on your hardware. There
>> are some magic numbers in the code, you may want to try tuning them.
>> Note that, since the opportunistic queue merging will start happening
>> only after merged queues have shown to reach higher bandwidth than
>> non-merged queues, you should use the disk for a while before trying
>> the test (and you can check sysfs), or the merging will not happen.
>
> Hi Corrado,
>
> I ran these patches and I did not see any improvement. I think the reason
> being that no cooperative queue merging took place and we did not have
> any data for throughput with coop flag on.
>
> #cat /sys/block/dm-3/queue/iosched/bandwidth
> 230   753   0
>
> I think we need to implement something similar to hw_tag detection logic
> where we allow dispatches from multiple sync-idle queues at a time and try
> to observe the BW. After certain window once we have observed the window,
> then set the system behavior accordingly.
Hi Vivek,
thanks for testing. Can you try changing the condition that enables
queue merging so that it also covers the case in which max_bw[1] == 0 &&
max_bw[0] > 100MB/s (note that max_bw is measured in
sectors/jiffy)?
This should rule out low-end disks, and enable merging where it can be
beneficial.
If the results are good, but we find this enabling condition
unreliable, then we can think of a better way, but I'm curious to see
if the results are promising before thinking about the details.
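
As a rough illustration of the condition above, here is a minimal
userspace sketch, not the code from the patches. The function names,
the 512-byte sector size, the reading of 1MB as 2^20 bytes, and the
"merged beats single" form of the existing rule are all assumptions
made for this example; hz stands in for the kernel's HZ.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTOR_BYTES 512UL	/* assumption: 512-byte sectors */

/*
 * Convert a bandwidth in MB/s (1MB = 2^20 bytes, an assumption) into
 * sectors per jiffy for a given tick rate. Integer division truncates.
 */
static unsigned long mbps_to_sectors_per_jiffy(unsigned long mbps,
					       unsigned long hz)
{
	assert(hz > 0);
	return (mbps * 1024UL * 1024UL) / (SECTOR_BYTES * hz);
}

/*
 * Hypothetical enabling check.
 * max_bw[0]: max bandwidth seen on single (non-merged) queues.
 * max_bw[1]: max bandwidth seen on merged queues.
 * Both are in sectors/jiffy.
 */
static bool merging_enabled(const unsigned long max_bw[2], unsigned long hz)
{
	unsigned long threshold = mbps_to_sectors_per_jiffy(100, hz);

	/* Assumed form of the existing rule: merged queues did better. */
	if (max_bw[1] > max_bw[0])
		return true;

	/* Additional case: no merged data yet, but the disk is fast. */
	return max_bw[1] == 0 && max_bw[0] > threshold;
}
```

With hz = 1000 the 100MB/s threshold works out to 204 sectors/jiffy,
and with hz = 250 to 819, so the outcome of the check depends on the
kernel's tick rate.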

Thanks,
Corrado

>
> Kernel=2.6.34-rc5-corrado-multicfq
> DIR= /mnt/iostmnt/fio   DEV= /dev/mapper/mpathe
> Workload=bsr   iosched=cfq   Filesz=2G   bs=4K
> ==========================================================================
> job   Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
> ---   --- --  ------------   -----------    -------------  -----------
> bsr   1   1   126590         61448          0              0
> bsr   1   2   127849         242843         0              0
> bsr   1   4   131886         508021         0              0
> bsr   1   8   131890         398241         0              0
> bsr   1   16  129167         454244         0              0
>
> Thanks
> Vivek
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:czoccolo@xxxxxxxxx
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda