Re: Performance regression in IO scheduler still there

From: Corrado Zoccolo
Date: Tue Nov 10 2009 - 12:38:05 EST


On Tue, Nov 10, 2009 at 5:47 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Corrado Zoccolo <czoccolo@xxxxxxxxx> writes:
>
>> Jeff, Jens,
>> do you think we should try to do more auto-tuning of cfq parameters?
>> Looking at those numbers for SANs, I think we are being suboptimal in
>> some cases.
>> E.g. sequential read throughput is lower than random read.
>
> I investigated this further, and this was due to a problem in the
> benchmark. It was being run with only 500 samples for random I/O and
> 65536 samples for sequential. After fixing this, we see random I/O is
> slower than sequential, as expected.
Ok.
>> I also think that current slice_idle and slice_sync values are good
>> for devices with 8ms seek time, but they are too high for non-NCQ
>> flash devices, where "seek" penalty is under 1ms, and we still prefer
>> idling.
>
> Do you have numbers to back that up? If not, throw a fio job file over
> the fence and I'll test it on one such device.
>
It is based on reasoning rather than on measurements.
Currently, idling is based on the assumption that it is worth waiting
up to 10ms for a nearby request instead of jumping far away, since the
jump will likely cost more than that. If the jump costs around 1ms, as
on flash cards, then waiting 10ms is clearly wasted time.
On the other hand, on flash cards a random write can cost 50ms or
more, so we will need to differentiate the last idle before switching
to async writes from the inter-read idles. This should be possible
with the new workload-based infrastructure, but we need to measure
those characteristic times in order to use them in the heuristics.
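To make this concrete, here is a rough sketch of the kind of heuristic
I have in mind (not a patch; the struct and function names below are
invented for illustration and are not in the current cfq code):

/*
 * Sketch only: scale the idle window from measured per-device costs
 * instead of assuming ~8ms rotational seeks.  Costs are kept in the
 * same units as slice_idle (jiffies).
 */
struct cfq_dev_stats {
	unsigned long avg_seek_cost;	/* EWMA cost of a far "jump" read */
	unsigned long avg_write_cost;	/* EWMA cost of a random write */
};

static unsigned long adaptive_read_idle(struct cfq_dev_stats *st,
					unsigned long slice_idle)
{
	/*
	 * Idling between reads only pays off while waiting is cheaper
	 * than the jump we would otherwise take: around 1ms on non-NCQ
	 * flash, 8-10ms on rotational disks.
	 */
	return min(slice_idle, st->avg_seek_cost);
}

static unsigned long adaptive_last_idle(struct cfq_dev_stats *st,
					unsigned long slice_idle)
{
	/*
	 * The last idle before switching to the async write workload
	 * can afford to be longer, since a random write may cost 50ms
	 * or more on flash cards.
	 */
	return max(slice_idle, min(st->avg_write_cost, 10 * slice_idle));
}

The point is simply that both numbers would come from measurement
rather than from fixed defaults tuned for rotational disks.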

>> If we agree on this, should the measurement part (I'm thinking to
>> measure things like seek time, throughput, etc...) be added to the
>> common elevator code, or done inside cfq?
>
> Well, if it's something that is of interest to others, then pushing it
> up a layer makes sense. If only CFQ is going to use it, keep it there.
If the direction is to have only one intelligent I/O scheduler, as the
removal of anticipatory indicates, then it is the latter: I don't
think noop or deadline will ever make any use of those measurements.
They could still be useful, though, for reporting the performance seen
by the kernel below the page cache.
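For the measurement side, something along these lines should be enough
(again only a sketch with invented names, hooked from the request
completion path, e.g. where cfq_completed_request() already runs):

/*
 * Sketch only: keep per-device EWMAs of service times, split by
 * request type, so the idling heuristics above can consume them.
 * service_time would be completion time minus dispatch time.
 */
static void cfq_update_dev_stats(struct cfq_dev_stats *st,
				 struct request *rq,
				 unsigned long service_time,
				 bool seeky)
{
	if (!rq_is_sync(rq)) {
		/* random/async write cost, dominant on flash cards */
		st->avg_write_cost =
			(7 * st->avg_write_cost + service_time) / 8;
	} else if (seeky) {
		/* cost of a far jump on a read */
		st->avg_seek_cost =
			(7 * st->avg_seek_cost + service_time) / 8;
	}
}

Throughput could be tracked in the same place, from the completed
sectors per unit of time.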

Thanks
Corrado
>
> Cheers,
> Jeff
>