Re: [PATCH] cfq-iosched: rework seeky detection

From: Corrado Zoccolo
Date: Wed Jan 13 2010 - 03:05:29 EST


On Wed, Jan 13, 2010 at 12:17 AM, Corrado Zoccolo <czoccolo@xxxxxxxxx> wrote:
> On Tue, Jan 12, 2010 at 11:36 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>>> The fact is, can we reliably determine which of those two setups we
>>> have from cfq?
>>
>> I have no idea at this point of time but it looks like determining this
>> will help.
>>
>> Maybe something like keeping track of the number of processes on the
>> "sync-noidle" tree and the average read times when the sync-noidle tree
>> is being served. Over a period of time we need to monitor what's the
>> number of processes (threshold) after which average read time goes up.
>> For sync-noidle we can then drive "queue_depth=nr_threshold" and once
>> queue depth reaches that,
>> then idle on the process. So for single spindle, I guess tipping point
>> will be 2 processes and we can idle on sync-noidle process. For more
>> spindles, tipping point will be higher.
>>
>> These are just some random thoughts.
> It seems reasonable.
I think, though, that the implementation will be complex.
We should limit this to request sizes that are <= stripe size (larger
requests will hit more disks, and have a much lower optimal queue
depth), so we need to add a new service_tree (they will become:
SYNC_IDLE_LARGE, SYNC_IDLE_SMALL, SYNC_NOIDLE, ASYNC), and the
optimization will apply only to the SYNC_IDLE_SMALL tree.
Moreover, we can't just dispatch K queues and then idle on the last
one. We need to have a set of K active queues, and wait on any of
them. This makes this optimization very complex, and I think for
little gain. In fact, usually we don't have sequential streams of
small requests, unless we misuse mmap or direct I/O.
BTW, the mmap problem could be easily fixed by adding
madvise(MADV_WILLNEED) to the userspace program, when dealing with data.
I think we only have to worry about binaries, here.
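
For illustration, a minimal userspace sketch of the madvise() hint
mentioned above. It assumes Linux/glibc; map_with_readahead is a
hypothetical helper name, not something taken from an existing program.
The hint is advisory, so the sketch ignores its return value.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* expose madvise() under strict -std= modes (glibc) */
#endif

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a file read-only and ask the kernel to start
 * reading it in ahead of the first access.
 * Returns the mapping and stores its length in *len_out, or returns NULL
 * on failure (including an empty file, which cannot be mapped).
 */
static void *map_with_readahead(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    /* Advisory only: if the hint fails, the mapping is still usable. */
    (void)madvise(p, len, MADV_WILLNEED);

    *len_out = len;
    return p;
}
```

The caller releases the mapping with munmap(p, len) when done. With the
hint in place the kernel can schedule the reads for the whole range up
front, rather than discovering them one page fault at a time.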

> Something similar to what we do to reduce depth for async writes.
> Can you see if you get similar BW improvements also for parallel
> sequential direct I/Os with block size < stripe size?
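
To make the quoted tipping-point idea concrete, here is a rough
userspace sketch in C of how it could be estimated. It is not cfq code:
struct noidle_stats, noidle_stats_record, noidle_tipping_point,
MAX_TRACKED_PROCS and the slack_pct parameter are all hypothetical names
chosen for this example. It records read times per number of
concurrently active sync-noidle processes, and reports the smallest
process count at which the average read time rises noticeably above the
single-process average.

```c
#include <stddef.h>

#define MAX_TRACKED_PROCS 16  /* hypothetical upper bound for the sketch */

/*
 * Hypothetical statistics: for each count of concurrently active
 * sync-noidle processes (index 1..MAX_TRACKED_PROCS), the accumulated
 * read time in microseconds and the number of samples seen.
 */
struct noidle_stats {
    unsigned long long total_read_us[MAX_TRACKED_PROCS + 1];
    unsigned long samples[MAX_TRACKED_PROCS + 1];
};

/* Record one completed read observed while nr_procs processes were active. */
static void noidle_stats_record(struct noidle_stats *s, unsigned nr_procs,
                                unsigned long read_us)
{
    if (nr_procs == 0 || nr_procs > MAX_TRACKED_PROCS)
        return;  /* out of the tracked range: ignore the sample */
    s->total_read_us[nr_procs] += read_us;
    s->samples[nr_procs]++;
}

/*
 * Return the smallest process count whose average read time exceeds the
 * single-process average by more than slack_pct percent, or 0 if there is
 * no single-process baseline yet or no such count has been observed.
 */
static unsigned noidle_tipping_point(const struct noidle_stats *s,
                                     unsigned slack_pct)
{
    if (s->samples[1] == 0)
        return 0;

    unsigned long long base = s->total_read_us[1] / s->samples[1];

    for (unsigned n = 2; n <= MAX_TRACKED_PROCS; n++) {
        if (s->samples[n] == 0)
            continue;  /* no data for this process count */
        unsigned long long avg = s->total_read_us[n] / s->samples[n];
        if (avg * 100 > base * (100 + slack_pct))
            return n;
    }
    return 0;
}
```

In this sketch a single spindle would be expected to report 2, and an
array with more spindles a higher value, which could then be used as
the depth limit for the tree.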

Thanks,
Corrado